  • simple way of "greater than and not missing"


    I always have to write a "greater than and not missing" condition when defining a variable, for example gen x = 1 if age >= 50 & !missing(age).

    Is there an easy way to simplify this expression using functions or packages?

    Thank you!






  • #2
    As my dad once said (not in this context), using a function or command in this instance would be like taking a nuclear bomb to a high school bully.

    One simplification would be
    Code:
    g x = 1 if age >= 50 & !mi(age)



    • #3
      In reply to Jared Greathouse (#2):
      Thank you Jared. Good idea! If there is an even easier way, it would be quicker to code and would reduce the time spent on checking.






      • #4
        In reply to Ya Chen (original post):
        Jared has already given you great advice. There is an inconsistency between what you say you want and your code example: your code specifies greater than or equal to (>=), but your text says only greater than (>). Be mindful of the difference.

        If you really do want to specify greater than or equal to and also not missing, then these are two equivalent forms, in addition to Jared’s solution. Unlike Jared's version, both return 0 rather than missing when age is below 50.

        Code:
        * Use one line or the other, not both: each creates the same 0/1 indicator, missing where age is missing.
        gen byte age50 = inrange(age, 50, .) if !mi(age)
        gen byte age50 = age >= 50 if !mi(age)
        There is (at least) one more way to code this, but it requires more characters to type.
        Last edited by Leonardo Guizzetti; 23 Apr 2022, 22:05.
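
        The two forms above can be checked against each other on a few made-up ages. This is only a sketch: the toy values and the variable names age50a, age50b and x are assumptions for illustration, not part of the thread. It also relies on the documented behaviour that inrange(z, a, b) returns 0 when z is missing and treats a missing upper bound as "no upper limit", so inrange(age, 50, .) can stand in for the original if condition on its own.

        Code:
        clear
        input age
        49
        50
        51
        .
        end
        * the two forms from the post, under different (hypothetical) names
        gen byte age50a = inrange(age, 50, .) if !mi(age)
        gen byte age50b = age >= 50 if !mi(age)
        * they agree on every observation, including the missing one
        assert age50a == age50b
        * shorter version of the original: gen x = 1 if age >= 50 & !missing(age)
        gen byte x = 1 if inrange(age, 50, .)
        assert x == 1 if age >= 50 & !missing(age)
        assert missing(x) if age < 50 | missing(age)
        list, noobs

        If all three assert commands run silently, the forms behave as described for these values.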



        • #5
          In reply to Leonardo Guizzetti (#4):
          Thank you Guizzetti. That was carelessness on my part; it should be greater than or equal to (>=).

          You gave me two alternative forms for creating a binary variable. They will not work if I want to create a categorical variable with at least three levels. Do you have any suggestions?






          • #6
            -recode- is one option. The other way is to extend any of the above. For example, the following explicitly defines each range, updating only those values of x that are still at the default missing and so, by definition, have not yet been recoded.

            Code:
            gen x = .
            replace x = 1 if age > 18 & age < 30 & mi(x)
            replace x = 2 if age >= 30 & age < 40 & mi(x)
            ....



            • #7
              In reply to Leonardo Guizzetti (#6):
              Thank you Guizzetti. It is a good example. I will try it.



              • #8
                Also, see

                Code:
                help irecode()

                I might go for

                Code:
                generate x = 1 if age >= 18
                replace  x = 2 if age >= 30
                ...
                replace  x = . if missing(age)
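
                For reference, a minimal sketch of what irecode() might look like here, assuming ages are whole numbers and covering only the two cut points shown above. The toy data are made up for illustration. irecode(age, 17, 29) returns 0 if age <= 17, 1 if 17 < age <= 29, 2 if age > 29, and missing if age is missing, so the only extra step is turning the 0 category into missing to match the generate/replace version.

                Code:
                clear
                input age
                17
                18
                29
                30
                .
                end
                * 0 if age <= 17, 1 if 17 < age <= 29, 2 if age > 29, missing if age is missing
                generate byte x = irecode(age, 17, 29)
                * the generate/replace version leaves ages below 18 missing, so do the same here
                replace x = . if x == 0
                assert x == 1 if inlist(age, 18, 29)
                assert x == 2 if age == 30
                assert missing(x) if age == 17 | missing(age)
                list, noobs

                With non-integer ages the cut points would need rethinking, because irecode() uses "greater than" at each boundary rather than "greater than or equal to".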



                • #9
                  In reply to daniel klein (#8):
                  Thank you Klein.



                  • #10
                    For multiple categories this cascade is a possibility -- but not to everyone's taste.


                    Code:
                    gen wanted = cond(missing(age), ., cond(age > 65, 4, cond(age > 40, 3, cond(age > 20, 2, 1))))
                    This is best understood as

                    if age is missing return missing
                    otherwise if age > 65 return 4
                    otherwise if age > 40 return 3
                    otherwise if age > 20 return 2
                    otherwise return 1

                    Notes:

                    1. Peel off the missings first and work downwards. Then you don't get bitten by missings being larger than any other number.

                    2. Think of every cond( as a promise to put down a matching ) later. Unless you're trying a branching construct that is absurdly complicated you can count instances of cond( easily enough and redeem your promises by putting down that many ) at the end.

                    3. If you prefer recode I won't try to convert you.

                    4. If you prefer egen, cut() I will try to convert you. Exactly what happens at boundary values isn't deducible from the syntax, which is a problem for beginner and experienced users alike -- to say nothing of non-Stata users who want or need to follow your code as pseudocode for them. I'd say the same for all the official *code() functions.

                    5. With cond() all your inequalities (or equalities) are explicit for the reader -- and if need be you can mix different flavours of inequality or equality.
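
                    To see note 1 in action, here is a small sketch on made-up ages that sit on and just above each boundary. The toy data and the variable name naive are assumptions for illustration only. The second variable drops the missing() check, so a missing age, which Stata treats as larger than any number, lands in category 4.

                    Code:
                    clear
                    input age
                    20
                    21
                    40
                    41
                    65
                    66
                    .
                    end
                    gen wanted = cond(missing(age), ., cond(age > 65, 4, cond(age > 40, 3, cond(age > 20, 2, 1))))
                    * same cascade without peeling off the missings first
                    gen naive = cond(age > 65, 4, cond(age > 40, 3, cond(age > 20, 2, 1)))
                    * the missing age stays missing in wanted but is coded 4 in naive
                    assert missing(wanted) if missing(age)
                    assert naive == 4 if missing(age)
                    * for nonmissing ages the two agree
                    assert wanted == naive if !missing(age)
                    list, noobs

                    Because every inequality is strict, the boundary ages 20, 40 and 65 fall into the lower category in each case.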



                    • #11
                      In reply to Nick Cox (#10):
                      Thank you Nick. This is the best choice for creating multiple categories.
