
  • generate new variable with if command

    Dear members,
    I want to generate variable X if spending < 300, variable Y if spending is between 300 and 500, and variable Z if spending is more than 500. Which commands can I use for this?
    I used the usual generate command with an if qualifier, but it does not work.
    Thanks in advance.

  • #2
    You should tell us exactly what code you tried and what you mean by "it does not work."



    • #3
      The standard approach is to create a new blank variable, then replace the missings there with desired values:
      Code:
      generate newvar=.
      replace newvar=X if spending<300
      replace newvar=Y if inrange(spending, 300,500)
      replace newvar=Z if spending>500
      X,Y, and Z can be constants, other variables, or expressions involving both.
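
      One detail worth guarding against: Stata treats numeric missing values as larger than any number, so spending>500 is also true where spending is missing. A minimal sketch, assuming the numeric codes 1, 2 and 3 as stand-ins for X, Y and Z:

      ```stata
      * Sketch only: 1, 2, 3 are placeholder values for X, Y, Z
      generate newvar = .
      replace newvar = 1 if spending < 300
      replace newvar = 2 if inrange(spending, 300, 500)
      * !missing() keeps observations with missing spending out of the top group
      replace newvar = 3 if spending > 500 & !missing(spending)
      ```

      With the extra condition, observations with missing spending keep a missing newvar.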

      All four can be collapsed into a single command with a bunch of cond() functions, which reduce readability. In the wishes for Stata 14 thread a number of users including myself wished for an inline select/switch version which could simplify the notation and be used in such cases.

      Best, Sergiy Radyakin

      PS: See FAQ for how to write "it doesn't work" so that it is possible to help



      • #4
        Using existing functionality, that could be rewritten as (say)

        Code:
         
        gen newvar = cond(spending < 300,              X,    ///
                     cond(inrange(spending, 300, 500), Y,    ///
                     cond(spending > 500,              Z, .  )))
        How would this be rewritten -- supposedly in a simpler and more readable way -- using your preferred functionality?



        • #5
          Many thanks to all of you for the responses.

          Following your suggestions, I am running the commands below:

          gen asset_group=assets
          replace asset_group=below10m if assets < 10.000000
          replace asset_group=between10m_100m if inrange (assets, 10.000000,100.000000)
          replace asset_group= above100m if assets> 100.000000
          However, Stata reports an error with the following message: below10m not found
          Maybe I am making a mistake on the numerical side.



          • #6
            There are two problems with your code.

            The first is that by typing

            replace asset_group=below10m if assets < 10.000000

            you effectively are asking Stata to replace the content of the variable called below10m if the condition is met; Stata correctly complains that it can not find said variable.
            If you want a certain string (i.e. combination of characters) to be entered, you need to enclose them with " ".

            The second problem is that you created the variable asset_group as a numeric variable; if you then try to replace its contents with a string instead of a number, Stata will tell you that the variable type does not match. To avoid this, it would be better to create the new variable as an empty string variable. Combining the two, I suggest correcting your code to

            gen asset_group = ""
            replace asset_group="below10m" if assets < 10.000000
            replace asset_group="between10m_100m" if inrange (assets, 10.000000,100.000000)
            replace asset_group= "above100m" if assets> 100.000000

            (On a side note, I have no experience with using categorical string variables; there might be good reasons to store the contents of asset_group as 1 (="below10m"), 2 or 3 instead and translate those values to the group identifiers you gave via labels. Maybe someone more experienced will come along and comment on that.)
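
            The numeric-codes-plus-labels idea in the side note could be sketched as follows. The names asset_code and asset_lbl are hypothetical, and the cutoffs assume that assets is recorded in single currency units, which the thread does not confirm:

            ```stata
            * Hypothetical variable and label names; cutoffs assume raw currency units
            generate byte asset_code = 1 if assets < 10000000
            replace asset_code = 2 if inrange(assets, 10000000, 100000000)
            replace asset_code = 3 if assets > 100000000 & !missing(assets)
            * Attach the group names as value labels instead of storing strings
            label define asset_lbl 1 "below10m" 2 "between10m_100m" 3 "above100m"
            label values asset_code asset_lbl
            ```

            The variable stays numeric, so it sorts in group order, while tabulations show the label text.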
            Last edited by David Poensgen; 14 Aug 2014, 07:20. Reason: Minor typo.



            • #7
              The implication is that you have no variable with the name below10m

              You should check spelling very carefully.

              Alternatively, you could do things like this if it is what you really want.

              Code:
              gen asset_group = "below 10 m" if assets < 10
              replace asset_group = "between 10 and 100 m" if inrange(assets, 10, 100)
              In other words, assigning literal strings requires explicit quotation marks. If you really need categorisations, you could make them easier to handle by e.g.

              Code:
              gen ln_asset_group = floor(log10(assets))
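
              As a rough illustration of what that expression returns, here it is applied to three made-up asset values (not taken from the poster's data):

              ```stata
              * floor(log10(x)) gives the order of magnitude of a positive number x
              display floor(log10(5000000))
              display floor(log10(50000000))
              display floor(log10(500000000))
              ```

              These display 6, 7 and 8, so millions, tens of millions and hundreds of millions land in separate groups. Zero or negative assets would yield missing, since log10() is undefined there.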



              • #8
                I have tried all of them, but Stata reports: inrange not found.



                • #9
                  I think it would be better to upload a sample of your data, so that people can check it and help you.



                  • #10
                    re: #8 above, since "inrange" is a widely used function, you clearly made some type of error in your command; that is why you need to show us EXACTLY what you typed (using copy-and-paste is best)



                    • #11
                      Apart from the other problems, I think that Nidar thinks that 10.000000 means 10 million. It does not; it means 10, because . is the decimal point.
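
                      A quick way to check this in Stata itself (a small sketch using only display):

                      ```stata
                      * The period is read as a decimal point, so this prints 10
                      display 10.000000
                      * Prints 1, meaning the two numbers are equal
                      display 10.000000 == 10
                      * Ten million has to be typed without any separators
                      display 10000000
                      ```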



                      • #12
                        David Poensgen wrote

                        First is that you by typing

                        replace asset_group=below10m if assets < 10.000000

                        you effectively are asking Stata to replace the content of the variable called below10m if the condition is met; Stata correctly complains that it can not find said variable.
                        Presumably David didn't mean what he said. The command asks Stata to replace the variable asset_group with the value of the variable below10m

                        Another typo may be proving problematic. The function inrange() should be called with no space before an argument in parentheses.



                        • #13
                          [Attached image: asset category.png]
                          Dear Svend, I think you are right about my confusion.
                          I have uploaded a screenshot of the dataset, so you can now see how it looks. I need to group the observations by size: less than 50 mln, between 50 mln and 100 mln, and more than 100 mln.



                          • #14
                            You write that you want three variables, but I believe that you would be better served by a single variable with three values. Here I generate the numeric variable assetgroup, with value labels:

                            Code:
                            recode assets (min/9999999=1 "below 10m")(10000000/99999999=2 "10m to <100m") ///
                               (100000000/max=3 "100m+") , generate(assetgroup)



                            • #15
                              Dear Svend,

                              I need to create those three variables because I will construct a table, and each of these asset groups will be a row of that table.
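
                              One possible way to get both, assuming the assetgroup variable from the recode command in #14 already exists (the stub ag is a hypothetical name):

                              ```stata
                              * The one-way table lists each asset group as a row;
                              * generate(ag) also creates indicator variables ag1, ag2, ag3
                              tabulate assetgroup, generate(ag)
                              ```

                              Each indicator equals 1 for observations in its group and 0 otherwise, so the three separate variables remain available if the table really needs them.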

