  • generating many dummy variables with the var name and label name

    Hi all,
    I went through several website links but could not solve my issue, so I am posting here for help. I need to generate many new dummy variables from a categorical variable. For example, the variable districtname_main is a string variable with 32 districts (see the attached picture). I have to generate a new binary dummy variable for each district, with the variable name and variable label of that particular district. I tried the following commands, but they did not work.

    foreach var of districtname_main {
    gen dist_`var'=0
    replace dist_`var'=1 if districtname_main ==real("`var'")
    }

    I also tried many other methods suggested on various websites, but nothing worked. I wonder if someone could help me with this issue.

  • #2

    What you want is (and you should change stubname to something you want)

    Code:
    tab districtname_main, gen(stubname)
    What was wrong: in what follows, varname is generic for any variable name, and ... is not literal but just stands for syntax not being discussed at this point.

    1. The syntax foreach ... of varname is illegal. Here of must be followed by an acceptable keyword such as local. See help foreach and know that whatever is not allowed there is forbidden.

    2. It would not do what you want if it were legal. Even

    foreach ... in varname

    would not be a loop over the distinct values of varname which occur. It's a loop over one item, the variable name. (I think there are languages in which a loop over a variable name means a loop over its distinct values, but that's not true of Stata. If you want to loop over the distinct values of a variable, you need to refer to them directly or indirectly.)

    3. Let's assume we have a legal loop with a variable name and focus on the if qualifier.

    Code:
    if districtname_main ==real("`var'")
    This will fail as a type mismatch because districtname_main is as you say a string variable and so will never be equal to anything that comes out of a real() function.

    4. Moreover, the literal quotation marks mean that you are using the variable name, not its contents. You can try directly with


    Code:
    display real("districtname_main")
    which will display missing because the string "districtname_main" does not have numeric content. Something like real("42") gives you a number.

    The loop is not needed at all, but these points are most of what is wrong. (Note that "The Nilgiris" wouldn't work as part of a variable name because of the space.)
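
    As a minimal sketch of the one-line approach, here is tabulate with generate() run on a toy dataset. The district names typed in below and the stub dist_ are assumptions made up for illustration; they are not the original data.

    ```stata
    * toy data: a string variable with three distinct values (hypothetical)
    clear
    input str20 districtname_main
    "Chennai"
    "Madurai"
    "The Nilgiris"
    "Chennai"
    end

    * one indicator variable per distinct value: dist_1, dist_2, dist_3
    tabulate districtname_main, generate(dist_)

    * each new variable carries a variable label identifying its category
    describe dist_*
    ```

    The numbered suffixes sidestep the problem that a value such as "The Nilgiris" is not a legal variable name, and the variable labels still record which district each indicator stands for.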


    With some experience I can look at the code and see what is wrong but in general "did not work" is not a good problem report and is explicitly advised against in our FAQ. A good problem report is based on what you tried and on precisely what happened or did not happen.

    It seems that you've been using Stata since 2018 and have posted here many times. If you have not done it yet, I recommend reading the User's Manual repeatedly, stopping when it gets too difficult. It's hard to know how successful wild Googling is because when it works people don't report here, but at some point nothing beats reading the documentation systematically to get a grounded understanding of Stata's main ideas.


    • #3
      Agreed with Nick, Stata's base language documentation is quite good when used in conjunction with previous Statalist posts. In particular here, look up the function 'levelsof' in the [P] section; the code you want is similar to the final example (in the .sthlp file), except you will generate a variable instead of showing the value label.
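
      A rough sketch of that idea, not a tested solution for the original data: loop over the distinct values returned by levelsof and create one indicator per value. It assumes districtname_main is a string variable already in memory; the prefix dist_ and the macro names districts, d and vname are hypothetical.

      ```stata
      * collect the distinct values of the string variable in a local macro
      levelsof districtname_main, local(districts)

      foreach d of local districts {
          * strtoname() maps e.g. "dist_The Nilgiris" to the legal name dist_The_Nilgiris
          local vname = strtoname("dist_`d'")
          * the comparison is true (1) or false (0) in each observation
          generate byte `vname' = districtname_main == "`d'"
          * keep the original spelling of the district as the variable label
          label variable `vname' "`d'"
      }
      ```

      Variable names are limited to 32 characters, so two long district names could map to the same name; in that case generate would stop with an error rather than overwrite anything.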


      • #4
        Hi Cox, thanks very much for your detailed comments; they helped me understand more about looping. I was also going through another response of yours to almost the same question at this link: https://www.statalist.org/forums/for...variable-names. I replicated what you did with that example dataset. I also tried it for one variable, as shown in attached Picture 1, and it worked well and yielded the new dummy variables. However, when I tried the same approach for my analysis, I got an error, r(110), as shown in attached Picture 2. It would be great to have a solution for this. Thanks very much.

        Picture 1: [image attachment: picture 1.jpg]


        Picture 2: [image attachment: picture 2.jpg]



        • #5
          Hi Eric,
          Thanks for your suggestion too. I followed the levelsof function and tried replicating what is given in the link mentioned in the above comment. I would be grateful if you have an answer for the same.


          • #6
            Why you are solving this problem the hard way when you have an easy solution is not clear, but perhaps you want variable names that are more informative. Even then, Stata will be hard pushed to show those (often very long) names without abbreviation in most output.

            Down to details:

            In #5 (and also in #3) levelsof is a command, not a function. This is just terminology, but learning and using the correct terminology helps everyone. For one, functions and commands obey quite different rules and, for two, they are documented differently. See e.g. https://journals.sagepub.com/doi/pdf...867X1101100308 for more on how commands and functions are disjoint in Stata -- what is true anywhere else and what is anybody's personal preference about language don't count here.

            That said, in #4 you have meaningless spaces presented together with the local macro name. A local macro name can't include spaces, and even if you want to include spaces in a local macro reference for your own reasons, say readability, that won't work as it breaks the code.

            Consider this

            Code:
            . local beast "frog"
            
            . di "` beast '"
            
            
            . di "`beast'"
            frog
            In terms of how to find your own mistake, the reference to distdum_ in the error message underlines that Stata can't use the local macro you want it to use.

            First time round the loop, that's not fatal, as new variable distdum_ is created anyway. Second time around the loop, however, Stata won't use the same variable name twice, and you get thrown out of the loop.

            So that's a clue, and why can't Stata use the local macro reference?

            The usual story is at https://journals.sagepub.com/doi/ful...36867X20931028, but it doesn't apply here. (If this is behind a paywall as far as you are concerned, that will end when Stata Journal 23(2) is published in about a month's time.)

            As said, the problem is the extra spaces, and note that you have the same error twice.

            In fact, I don't recall seeing this problem before, but the law here is that Stata's rules of syntax always override your own preferences when there is a clash.
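
            For concreteness, here is a sketch of a loop written without the extra spaces. The code in Picture 2 is not reproduced here, so this is only an assumed shape of it: the stub distdum_ comes from the error message, while the counter i and the macro names levels and l are hypothetical.

            ```stata
            * assumes districtname_main is a string variable already in memory
            levelsof districtname_main, local(levels)

            local i = 0
            foreach l of local levels {
                local ++i
                * `i' and `l' are written with no spaces inside the quotes,
                * so the names are distdum_1, distdum_2, ... and never bare distdum_
                generate byte distdum_`i' = districtname_main == "`l'"
                label variable distdum_`i' "`l'"
            }
            ```

            Because each pass through the loop now yields a different variable name, the clash that triggers r(110) on the second pass should not arise.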


            • #7
              Yes Nick. I will correct my errors, like misusing terminology, in future posts. That said, your response widened my understanding of macros and local, levelsof and foreach. Thanks very much for your time.
