  • Generating a dummy for each combination in a list

    Hello,

    I'm attempting to generate a dummy for each combination in a list. I've used the following code:

    Code:
    egen group  = group(female native white age single), label
    tab group, generate(gr_)
    However, as opposed to the 31 dummies that I would like this to generate, I am left with a dummy for each of the 5 variables, a group variable equal to 1 1 1 1 1 or (missing), and a gr_1 variable equal to 1 or (missing). Would there be a way to generate all possible combinations of these dummies?

    Thank you.

  • #2
    The code looks right. I suspect there is a problem with your data. Please post back with an example from your actual Stata data set. Use the -dataex- command to do that. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Also, why do you want to generate these indicator (dummy) variables? If it is to use them as predictors in a regression command, there is no need to create these variables yourself: factor variable notation in your regression command will accomplish this for you more simply. Read -help fvvarlist-.
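
    As a rough sketch of what that looks like, assuming 0/1 indicators and a hypothetical outcome called employed (the toy data and every variable name below are made up for illustration, and only three indicators are used to keep it short):

    ```stata
    * Toy data: all names and values here are illustrative assumptions
    clear
    set obs 200
    set seed 12345
    gen byte female   = runiform() < 0.5
    gen byte native   = runiform() < 0.7
    gen byte single   = runiform() < 0.4
    gen byte employed = runiform() < 0.6   // hypothetical outcome

    * ## expands to all main effects plus every interaction,
    * so no hand-made combination dummies are needed
    regress employed i.female##i.native##i.single
    ```

    The same notation extends to more indicators by adding further ##i.varname terms.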

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    • #3
      Further, combinations that don't occur correspond to indicator (dummy, in your terminology) variables that are identically zero and that as such will be omitted from any regression as constants without predictive value.

      • #4
        Thank you for the response. Perhaps I should also share how I developed these variables. I did the following:

        Code:
        gen byte fem  = 1 if sex==2
        gen byte nat   = 1 if bpl<=99
        gen byte w     = 1 if race==1 & inlist(hispan,1,2,3,4==0)
        gen byte a      = 1 if age<=25
        gen byte s      = 1 if inlist(marst,3,4,5)
        Here is an example from the data set:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(fem nat wh a s) float group byte gr_1
        . 1 1 1 . . .
        1 1 1 1 . . .
        1 1 1 1 . . .
        1 1 1 1 1 1 1
        1 1 . . . . .
        1 1 1 . . . .
        1 1 . 1 . . .
        . 1 . 1 . . .
        . 1 1 . . . .
        1 1 . 1 . . .
        end
        label values group group
        label def group 1 "1 1 1 1 1", modify
        Please let me know if that helps. Thank you.

        Edit: I want to create these variables in order to construct some descriptive graphs for employment within these groups.
        Last edited by Greg Saldutte; 28 Jun 2018, 11:22.

        • #5
          You missed the point in the documentation of egen that you need to specify the missing option in your situation.

          (1, missing) indicators are a bad idea. I don't know who recommends them or why, but they are a bad idea. For example, you can't usefully average them. Or again, you can't distinguish missing because the argument was missing or missing because it's the other state.
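
          A small sketch of the averaging problem, using invented data (the five values of sex and the names fem_bad and fem_ok are assumptions for illustration only):

          ```stata
          * Made-up data, purely to illustrate the point
          clear
          input byte sex
          1
          2
          2
          1
          2
          end

          gen byte fem_bad = 1 if sex == 2        // 1 or missing
          gen byte fem_ok  = sex == 2 if sex < .  // 1 or 0; missing only if sex is missing

          summarize fem_bad   // mean is 1 by construction: only the 1s are counted
          summarize fem_ok    // mean is the proportion of observations with sex == 2
          ```

          The mean of the (1, missing) version carries no information, whereas the mean of the (1, 0) version is the share of observations in that state.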

          I would back up and go (correcting also an apparent typo, inlist(hispan,1,2,3,4==0))

          Code:
          gen byte fem = sex==2 if sex < .
          gen byte nat = bpl<=99 if bpl < .
          gen byte w   = race==1 & !inlist(hispan,1,2,3,4) if !missing(race, hispan)
          gen byte a   = age<=25 if age < .
          gen byte s   = inlist(marst,3,4,5) if marst < .
          and then run egen on your indicators, which are 1 if true, 0 if false, and missing if you can't tell.
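
          A minimal sketch of that last step, on simulated 0/1 indicators (the data are random and only three indicators are used; the names mirror those in this thread but the values are assumptions):

          ```stata
          * Simulated 0/1 indicators, for illustration only
          clear
          set obs 500
          set seed 2018
          gen byte fem = runiform() < 0.5
          gen byte nat = runiform() < 0.8
          gen byte a   = runiform() < 0.3

          * One group per combination that actually occurs in the data
          egen group = group(fem nat a), label

          * One indicator variable (gr_1, gr_2, ...) per group
          tab group, generate(gr_)
          describe gr_*
          ```

          With k binary indicators there are at most 2^k groups; a combination that never occurs in the data gets no group and no indicator.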

          • #6
            I want to emphasize Nick's point that 1/. indicators are a really bad idea. Using -egen-'s -missing- option will solve your immediate problem. But these badly constructed variables are sure to create more problems for you down the line. So you are better off going back and re-creating these indicators as 1/0 variables and then proceeding from there.
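
            For reference, a sketch of what the -missing- option changes, using the first five columns of the -dataex- example in #4 (the names group_default and group_missing are just illustrative):

            ```stata
            * First five columns of the -dataex- example posted in #4
            clear
            input byte(fem nat wh a s)
            . 1 1 1 .
            1 1 1 1 .
            1 1 1 1 .
            1 1 1 1 1
            1 1 . . .
            1 1 1 . .
            1 1 . 1 .
            . 1 . 1 .
            . 1 1 . .
            1 1 . 1 .
            end

            * Default: any observation with a missing value gets a missing group
            egen group_default = group(fem nat wh a s)

            * With -missing-: missing is treated as a category like any other value
            egen group_missing = group(fem nat wh a s), missing

            tab group_default, missing
            tab group_missing
            ```

            By default only the one complete observation is assigned a group; with -missing-, every distinct pattern of 1s and missings becomes its own group.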

            • #7
              Thank you both for your responses. The problem is now solved.
