  • Combining continuous variables into one dummy variable

    Hi all, my race control variables for moms and dads in my data are highly correlated, so I am trying to combine mother and father race variables into dummy variables. Here is the syntax I used:
    gen pasian = 1 if P1_Race == 3 & P2_Race ==3
    replace pasian = 0 if P1_Race < 3 & P2_Race < 3
    replace pasian = 0 if P1_Race > 3 & P2_Race > 3


    I decided to make household dummy variables where both partners are Black, white, etc. and a dummy variable for interracial couples. There are 7 race categories: 1 = African American, 2 = Asian American, 3 = Latino/a, 4 = White, 5 = Native American, 6 = Biracial/Mixed, 7 = Other.

    How many dummy variables should I have and is there better syntax than what I used?

    The command I tried using generated a new combined variable, but some of the data was dropped and I cannot figure out why or what other way to create the dummy variables.

    Thank you so much for your help!

  • #2
    Hi Jenna!

    I think the command (and prefix) "xi" can help you.
    For example, if you have two categorical variables, P1_Race and P2_Race, the command xi i.P1_Race*i.P2_Race
    creates dummies for all possible combinations of the two variables: (7-1) + (7-1) + (7-1)*(7-1) = 48, because it always leaves one category out as the base.

    If that is not exactly what you were looking for, the help for the command has more information.

    I hope this helps.
    Greetings,



    • #3

      Code:
      gen pasian = P1_Race == 3 & P2_Race ==3
      does everything you ask in one line and more. (Your code does not consider that one parent might be above 3 and the other below 3.) But consider what should happen if either of the race variables is missing: in the line above, an observation with a missing value on either variable is coded 0, not missing. For example:

      Code:
      gen pasian = P1_Race == 3 & P2_Race ==3 if !missing(P1_Race, P2_Race)


      This is the subject of FAQs such as https://www.stata.com/support/faqs/d...mmy-variables/

      https://www.stata.com/support/faqs/d...rue-and-false/
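
      The one-line pattern above extends to the other categories listed in #1. What follows is only a sketch, assuming P1_Race and P2_Race are numeric and coded 1 to 7 as described there; the names both1 to both7 and interracial are hypothetical, not from the original post.

      ```stata
      * Sketch only: assumes P1_Race and P2_Race are numeric, coded 1 to 7 as in #1.
      * both1 ... both7 and interracial are hypothetical variable names.

      * One indicator per category: 1 if both partners are in category r, 0 otherwise,
      * and missing if either race variable is missing.
      forvalues r = 1/7 {
          gen byte both`r' = P1_Race == `r' & P2_Race == `r' if !missing(P1_Race, P2_Race)
      }

      * Indicator for couples whose two categories differ.
      gen byte interracial = P1_Race != P2_Race if !missing(P1_Race, P2_Race)

      * Check the new variables against the originals, including missing values.
      tabulate P1_Race P2_Race, missing
      tabulate interracial, missing
      ```

      For couples with both values observed, these eight indicators are mutually exclusive and exhaustive, so a model with a constant can include at most seven of them; the one left out is the reference category. Whether two partners both coded 6 or both coded 7 should count as the same race is a substantive decision that the code cannot make for you.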

      I can't follow your report that data were "dropped" as a side-effect. They would not be deleted from the dataset or ignored in modelling just because of the code you cite.

      FWIW, I would not call such variables continuous. They are discrete or categorical.

      (I disagree with the advice in #2. xi: is an old command largely superseded by later changes to Stata. There is no obvious reason to start using it now.)
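
      For comparison, here is a sketch of factor-variable notation, which was introduced in Stata 11 and covers most of what xi: was used for. The outcome variable y is hypothetical and stands in for whatever response is being modelled.

      ```stata
      * y is a hypothetical outcome variable, not from the thread.

      * Main effects of each parent's race only; the i. prefix creates the
      * indicators on the fly, with the lowest category as the default base.
      regress y i.P1_Race i.P2_Race

      * ## requests main effects plus all two-way interactions,
      * without adding any new variables to the dataset.
      regress y i.P1_Race##i.P2_Race
      ```

      The second command is the counterpart of xi i.P1_Race*i.P2_Race from #2, but it leaves the dataset unchanged.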



      • #4
        Thank you so much for your help!
