
  • creating new variable from combination of multiple dummy variables

    Hello! I am trying to create a new variable from multiple dummy variables in a secondary data analysis. These are race variables, where each race is represented by a separate 0/1 (no/yes) dummy variable. Due to the size of my dataset and the type of analysis, I can't look at each race as a separate category, so I want to create a binary variable: white (0) and BIPoC (1). To further complicate this issue, participants can say yes to as many racial groups as they identify with, so I can't simply say "if white then 0 and if [any other race variable] then BIPoC", because people with multiple racial identities would be counted multiple times. I think I've figured out what I need conceptually, but I don't know how to write the code. Below is my best description of the concept:

    If racevarwhite = 1 and all other racevar2-8 = 0 then newracevar = 0 (i.e. white)
    If racevarwhite = 1 and any other (combination of) racevar2-8 = 1 then newracevar = 1 (i.e. BIPoC) AND
    If racevarwhite = 0 and any other (combination of) racevar2-8 = 1 then newracevar = 1

    In other words, if white only, then 0 for newracevar and if any other combination of 1s whether inclusive of white or not, then newracevar = 1. Any assistance writing this code would be greatly appreciated! Thank you!

  • #2
    Code:
    foreach v of varlist racevar* {
        assert inlist(`v', 0, 1) | missing(`v')
    }
    egen any_non_white = rowmax(racevar2-racevar8)
    gen byte newracevar = racevarwhite == 1 & !any_non_white
    Note: No example data was provided, so this code is untested. Beware of typos or other errors. In the future, when asking for help with code, please always show example data, and use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    By the way, it seems odd that the variable names are racevarwhite and racevar2 through 8. Why isn't the indicator for white just racevar1, to fit in with the others? It doesn't really matter here, but it's peculiar and, in other situations, could make the coding more complicated.
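
    As a quick check of the logic, the code above can be run on a small made-up dataset. This is only a sketch: the four observations below are hypothetical, and the variable names follow the shorthand used in #1 rather than any real data.

    ```stata
    * Hypothetical toy data: one row per respondent, 0/1 indicators
    clear
    input racevarwhite racevar2 racevar3 racevar4 racevar5 racevar6 racevar7 racevar8
    1 0 0 0 0 0 0 0
    1 0 1 0 0 0 0 0
    0 0 0 1 0 0 0 0
    0 1 1 0 0 0 0 0
    end

    * Same two commands as above
    egen any_non_white = rowmax(racevar2-racevar8)
    gen byte newracevar = racevarwhite == 1 & !any_non_white

    * Inspect the result row by row
    list racevarwhite any_non_white newracevar, noobs
    ```

    Only the first observation (white and nothing else) should come out with newracevar equal to 1; the other three should be 0.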



    • #3
      Thank you for this code! (I am new to this so thanks also for the guidance on how to post in future.) Just to be certain I understand, for the "rowmax(racevar2-racevar8)," should I insert all of the names of the race variables in the parentheses (except white)? If so, am I inserting with a space in between, comma, or other punctuation?

      Also, do I understand correctly that the way you wrote the last row of the code, 1 will equal white in my new variable?

      Finally, I'm sorry for the confusion the race variable names caused. The actual variable names are meaningless strings of letters and numbers so I used this shorthand to make my meaning clear, but I can see why it would also cause confusion. Thanks so much for your assistance!



      • #4
        "(racevar2-racevar8)" is a shorthand way to tell Stata you want to include all the variables in the dataset that are contiguous from racevar2 to racevar8, where "contiguous" means that when you look at your data set in the data editor, those variables will be right next to each other, going left to right. If you looked at your variables in the variable window, those variable names would appear in a contiguous vertical list. In constructions like (racevar2-racevar8), no commas are to be included between variable names in that variable list or "varlist," as such a construction is known in Stata. Spaces between items in Stata commands are almost always permissible.
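
        To see exactly which variables a range such as racevar2-racevar8 picks up, something like the following may help. It is a sketch that assumes the shorthand names from #1; with real data, the actual variable names would go in their place, and any_non_white2 is just an illustrative name for the second version.

        ```stata
        * Show the variables the range expands to, in dataset order
        describe racevar2-racevar8

        * Range form: relies on the variables being adjacent in the dataset
        egen any_non_white = rowmax(racevar2-racevar8)

        * Explicit form: names separated by spaces, no commas;
        * this works whether or not the variables are adjacent
        egen any_non_white2 = rowmax(racevar2 racevar3 racevar4 racevar5 racevar6 racevar7 racevar8)
        ```

        If describe lists anything other than the seven intended indicators, the explicit form is the safer choice.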

        One of the best features of Stata for new users and even experienced users is the -help- command. (After using Stata for something like 20 years, I still end up using -help- daily.) In the current case, see -help varlist-. Entering -help egen- and looking for the "rowmax" function would show that it and many other functions documented there contain the word "varlist" displayed in highlighted color, indicating that if you click on that highlighted word, Stata will bring up the help on it, just as -help varlist- would in the command window. The results of the -help- command can be daunting, so I always suggest to people to look first at the examples on a help page and then try to understand the more abstract descriptions as needed.




        • #5
          "Also, do I understand correctly that the way you wrote the last row of the code, 1 will equal white in my new variable?"
          Yes, that is correct. And everybody else will have a 0 value for that variable.
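
          Since #1 asked for the opposite coding (white = 0, BIPoC = 1), one possible way to get it is sketched below. It assumes any_non_white has already been created by the egen command in #2; the names bipoc and bipoc_lbl are hypothetical. Respondents who fit neither rule (for example, all indicators 0 or missing) are deliberately left missing here rather than assigned to a group.

          ```stata
          * Start from missing so that unclassifiable respondents stay missing
          gen byte bipoc = .

          * White only: white indicator is 1 and no other indicator is 1
          replace bipoc = 0 if racevarwhite == 1 & any_non_white == 0

          * Any other racial identity selected, with or without white
          replace bipoc = 1 if any_non_white == 1

          * Attach value labels and check the counts, including missings
          label define bipoc_lbl 0 "White only" 1 "BIPoC"
          label values bipoc bipoc_lbl
          tabulate bipoc, missing
          ```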



          • #6
            To expand on Clyde Schechter in #5:

            Code:
            racevarwhite == 1 & !any_non_white
            is TRUE if and only if

            Code:
            racevarwhite == 1
            is TRUE (that variable is equal to 1)

            AND

            Code:
            !any_non_white
            is TRUE

            which means in turn that its negation

            Code:
            any_non_white
            is FALSE (meaning zero).

            Notes:

            Here TRUE and FALSE just refer to what is factually correct or incorrect about the data. TRUE and FALSE are not creatures or legal syntax in Stata.

            As an argument, non-zero is TRUE and zero is FALSE, but as a result, TRUE is 1 and FALSE is 0.
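
            These rules can be tried directly with display. The lines below are a small illustration, not part of the original code:

            ```stata
            * A true comparison returns 1
            display 3 > 2
            * A false comparison returns 0
            display 3 < 2
            * Zero is false, so its negation is 1
            display !0
            * Any non-zero argument is true, so its negation is 0
            display !5
            * Missing is also non-zero, so its negation is 0
            display !.
            ```

            The last line matters for the code in #2: if racevar2 through racevar8 are all missing for a respondent, any_non_white is missing, !any_non_white evaluates to 0, and newracevar is set to 0 rather than to missing.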

            More at https://www.stata.com/support/faqs/d...rue-and-false/

            https://journals.sagepub.com/doi/pdf...867X1601600117

            https://journals.sagepub.com/doi/pdf...36867X19830921







            • #7
              Success! Thanks to everyone for your advice and support!
