
  • Creating a categorical variable from several dummy variables

    Hello,

    I am trying to create a categorical variable that captures all of the information from several dummy variables combined. More specifically, my usual approach of using "gen" and "replace" does not work properly, because the resultant categories in the categorical variable do not equal the number of "yes" responses in the corresponding dummy variables. Put another way, I am trying to collapse responses to a "Check all that apply" question that was originally coded as dummy variables for each response, but the resultant categories seem to be unique (or mutually exclusive) counts of the dummy variables. For example, for a question, "Which statistical software package(s) do you use? Check all that apply." I could have the following responses:

    Stata (0, 1) = 34
    SPSS (0, 1) = 12
    Mplus (0, 1) = 5
    R (0, 1) = 15

    How can I create a categorical variable, collapsing the responses to these 4 individually coded questions?

    Any assistance would be greatly appreciated!

    CLB

  • #2
    It's not especially clear what you want. You allude to code that doesn't do what you want without showing it or the results to us.

    In principle there could be 16 distinct cross-combinations of those 4 variables, to say nothing of what to do about missings.
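To see how many of the possible combinations actually occur in a dataset, and where missings fall, something like the following rough sketch may help. The variable names (use_stata, use_spss, use_mplus, use_r) and the toy data are hypothetical, not taken from the original post.

```stata
* Hypothetical toy data: four 0/1 indicators, one row per respondent
clear
input byte(use_stata use_spss use_mplus use_r)
1 0 0 1
1 0 0 1
0 1 0 0
1 1 1 1
0 0 0 0
. 1 0 0
end

* One integer code per distinct combination of the four indicators.
* The missing option keeps observations that have a missing value
* instead of leaving their group code missing.
egen combo = group(use_stata use_spss use_mplus use_r), missing

* How many observations fall in each combination that occurs
tabulate combo
```

Only combinations that are present in the data receive a code, so the number of groups can be well below 16.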

    This may help:

    Cox, N. J. 2007. Stata tip 52: Generating composite categorical variables. Stata Journal 7(4): 582-583 (dm0034).
    A tip on how to generate categorical variables using tostring and egen, group().

    http://www.stata-journal.com/sjpdf.h...iclenum=dm0034
    Last edited by Nick Cox; 05 Jul 2016, 13:00.



    • #3
      Thank you, Nick. My apologies for being unclear. Typically, when creating a new variable from existing variables, I use something like the following:

      gen newvar1 = 0 if oldvar == 0
      replace newvar1 = 1 if oldvar == 1
      replace newvar1 = 2 if oldvar == 2| oldvar==3

      In my question, I have 5 dummy coded variables, and I would like to create one categorical variable that captures all 5 of the dummy variables. For my 5 dummy variables, I have the following output:

      oldvar1: 0 = 5; 1=35 (where 0 = no, and 1 = yes)
      oldvar2: 0 = 15; 1 = 25
      oldvar3: 0 = 14; 1 = 26
      oldvar4: 0 = 1; 1 = 39
      oldvar5: 0 = 20; 1 = 20

      When I use the following code:
      gen newvar1 = 0 if oldvar1 == 1
      replace newvar1 = 1 if oldvar2 == 1
      replace newvar1 = 2 if oldvar3 == 1
      replace newvar1 = 3 if oldvar4 == 1
      replace newvar1 = 4 if oldvar5 == 1

      I get something like the following result:
      newvar1: 0 = 30, 1 = 19; 2 = 20; 3 = 21; 4 = 33; 5 = 10

      Thus, my newvar1 does not capture all of the 1 counts of the dummy variables. I would like newvar1 to take on the "yes" values in the dummy variables.

      I hope this hypothetical example is a bit clearer.

      Thanks.
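A likely reason the counts do not match is that each replace overwrites whatever an earlier line assigned, so an observation ends up with the code of the last dummy that equals 1. The sketch below illustrates this on made-up data; the three variables and their values are hypothetical.

```stata
* Hypothetical toy data: three dummies, respondents may check several
clear
input byte(oldvar1 oldvar2 oldvar3)
1 0 0
1 1 0
1 1 1
0 1 0
end

* Same pattern as in the post: later lines overwrite earlier ones
gen newvar1 = 0 if oldvar1 == 1
replace newvar1 = 1 if oldvar2 == 1
replace newvar1 = 2 if oldvar3 == 1

* Three observations have oldvar1 == 1 ...
count if oldvar1 == 1
* ... but only the one that checked nothing else keeps code 0
count if newvar1 == 0
```

A single categorical variable can hold only one value per observation, so it cannot reproduce the separate "yes" counts of dummies that overlap.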



      • #4
        I am sorry, but it remains unclear how you expect your newvar1 to be calculated. What value do you want it to take for an observation with 1 for oldvar2 and oldvar4, and 0 for oldvar1, oldvar3, and oldvar5?

        Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, with just a few observations, showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

        With that said, there may be an easy answer. Do you want newvar1 to just be a count of the number of 1's in oldvar1 through oldvar5 for that observation? In that case
        Code:
        generate newvar1 = oldvar1+oldvar2+oldvar3+oldvar4+oldvar5
        would do the job.
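One caveat worth checking is missing data. With plain addition, a missing value in any of the five variables makes the sum missing, whereas egen's rowtotal() treats missing values as 0. A small sketch on hypothetical data (the new variable names n_yes_sum and n_yes_rt are made up for illustration):

```stata
* Hypothetical toy data; the third respondent skipped oldvar2
clear
input byte(oldvar1 oldvar2 oldvar3 oldvar4 oldvar5)
1 0 1 1 0
0 0 0 0 0
1 . 1 0 1
end

* Plain addition: any missing input makes the result missing
generate n_yes_sum = oldvar1 + oldvar2 + oldvar3 + oldvar4 + oldvar5

* egen rowtotal(): missing inputs are treated as 0
egen n_yes_rt = rowtotal(oldvar1-oldvar5)

list, noobs
```

Which behaviour is appropriate depends on whether a skipped item should count as "no" or should make the total unknown.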
        Last edited by William Lisowski; 05 Jul 2016, 13:36.



        • #5
          Thanks for the detail but you need a different approach for 5 categorical variables. (It was 4 in #1 and perhaps will be 6 in #5....) Did you read the article cited?

          Here is one strategy:

          Code:
           
          egen label = concat(oldvar?) 
          egen group = group(label), label
          Then your new variable will have values 1 up and will have value labels like "00000" ... "11111".
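As a minimal illustration of that strategy on made-up data (the names pattern and combo are hypothetical, chosen here to avoid reusing the command names label and group as variable names):

```stata
* Hypothetical toy data: five 0/1 indicators
clear
input byte(oldvar1 oldvar2 oldvar3 oldvar4 oldvar5)
0 0 0 0 0
1 0 1 0 0
1 0 1 0 0
1 1 1 1 1
end

* String such as "10100" recording the answer pattern
egen pattern = concat(oldvar?)

* Integer codes 1, 2, ... with the pattern strings as value labels
egen combo = group(pattern), label

tabulate combo
```

Only patterns that occur in the data receive a code, so here combo runs from 1 to 3 rather than up to 32.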



          • #6
            Yes, I did read the article you provided, Nick - thank you. Unfortunately, I do not want to create a composite variable that takes on all possible joint values of the combination of dummy variables. I would like to create a composite categorical variable that takes on the total number of "1" values, from multiple dummy/indicator variables. Thank you.



            • #7
              Just add those indicator variables if you want to count the number of ones.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------



              • #8
                Hi all.
                I'm trying to build a single grouping variable from 9 dummy variables, where each category of the new variable should reflect the observations with dummy == 1.
                I tried the following, but I do not get adequate results.


                generate grupo1=0
                replace grupo1=1 if producto_1==1
                replace grupo1=2 if producto_2==1
                replace grupo1=3 if producto_3==1
                replace grupo1=4 if producto_4==1
                replace grupo1=5 if producto_5==1
                replace grupo1=6 if producto_6==1
                replace grupo1=7 if producto_7==1
                replace grupo1=8 if producto_8==1
                replace grupo1=9 if producto_9==1
                label var grupo1 "Pan, cereales y almidon"
                label define grupo1 1 "Pan" 2 "Galletas" 3 "Arroz" 4"Maiz en grano" 5"Trigo en grano"/*
                */ 6"Quinua" 7"Fideo" 8"harina de trigo y/o maíz" 9"Otros cereales" , modify
                label values grupo1 grupo1


                Thanks for your help.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double(producto_1 producto_2 producto_3 producto_4 producto_5 producto_6 producto_7 producto_8 producto_9)
                1 0 1 0 1 0 1 1 0
                1 0 1 0 1 0 1 1 0
                1 0 1 0 1 0 1 1 0
                1 0 1 0 1 0 1 1 0
                1 1 1 0 1 0 1 0 0
                1 1 1 0 1 0 1 0 0
                1 1 1 0 1 0 1 0 0
                1 1 1 0 1 0 1 0 0
                1 0 1 0 0 1 1 0 0
                1 0 1 0 0 1 1 0 0
                1 0 1 0 0 1 1 0 0
                1 0 1 1 1 0 1 1 0
                1 0 1 1 1 0 1 1 0
                1 0 1 1 1 0 1 1 0
                1 0 1 1 1 0 1 1 0
                1 0 1 1 1 0 1 1 0
                1 0 1 1 1 0 1 1 0
                1 1 1 1 1 1 1 0 1
                1 1 1 1 1 1 1 0 1
                1 1 1 1 1 1 1 0 1
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                1 0 1 1 0 0 1 1 0
                0 1 1 1 1 1 1 0 0
                0 1 1 1 1 1 1 0 0
                0 1 1 1 1 1 1 0 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 1 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 1 1 1 1 1 1
                1 0 1 1 1 1 1 1 1
                1 0 1 1 1 1 1 1 1
                1 0 1 1 1 1 1 1 1
                1 0 1 1 1 1 1 1 1
                1 0 1 0 0 1 1 0 1
                1 0 1 0 0 1 1 0 1
                1 0 1 0 0 1 1 0 1
                1 0 1 0 0 1 1 0 1
                1 1 1 1 1 1 1 1 1
                1 1 1 1 1 1 1 1 1
                1 1 1 1 1 1 1 1 1
                1 1 1 1 1 1 1 1 1
                1 1 1 1 1 1 1 1 1
                1 1 1 1 1 1 1 1 1
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 0 0 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 1 0 1 1 1 1 0
                1 0 0 1 1 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 1 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 1 1 1 0 1
                1 0 1 0 1 1 1 0 1
                1 0 1 1 0 1 1 0 1
                1 0 1 1 0 1 1 0 1
                1 0 1 1 0 1 1 0 1
                1 0 1 1 0 1 1 0 1
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                1 0 1 0 0 0 1 0 0
                end



                • #9
                  Juan,

                  I've provided some code that hopefully will help you. In the 1st observation, the person marks yes to food products 1, 3, 5, 7, and 8. Given your "replace grupo1=8 if producto_8==1" request in post #8, that person would have grupo1==8. I'm not sure if that is what you *really* want, but you can do that with the code below.

                  Code:
                   list p_1-p_9 in 1/10, noobs
                  
                    +-----------------------------------------------------+
                    | p_1   p_2   p_3   p_4   p_5   p_6   p_7   p_8   p_9 |
                    |-----------------------------------------------------|
                    |   1     0     1     0     1     0     1     1     0 |
                    |   1     0     1     0     1     0     1     1     0 |
                    |   1     0     1     0     1     0     1     1     0 |
                    |   1     0     1     0     1     0     1     1     0 |
                    |   1     1     1     0     1     0     1     0     0 |
                    |-----------------------------------------------------|
                    |   1     1     1     0     1     0     1     0     0 |
                    |   1     1     1     0     1     0     1     0     0 |
                    |   1     1     1     0     1     0     1     0     0 |
                    |   1     0     1     0     0     1     1     0     0 |
                    |   1     0     1     0     0     1     1     0     0 |
                    +-----------------------------------------------------+
                  Code:
                  rename producto_* p_*  /* Just renaming to make list fit on one page */
                  egen count = rowtotal( p_1- p_9)  // how many food products did person mark?
                  
                  * Loop to create all-in-one list of foods
                  * (start from an empty string so there is no leading comma when p_1 == 0)
                  gen which_food = ""
                  forvalues i = 1/9 {
                  replace which_food = which_food + cond(which_food == "", "", ", ") + "`i'" if p_`i'==1
                  }
                  
                  replace which_food = trim(itrim( which_food))
                  
                  * This is the loop you asked for, but not sure it will give you what you really want
                  gen grupo1 = 1 if p_1==1
                  forvalues i = 2/9 {
                  replace grupo1 = `i' if p_`i'==1
                  }
                  
                   list in 1/20, abbrev(14)
                  
                       +-----------------------------------------------------------------------------------------------+
                       | p_1   p_2   p_3   p_4   p_5   p_6   p_7   p_8   p_9   count               which_food   grupo1 |
                       |-----------------------------------------------------------------------------------------------|
                    1. |   1     0     1     0     1     0     1     1     0       5            1, 3, 5, 7, 8        8 |
                    2. |   1     0     1     0     1     0     1     1     0       5            1, 3, 5, 7, 8        8 |
                    3. |   1     0     1     0     1     0     1     1     0       5            1, 3, 5, 7, 8        8 |
                    4. |   1     0     1     0     1     0     1     1     0       5            1, 3, 5, 7, 8        8 |
                    5. |   1     1     1     0     1     0     1     0     0       5            1, 2, 3, 5, 7        7 |
                       |-----------------------------------------------------------------------------------------------|
                    6. |   1     1     1     0     1     0     1     0     0       5            1, 2, 3, 5, 7        7 |
                    7. |   1     1     1     0     1     0     1     0     0       5            1, 2, 3, 5, 7        7 |
                    8. |   1     1     1     0     1     0     1     0     0       5            1, 2, 3, 5, 7        7 |
                    9. |   1     0     1     0     0     1     1     0     0       4               1, 3, 6, 7        7 |
                   10. |   1     0     1     0     0     1     1     0     0       4               1, 3, 6, 7        7 |
                       |-----------------------------------------------------------------------------------------------|
                   11. |   1     0     1     0     0     1     1     0     0       4               1, 3, 6, 7        7 |
                   12. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                   13. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                   14. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                   15. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                       |-----------------------------------------------------------------------------------------------|
                   16. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                   17. |   1     0     1     1     1     0     1     1     0       6         1, 3, 4, 5, 7, 8        8 |
                   18. |   1     1     1     1     1     1     1     0     1       8   1, 2, 3, 4, 5, 6, 7, 9        9 |
                   19. |   1     1     1     1     1     1     1     0     1       8   1, 2, 3, 4, 5, 6, 7, 9        9 |
                   20. |   1     1     1     1     1     1     1     0     1       8   1, 2, 3, 4, 5, 6, 7, 9        9 |
                       +-----------------------------------------------------------------------------------------------+



                  • #10
                    Hello David, thank you very much.
                    I only got back online today; I had a bad cold.
                    Thanks for helping me see the (matrix) Stata code.



                    • #11
                      Hi Juan, did you manage to solve your problem? I have the same issue. I want to create a new variable called, for example, "TodasITS", so that typing
                      Code:
                      tab TodasITS
                      shows a list of the different ITS variables that have response == 1, that is, "Yes":

                      Code:
                      ITS 1        343
                      ITS 2        199
                      ITS 3         33
                      ITS 4        200
                      I understand that this is what you wanted, correct? I was doing it with "replace", but I lose data in the process.
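One possible way to get such a table without losing data is to reshape to long form, so that every checked item becomes its own row. The sketch below is only an illustration under assumptions: the variables ITS1-ITS4, the identifier id, the label name its_lbl and the toy values are all hypothetical.

```stata
* Hypothetical toy data: four 0/1 indicators, several may be 1 per person
clear
input byte(ITS1 ITS2 ITS3 ITS4)
1 0 0 1
1 1 0 0
0 1 0 1
1 0 1 0
end

* reshape needs an identifier for each original observation
gen long id = _n

* One row per person-item pair; TodasITS records which item (1 to 4)
reshape long ITS, i(id) j(TodasITS)

* Keep only the items that were answered "yes"
keep if ITS == 1

label define its_lbl 1 "ITS 1" 2 "ITS 2" 3 "ITS 3" 4 "ITS 4"
label values TodasITS its_lbl

* Counts of "yes" answers per item
tabulate TodasITS
```

Because a person with several "yes" answers now contributes several rows, the counts match those of the original dummies. The reshaped data are no longer one row per person, so it may be safer to wrap these steps in preserve and restore.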

