Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining two categorical variables when record ids overlap

    I am using restricted survey data for my dissertation and I am fairly new to stata. I am trying to combine two categorical variables that are the following format:

    VarA
    0 = no
    1 = yes
    6 = missing
    9 = not stated

    VarB
    0 = not
    1 = yes
    6 = missing
    9 = not stated

    I want to combine them into VarC where:
    0 = all instances where they said no (0) to VarA and/or VarB
    1 = all instances where they said yes (1) to VarA and/or VarB
    6 = all missing (6) or not stated(9)

    The issue I'm having is that in a few circumstances, respondents answered no to both questions, so when I combine the variables, Stata will only count those frequencies once. Is there a way to count all the frequencies, despite them coming from the same respondent? I have asked several people, and consulted A Gentle Introduction to Stata, countless forum posts and the help command but cannot find an answer.

    I have provided my syntax, sample data and frequency tables below. Feel free to let me know if you require anything else (like I said I'm new to this, so I referenced the guidelines but happy to learn etiquette).


    Here is a copy of my dofile:

    Code:
    generate noharm =1 if VarA ==0
    generate harm =1 if VarA ==1
    generate cs6 =1 if VarA ==6
    generate cs9 =1 if VarA ==9
    
    generate exnoharm =1 if VarB ==0
    generate exharm =1 if VarB ==1
    generate ex6 =1 if VarB ==6
    generate ex9 =1 if VarB ==9    
    
    gen VarC = 0
    
    replace VarC =0 if noharm ==1 & exnoharm ==1
    replace VarC =0 if noharm ==1 & ex6 ==1
    replace VarC =0 if noharm ==1 & ex9 ==1
    replace VarC =0 if noharm ==1 & exnoharm ==1
    replace VarC =0 if cs6 ==1 & exnoharm ==1
    replace VarC =0 if cs9 ==1 & exnoharm ==1
    
    replace VarC =1 if harm ==1 & exharm ==1
    replace VarC =1 if harm ==1 & ex6 ==1
    replace VarC =1 if harm ==1 & ex9 ==1
    replace VarC =1 if harm ==1 & exharm ==1
    replace VarC =1 if cs6 ==1 & exharm ==1
    replace VarC =1 if cs9 ==1 & exharm ==1
    
    replace VarC =6 if cs6 ==1 & ex6 ==1
    replace VarC =6 if cs6 ==1 & ex9 ==1
    replace VarC =6 if cs9 ==1 & ex6 ==1
    replace VarC =6 if cs9 ==1 & ex9 ==1

    I have provided a fake dataset below (produced with Stata 18 but in the lab I use Stata 16)

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(VarA VarB id noharm harm cs6 cs9 exnoharm exharm ex6 ex9 VarC)
    0 0   1 1 . . . 1 . . . 0
    0 0   2 1 . . . 1 . . . 0
    6 0   3 . . 1 . 1 . . . 0
    6 0   4 . . 1 . 1 . . . 0
    6 0   5 . . 1 . 1 . . . 0
    6 0   6 . . 1 . 1 . . . 0
    6 0   7 . . 1 . 1 . . . 0
    6 0   8 . . 1 . 1 . . . 0
    6 0   9 . . 1 . 1 . . . 0
    6 0  10 . . 1 . 1 . . . 0
    6 0  11 . . 1 . 1 . . . 0
    6 0  12 . . 1 . 1 . . . 0
    6 0  13 . . 1 . 1 . . . 0
    6 0  14 . . 1 . 1 . . . 0
    6 0  15 . . 1 . 1 . . . 0
    6 0  16 . . 1 . 1 . . . 0
    6 0  17 . . 1 . 1 . . . 0
    6 0  18 . . 1 . 1 . . . 0
    6 0  19 . . 1 . 1 . . . 0
    6 0  20 . . 1 . 1 . . . 0
    6 0  21 . . 1 . 1 . . . 0
    6 0  22 . . 1 . 1 . . . 0
    6 0  23 . . 1 . 1 . . . 0
    6 0  24 . . 1 . 1 . . . 0
    6 0  25 . . 1 . 1 . . . 0
    6 0  26 . . 1 . 1 . . . 0
    6 0  27 . . 1 . 1 . . . 0
    1 6  28 . 1 . . . . 1 . 1
    6 0  29 . . 1 . 1 . . . 0
    6 0  30 . . 1 . 1 . . . 0
    9 1  31 . . . 1 . 1 . . 1
    6 1  32 . . 1 . . 1 . . 1
    6 1  33 . . 1 . . 1 . . 1
    6 1  34 . . 1 . . 1 . . 1
    6 1  35 . . 1 . . 1 . . 1
    0 6  36 1 . . . . . 1 . 0
    0 6  37 1 . . . . . 1 . 0
    0 6  38 1 . . . . . 1 . 0
    0 6  39 1 . . . . . 1 . 0
    0 6  40 1 . . . . . 1 . 0
    0 6  41 1 . . . . . 1 . 0
    0 6  42 1 . . . . . 1 . 0
    0 6  43 1 . . . . . 1 . 0
    0 6  44 1 . . . . . 1 . 0
    0 6  45 1 . . . . . 1 . 0
    0 6  46 1 . . . . . 1 . 0
    0 6  47 1 . . . . . 1 . 0
    0 6  48 1 . . . . . 1 . 0
    0 6  49 1 . . . . . 1 . 0
    0 6  50 1 . . . . . 1 . 0
    0 6  51 1 . . . . . 1 . 0
    0 6  52 1 . . . . . 1 . 0
    0 6  53 1 . . . . . 1 . 0
    6 6  54 . . 1 . . . 1 . 6
    6 6  55 . . 1 . . . 1 . 6
    6 6  56 . . 1 . . . 1 . 6
    6 6  57 . . 1 . . . 1 . 6
    6 6  58 . . 1 . . . 1 . 6
    6 6  59 . . 1 . . . 1 . 6
    6 6  60 . . 1 . . . 1 . 6
    6 6  61 . . 1 . . . 1 . 6
    6 6  62 . . 1 . . . 1 . 6
    6 6  63 . . 1 . . . 1 . 6
    6 6  64 . . 1 . . . 1 . 6
    6 6  65 . . 1 . . . 1 . 6
    6 6  66 . . 1 . . . 1 . 6
    6 6  67 . . 1 . . . 1 . 6
    6 6  68 . . 1 . . . 1 . 6
    6 6  69 . . 1 . . . 1 . 6
    6 6  70 . . 1 . . . 1 . 6
    6 6  71 . . 1 . . . 1 . 6
    6 6  72 . . 1 . . . 1 . 6
    6 6  73 . . 1 . . . 1 . 6
    6 6  74 . . 1 . . . 1 . 6
    6 6  75 . . 1 . . . 1 . 6
    6 6  76 . . 1 . . . 1 . 6
    6 6  77 . . 1 . . . 1 . 6
    6 6  78 . . 1 . . . 1 . 6
    6 6  79 . . 1 . . . 1 . 6
    6 6  80 . . 1 . . . 1 . 6
    6 6  81 . . 1 . . . 1 . 6
    6 6  82 . . 1 . . . 1 . 6
    6 6  83 . . 1 . . . 1 . 6
    6 6  84 . . 1 . . . 1 . 6
    6 6  85 . . 1 . . . 1 . 6
    6 6  86 . . 1 . . . 1 . 6
    6 6  87 . . 1 . . . 1 . 6
    6 6  88 . . 1 . . . 1 . 6
    6 6  89 . . 1 . . . 1 . 6
    6 6  90 . . 1 . . . 1 . 6
    6 6  91 . . 1 . . . 1 . 6
    6 6  92 . . 1 . . . 1 . 6
    6 6  93 . . 1 . . . 1 . 6
    6 6  94 . . 1 . . . 1 . 6
    6 6  95 . . 1 . . . 1 . 6
    6 6  96 . . 1 . . . 1 . 6
    6 6  97 . . 1 . . . 1 . 6
    6 6  98 . . 1 . . . 1 . 6
    6 6  99 . . 1 . . . 1 . 6
    6 6 100 . . 1 . . . 1 . 6
    6 6 101 . . 1 . . . 1 . 6
    6 6 102 . . 1 . . . 1 . 6
    6 6 103 . . 1 . . . 1 . 6
    6 6 104 . . 1 . . . 1 . 6
    6 6 105 . . 1 . . . 1 . 6
    6 6 106 . . 1 . . . 1 . 6
    6 6 107 . . 1 . . . 1 . 6
    6 6 108 . . 1 . . . 1 . 6
    6 6 109 . . 1 . . . 1 . 6
    6 6 110 . . 1 . . . 1 . 6
    6 6 111 . . 1 . . . 1 . 6
    6 6 112 . . 1 . . . 1 . 6
    6 6 113 . . 1 . . . 1 . 6
    6 6 114 . . 1 . . . 1 . 6
    6 6 115 . . 1 . . . 1 . 6
    6 6 116 . . 1 . . . 1 . 6
    6 6 117 . . 1 . . . 1 . 6
    6 6 118 . . 1 . . . 1 . 6
    6 6 119 . . 1 . . . 1 . 6
    6 6 120 . . 1 . . . 1 . 6
    6 6 121 . . 1 . . . 1 . 6
    6 6 122 . . 1 . . . 1 . 6
    6 6 123 . . 1 . . . 1 . 6
    6 6 124 . . 1 . . . 1 . 6
    6 6 125 . . 1 . . . 1 . 6
    6 6 126 . . 1 . . . 1 . 6
    6 6 127 . . 1 . . . 1 . 6
    6 6 128 . . 1 . . . 1 . 6
    6 6 129 . . 1 . . . 1 . 6
    6 6 130 . . 1 . . . 1 . 6
    6 6 131 . . 1 . . . 1 . 6
    6 6 132 . . 1 . . . 1 . 6
    6 6 133 . . 1 . . . 1 . 6
    6 6 134 . . 1 . . . 1 . 6
    6 6 135 . . 1 . . . 1 . 6
    6 6 136 . . 1 . . . 1 . 6
    6 6 137 . . 1 . . . 1 . 6
    6 6 138 . . 1 . . . 1 . 6
    6 6 139 . . 1 . . . 1 . 6
    6 6 140 . . 1 . . . 1 . 6
    6 6 141 . . 1 . . . 1 . 6
    6 6 142 . . 1 . . . 1 . 6
    6 6 143 . . 1 . . . 1 . 6
    6 6 144 . . 1 . . . 1 . 6
    6 6 145 . . 1 . . . 1 . 6
    6 6 146 . . 1 . . . 1 . 6
    6 6 147 . . 1 . . . 1 . 6
    6 6 148 . . 1 . . . 1 . 6
    6 6 149 . . 1 . . . 1 . 6
    6 6 150 . . 1 . . . 1 . 6
    6 6 151 . . 1 . . . 1 . 6
    6 6 152 . . 1 . . . 1 . 6
    6 6 153 . . 1 . . . 1 . 6
    6 6 154 . . 1 . . . 1 . 6
    6 6 155 . . 1 . . . 1 . 6
    6 6 156 . . 1 . . . 1 . 6
    6 6 157 . . 1 . . . 1 . 6
    6 6 158 . . 1 . . . 1 . 6
    6 6 159 . . 1 . . . 1 . 6
    6 6 160 . . 1 . . . 1 . 6
    6 6 161 . . 1 . . . 1 . 6
    6 6 162 . . 1 . . . 1 . 6
    6 6 163 . . 1 . . . 1 . 6
    6 6 164 . . 1 . . . 1 . 6
    6 6 165 . . 1 . . . 1 . 6
    6 6 166 . . 1 . . . 1 . 6
    6 6 167 . . 1 . . . 1 . 6
    6 6 168 . . 1 . . . 1 . 6
    6 6 169 . . 1 . . . 1 . 6
    6 6 170 . . 1 . . . 1 . 6
    6 6 171 . . 1 . . . 1 . 6
    6 6 172 . . 1 . . . 1 . 6
    6 6 173 . . 1 . . . 1 . 6
    6 6 174 . . 1 . . . 1 . 6
    6 6 175 . . 1 . . . 1 . 6
    6 6 176 . . 1 . . . 1 . 6
    6 6 177 . . 1 . . . 1 . 6
    6 6 178 . . 1 . . . 1 . 6
    6 6 179 . . 1 . . . 1 . 6
    6 6 180 . . 1 . . . 1 . 6
    6 6 181 . . 1 . . . 1 . 6
    6 6 182 . . 1 . . . 1 . 6
    6 6 183 . . 1 . . . 1 . 6
    6 6 184 . . 1 . . . 1 . 6
    6 6 185 . . 1 . . . 1 . 6
    6 6 186 . . 1 . . . 1 . 6
    6 6 187 . . 1 . . . 1 . 6
    6 6 188 . . 1 . . . 1 . 6
    6 6 189 . . 1 . . . 1 . 6
    6 6 190 . . 1 . . . 1 . 6
    6 6 191 . . 1 . . . 1 . 6
    6 6 192 . . 1 . . . 1 . 6
    6 6 193 . . 1 . . . 1 . 6
    6 6 194 . . 1 . . . 1 . 6
    6 6 195 . . 1 . . . 1 . 6
    6 6 196 . . 1 . . . 1 . 6
    6 6 197 . . 1 . . . 1 . 6
    6 6 198 . . 1 . . . 1 . 6
    6 6 199 . . 1 . . . 1 . 6
    6 6 200 . . 1 . . . 1 . 6
    6 6 201 . . 1 . . . 1 . 6
    6 6 202 . . 1 . . . 1 . 6
    6 6 203 . . 1 . . . 1 . 6
    6 6 204 . . 1 . . . 1 . 6
    6 6 205 . . 1 . . . 1 . 6
    6 6 206 . . 1 . . . 1 . 6
    6 6 207 . . 1 . . . 1 . 6
    6 6 208 . . 1 . . . 1 . 6
    6 6 209 . . 1 . . . 1 . 6
    6 6 210 . . 1 . . . 1 . 6
    6 6 211 . . 1 . . . 1 . 6
    6 6 212 . . 1 . . . 1 . 6
    6 6 213 . . 1 . . . 1 . 6
    6 6 214 . . 1 . . . 1 . 6
    6 6 215 . . 1 . . . 1 . 6
    6 6 216 . . 1 . . . 1 . 6
    6 6 217 . . 1 . . . 1 . 6
    6 6 218 . . 1 . . . 1 . 6
    6 6 219 . . 1 . . . 1 . 6
    9 9 220 . . . 1 . . . 1 6
    end
    Here are copies of my cross tabs and frequency tables for reference as well:

    VarB
    VarA 0 1 6 9 Total
    0 2 0 18 0 20
    1 0 0 1 0 1
    6 27 4 166 0 197
    9 0 1 0 1 2
    Total 29 5 185 1 220
    VarC Freq. Percent Cum.
    0 47 21.36 21.36
    1 6 2.73 24.09
    6 167 75.91 100.00
    Total 220 100

  • #2
    You are only using an & operator in your if statements. You should be using an '|' operator (above the backward slash keyboard) which covers and/or. I hope this answers your question

    See help if


    . sysuse auto
    . list make mpg if mpg>25
    . list make mpg if mpg>25 & mpg<30
    . list make mpg if mpg>25 | mpg<10
    . regress mpg weight displ if foreign==1


    Comment


    • #3
      Hi thanks for your response. When I used the line, it only replaced 3 values for 0 and then the rest were assigned to missing. It didn't assign any to 1. Do you know why that may be?

      Comment


      • #4
        Just for reference - this is what happens when I use "|" instead of *&*

        Code:
        gen VarD = 0
        replace VarD =0 if noharm ==1 | exnoharm ==1
        replace VarD =0 if noharm ==1 | ex6 ==1 
        replace VarD =0 if noharm ==1 | ex9 ==1
        replace VarD =0 if noharm ==1 | exnoharm ==1
        replace VarD =0 if cs6 ==1 | exnoharm ==1
        replace VarD =0 if cs9 ==1 | exnoharm ==1
        
        replace VarD =1 if harm ==1 | exharm ==1
        replace VarD =1 if harm ==1 | ex6 ==1 
        replace VarD =1 if harm ==1 | ex9 ==1
        replace VarD =1 if harm ==1 | exharm ==1
        replace VarD =1 if cs6 ==1 | exharm ==1
        replace VarD =1 if cs9 ==1 | exharm ==1
        
        replace VarD =6 if cs6 ==1 | ex6 ==1
        replace VarD =6 if cs6 ==1 | ex9 ==1
        replace VarD =6 if cs9 ==1 | ex6 ==1
        replace VarD =6 if cs9 ==1 | ex9 ==1
        VarD Freq Percent Cum
        0 2 0.91 0.91
        6 218 99.09 100.00
        Total 210 100.00
        Thanks again!

        Comment


        • #5
          Your original question is ill-posed and there is no solution. The problem is that according to your specification
          I want to combine them into VarC where:
          0 = all instances where they said no (0) to VarA and/or VarB
          1 = all instances where they said yes (1) to VarA and/or VarB
          6 = all missing (6) or not stated(9)
          if a person responds 0 to one of VarA or VarB and 1 to the other, it meets the criteria for both VarC = 0 and VarC = 1. Since you have not provided a consistent definition for VarC, no code can possibly work. I suggest that rather than trying to specify in words what VarC is supposed to look like, I suggest you post back with a table like this:
          Var B Var A: 0 1 6 or 9
          0
          1
          6 or 9
          and fill in the cells with what you want the value of VarC to be for the corresponding values of VarA and VarB. Once there is clarity on what is wanted, I'm confident it will be easy to write a few lines of code that handle it.

          Comment


          • #6
            Sorry about that. Here is what I'm looking for:
            VarB Var A 0 1 6 or 9
            0 0 1 0
            1 1 1 1
            6 or 9 0 1 6
            Thank you!

            Comment


            • #7
              Code:
              gen varC = inlist(1, varA, varB)
              replace varCV = 6 if inlist(varA, 6, 9) & inlist(varB, 6, 9)

              Comment


              • #8
                Thank you for providing this code.

                So it's a much better way to get to my result, however, I am wonder if it's possible for Stata to count the same record twice, if they responded to 0 both times.


                ​​​​​​So what happens when I combine the variables is that I start out with these numbers:

                0 1 6 9 Total
                0 2 0 18 0 20
                1 0 0 1 0 1
                6 27 4 166 0 197
                9 0 1 0 1 2
                Total 29 5 185 1 220

                So I am expecting 0 to equal 49 respondents instead when I recieved the following numbers;

                0 = 47 respondents
                1 = 6 respondents
                6 = 167 respondents

                Is there anyway to count the same record twice?

                Thanks!

                Comment


                • #9
                  Code:
                  gen `c(obs_t)' obs_no = _n
                  reshape long Var, i(obs_no) j(_j) string
                  tab Var if inlist(_j, "A", "B")

                  Comment


                  • #10
                    Thank you!

                    Comment

                    Working...
                    X