
  • Looping over levelsof 4 categorical numbered variables

    Hello,

    I have a dataset with approximately 20,000 observations. The four variables of current interest are sex (levels 1 or 2), age5cat (0/4), ethnicity (0/3), and education (0/3). I ran two "svy: mean varname, over(sex age5cat ethnicity education)" commands that differed only in the variable whose mean was estimated. Each command produced estimates for 160 subpopulations (2 × 5 × 4 × 4 combinations of the four variables). I then extracted these estimates, divided one set by the other using "mata: st_matrix", and used "svmat" to create a new variable in my dataset called bmiaf.

    My dataset therefore still has its many variables with 20,000 observations each, plus the svmat-generated bmiaf variable, which has values in only 160 observations (one per subpopulation in my svy: mean commands). I now want to shrink the dataset from 20,000 observations to 160 and keep only the four grouping variables (sex age5cat ethnicity education), with values that together represent all 160 subpopulations and that correspond to the bmiaf value in each observation. (I will be merging this to another dataset on these four variables.)

    Code:
    keep sex age5cat ethnicity education bmiaf
    keep in 1/160

    This gives me a dataset of the right size. However, svmat did not carry over any of the subpopulation labels, so keeping the first 160 observations simply retains whichever 160 individuals happen to come first. They do not represent all 160 subpopulations, and their sex, age5cat, ethnicity, and education values do not correspond to the bmiaf values in their rows. I would therefore like to recode or, more likely, generate the sex, age5cat, ethnicity, and education variables so that they list every one of the 160 combinations in the same order as the bmiaf values. The order of the subpopulations was:

    Over: sex age5cat ethnicity education
    _subpop_1: Male 20-39 White <HS
    _subpop_2: Male 20-39 White HSGED
    _subpop_3: Male 20-39 White <College
    _subpop_4: Male 20-39 White >College
    _subpop_5: Male 20-39 Black <HS
    _subpop_6: Male 20-39 Black HSGED
    _subpop_7: Male 20-39 Black <College
    _subpop_8: Male 20-39 Black >College
    _subpop_9: Male 20-39 Hispanic <HS
    _subpop_10: Male 20-39 Hispanic HSGED
    _subpop_11: Male 20-39 Hispanic <College
    _subpop_12: Male 20-39 Hispanic >College
    _subpop_13: Male 20-39 Other <HS
    _subpop_14: Male 20-39 Other HSGED
    _subpop_15: Male 20-39 Other <College
    _subpop_16: Male 20-39 Other >College
    _subpop_17: Male 40-49 White <HS
    _subpop_18: Male 40-49 White HSGED
    _subpop_19: Male 40-49 White <College
    _subpop_20: Male 40-49 White >College
    _subpop_21: Male 40-49 Black <HS
    _subpop_22: Male 40-49 Black HSGED
    _subpop_23: Male 40-49 Black <College
    _subpop_24: Male 40-49 Black >College
    _subpop_25: Male 40-49 Hispanic <HS
    _subpop_26: Male 40-49 Hispanic HSGED
    _subpop_27: Male 40-49 Hispanic <College
    _subpop_28: Male 40-49 Hispanic >College
    _subpop_29: Male 40-49 Other <HS
    _subpop_30: Male 40-49 Other HSGED
    _subpop_31: Male 40-49 Other <College
    _subpop_32: Male 40-49 Other >College
    _subpop_33: Male 50-59 White <HS
    _subpop_34: Male 50-59 White HSGED
    _subpop_35: Male 50-59 White <College
    _subpop_36: Male 50-59 White >College
    _subpop_37: Male 50-59 Black <HS
    _subpop_38: Male 50-59 Black HSGED
    _subpop_39: Male 50-59 Black <College
    _subpop_40: Male 50-59 Black >College
    _subpop_41: Male 50-59 Hispanic <HS
    _subpop_42: Male 50-59 Hispanic HSGED
    _subpop_43: Male 50-59 Hispanic <College
    _subpop_44: Male 50-59 Hispanic >College
    _subpop_45: Male 50-59 Other <HS
    _subpop_46: Male 50-59 Other HSGED
    _subpop_47: Male 50-59 Other <College
    _subpop_48: Male 50-59 Other >College
    _subpop_49: Male 60-69 White <HS
    _subpop_50: Male 60-69 White HSGED
    _subpop_51: Male 60-69 White <College
    _subpop_52: Male 60-69 White >College
    _subpop_53: Male 60-69 Black <HS
    _subpop_54: Male 60-69 Black HSGED
    _subpop_55: Male 60-69 Black <College
    _subpop_56: Male 60-69 Black >College
    _subpop_57: Male 60-69 Hispanic <HS
    _subpop_58: Male 60-69 Hispanic HSGED
    _subpop_59: Male 60-69 Hispanic <College
    _subpop_60: Male 60-69 Hispanic >College
    _subpop_61: Male 60-69 Other <HS
    _subpop_62: Male 60-69 Other HSGED
    _subpop_63: Male 60-69 Other <College
    _subpop_64: Male 60-69 Other >College
    _subpop_65: Male 70+ White <HS
    _subpop_66: Male 70+ White HSGED
    _subpop_67: Male 70+ White <College
    _subpop_68: Male 70+ White >College
    _subpop_69: Male 70+ Black <HS
    _subpop_70: Male 70+ Black HSGED
    _subpop_71: Male 70+ Black <College
    _subpop_72: Male 70+ Black >College
    _subpop_73: Male 70+ Hispanic <HS
    _subpop_74: Male 70+ Hispanic HSGED
    _subpop_75: Male 70+ Hispanic <College
    _subpop_76: Male 70+ Hispanic >College
    _subpop_77: Male 70+ Other <HS
    _subpop_78: Male 70+ Other HSGED
    _subpop_79: Male 70+ Other <College
    _subpop_80: Male 70+ Other >College
    _subpop_81: Female 20-39 White <HS
    _subpop_82: Female 20-39 White HSGED
    _subpop_83: Female 20-39 White <College
    _subpop_84: Female 20-39 White >College
    _subpop_85: Female 20-39 Black <HS
    _subpop_86: Female 20-39 Black HSGED
    _subpop_87: Female 20-39 Black <College
    _subpop_88: Female 20-39 Black >College
    _subpop_89: Female 20-39 Hispanic <HS
    _subpop_90: Female 20-39 Hispanic HSGED
    _subpop_91: Female 20-39 Hispanic <College
    _subpop_92: Female 20-39 Hispanic >College
    _subpop_93: Female 20-39 Other <HS
    _subpop_94: Female 20-39 Other HSGED
    _subpop_95: Female 20-39 Other <College
    _subpop_96: Female 20-39 Other >College
    _subpop_97: Female 40-49 White <HS
    _subpop_98: Female 40-49 White HSGED
    _subpop_99: Female 40-49 White <College
    _subpop_100: Female 40-49 White >College
    _subpop_101: Female 40-49 Black <HS
    _subpop_102: Female 40-49 Black HSGED
    _subpop_103: Female 40-49 Black <College
    _subpop_104: Female 40-49 Black >College
    _subpop_105: Female 40-49 Hispanic <HS
    _subpop_106: Female 40-49 Hispanic HSGED
    _subpop_107: Female 40-49 Hispanic <College
    _subpop_108: Female 40-49 Hispanic >College
    _subpop_109: Female 40-49 Other <HS
    _subpop_110: Female 40-49 Other HSGED
    _subpop_111: Female 40-49 Other <College
    _subpop_112: Female 40-49 Other >College
    _subpop_113: Female 50-59 White <HS
    _subpop_114: Female 50-59 White HSGED
    _subpop_115: Female 50-59 White <College
    _subpop_116: Female 50-59 White >College
    _subpop_117: Female 50-59 Black <HS
    _subpop_118: Female 50-59 Black HSGED
    _subpop_119: Female 50-59 Black <College
    _subpop_120: Female 50-59 Black >College
    _subpop_121: Female 50-59 Hispanic <HS
    _subpop_122: Female 50-59 Hispanic HSGED
    _subpop_123: Female 50-59 Hispanic <College
    _subpop_124: Female 50-59 Hispanic >College
    _subpop_125: Female 50-59 Other <HS
    _subpop_126: Female 50-59 Other HSGED
    _subpop_127: Female 50-59 Other <College
    _subpop_128: Female 50-59 Other >College
    _subpop_129: Female 60-69 White <HS
    _subpop_130: Female 60-69 White HSGED
    _subpop_131: Female 60-69 White <College
    _subpop_132: Female 60-69 White >College
    _subpop_133: Female 60-69 Black <HS
    _subpop_134: Female 60-69 Black HSGED
    _subpop_135: Female 60-69 Black <College
    _subpop_136: Female 60-69 Black >College
    _subpop_137: Female 60-69 Hispanic <HS
    _subpop_138: Female 60-69 Hispanic HSGED
    _subpop_139: Female 60-69 Hispanic <College
    _subpop_140: Female 60-69 Hispanic >College
    _subpop_141: Female 60-69 Other <HS
    _subpop_142: Female 60-69 Other HSGED
    _subpop_143: Female 60-69 Other <College
    _subpop_144: Female 60-69 Other >College
    _subpop_145: Female 70+ White <HS
    _subpop_146: Female 70+ White HSGED
    _subpop_147: Female 70+ White <College
    _subpop_148: Female 70+ White >College
    _subpop_149: Female 70+ Black <HS
    _subpop_150: Female 70+ Black HSGED
    _subpop_151: Female 70+ Black <College
    _subpop_152: Female 70+ Black >College
    _subpop_153: Female 70+ Hispanic <HS
    _subpop_154: Female 70+ Hispanic HSGED
    _subpop_155: Female 70+ Hispanic <College
    _subpop_156: Female 70+ Hispanic >College
    _subpop_157: Female 70+ Other <HS
    _subpop_158: Female 70+ Other HSGED
    _subpop_159: Female 70+ Other <College
    _subpop_160: Female 70+ Other >College
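The listing follows a fixed pattern: education changes fastest, then ethnicity, then age5cat, then sex. Assuming the codings given in the post (sex 1/2, age5cat 0/4, ethnicity 0/3, education 0/3), the position k of each subpopulation is therefore k = 80*(sex - 1) + 16*age5cat + 4*ethnicity + education + 1; for example, Male 40-49 White <HS gives 16 + 1 = 17. A minimal sketch that rebuilds the four codes from the row number and checks that formula:

```stata
* Rebuild the 160-row key in the same order as the svy: mean listing
* (education varies fastest, sex slowest)
clear
set obs 160
gen byte sex       = 1 + floor((_n - 1)/80)      // 80 rows per sex
gen byte age5cat   = mod(floor((_n - 1)/16), 5)  // 16 rows per age group
gen byte ethnicity = mod(floor((_n - 1)/4), 4)   // 4 rows per ethnicity
gen byte education = mod(_n - 1, 4)              // cycles 0-3 row by row

* The other direction: the _subpop_ number implied by the four codes
gen int subpop = 80*(sex - 1) + 16*age5cat + 4*ethnicity + education + 1
assert subpop == _n
```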


    I think I could generate new variables and fill them in by hand with replace to recreate these subpopulations. However, I wonder whether I could recreate the four demographic variables over 160 observations more efficiently with loops over the levelsof results. Following the subpopulation list, observation 1 should have sex = 1, age5cat = 0, ethnicity = 0, and education = 0; observation 2 should have sex = 1, age5cat = 0, ethnicity = 0, and education = 1; and so on. I am not great with loops and am having some difficulty, so I would appreciate any help. Below is what I came up with in Stata 13, but I am not sure it is anywhere near correct. In particular, I do not know how to tie the replace commands together so that a generated education value is linked to the corresponding ethnicity, age5cat, and sex values:


    Code:
    levelsof sex, local(sexlevels)
    gen sex2 = .
    levelsof age5cat, local(age5catlevels)
    gen age5cat2 = .
    levelsof ethnicity, local(ethnicitylevels)
    gen ethnicity2 = .
    levelsof education, local(educationlevels)
    gen education2 = .
    
        forval i = 1/160 {
            foreach x in varlist sex2 age5cat2 ethnicity2 education2 {
                foreach y of local sexlevels {
                    replace sex2=`sexlevels' if `x' ==`y' &
                    foreach y of local age5catlevels {
                        replace age5cat2=`age5catlevels' if `x' ==`y' &
                        foreach y of local ethnicitylevels {
                            replace ethnicity2=`ethnicitylevels' if `x' ==`y' &
                            foreach y of local educationlevels {
                                replace education2=`educationlevels' if `x' ==`y'
                                local ++i
                            }
                        }
                    }
                }
            }
        }

    Which produces:
    invalid '2'
    r(198);
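A corrected version of the loop might look like the sketch below. The error arises because the local sexlevels holds the whole list of levels ("1 2"), so the first replace expands to replace sex2=1 2 if ..., which Stata rejects with invalid '2'. The loop also has other problems: foreach x in varlist ... treats the word "varlist" as a literal list item (the varlist form is foreach x of varlist ...), several lines end in a dangling &, and the outer forval i loop is unnecessary because the nested loops already visit every combination once. The sketch nests one foreach per variable, advances a single row counter in the innermost loop, and writes each combination into that observation with in. Because levelsof returns numeric levels in ascending order, the rows come out in the _subpop_ order listed above. The toy data at the top are a hypothetical stand-in for the real file; with the real data you would skip that part and also keep bmiaf. This assumes all 160 combinations were estimated by svy: mean; an empty cell would not appear in that output and would shift the alignment.

```stata
* Hypothetical toy data standing in for the real 20,000-observation file,
* coded as in the post: sex 1/2, age5cat 0/4, ethnicity 0/3, education 0/3
clear
set obs 20000
set seed 12345
gen sex       = 1 + (runiform() < 0.5)
gen age5cat   = floor(5*runiform())
gen ethnicity = floor(4*runiform())
gen education = floor(4*runiform())

* Store each variable's distinct values (ascending) in a local macro
levelsof sex,       local(sexlevels)
levelsof age5cat,   local(age5catlevels)
levelsof ethnicity, local(ethnicitylevels)
levelsof education, local(educationlevels)

gen sex2       = .
gen age5cat2   = .
gen ethnicity2 = .
gen education2 = .

* One loop per variable; the innermost loop advances a single row counter
* and writes the current combination into that observation
local i = 0
foreach s of local sexlevels {
    foreach a of local age5catlevels {
        foreach e of local ethnicitylevels {
            foreach d of local educationlevels {
                local ++i
                quietly replace sex2       = `s' in `i'
                quietly replace age5cat2   = `a' in `i'
                quietly replace ethnicity2 = `e' in `i'
                quietly replace education2 = `d' in `i'
            }
        }
    }
}

* Keep the 160 filled rows (with the real data, keep bmiaf here as well)
keep in 1/`i'
keep sex2 age5cat2 ethnicity2 education2
```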



  • #2
    I interpret your description as starting with some survey-structured data, calculating means of two variables over 4 different categorical variables, and taking the ratio of those means. If your survey design uses pweights, then there is a simpler way to do this over from scratch, rather than trying to fix the awkward data set you have created with svmat. You do not provide example data, but I'll illustrate the approach I would use with the -nhanes2.dta- provided on the StataCorp website, which is probably similar enough to your own to provide a template for you.


    Code:
    webuse nhanes2, clear
    
    svyset // EXHIBIT THE SURVEY DESIGN PARAMETERS
    assert `"`r(wtype)'"' == "pweight" // VERIFY ONLY pweights AT ONE LEVEL ARE NEEDED
    
    collapse (mean) bpsystol bpdiast [pweight = `r(wvar)'], by(sex race agegrp rural)
    gen ratio = bpsystol/bpdiast
    drop bpsystol bpdiast
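Applied to the variables in the original question, the same idea might look like the sketch below. The outcome variables ynum and yden and the weight wt are hypothetical placeholders (the question does not name the two variables behind bmiaf), and the toy data only stand in for the real file; with a real svyset dataset the weight would come from r(wvar) as above. After collapse, the four grouping variables uniquely identify the 160 rows, which is what a later 1:1 merge on those keys needs. The merge is left as a comment because the second dataset (called otherdata here) is hypothetical.

```stata
* Hypothetical toy data: ynum and yden stand in for the two variables whose
* means were divided to get bmiaf, and wt for the survey pweight
clear
set obs 20000
set seed 2021
gen sex       = 1 + (runiform() < 0.5)
gen age5cat   = floor(5*runiform())
gen ethnicity = floor(4*runiform())
gen education = floor(4*runiform())
gen ynum = 20 + 10*runiform()
gen yden = 1 + runiform()
gen wt   = 1000 + 1000*runiform()

* Weighted means within each sex x age5cat x ethnicity x education cell
collapse (mean) ynum yden [pweight = wt], by(sex age5cat ethnicity education)
gen bmiaf = ynum/yden
drop ynum yden

* One row per subpopulation, uniquely identified by the four keys
isid sex age5cat ethnicity education

* With a real second dataset keyed on the same variables (name hypothetical):
* merge 1:1 sex age5cat ethnicity education using otherdata
```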



    • #3
      Originally posted by Clyde Schechter (post #2)
      Clyde, this absolutely did the trick, and in an infinitely easier way! I am in fact using NHANES data, so it applied directly. The collapse command worked great, and being able to simply generate the ratio variable afterwards was the simple solution. Thank you so much.

      To add: looking back at my attempt to solve this myself, I see that I was greatly overcomplicating things with these loops. Nick Cox's egen functions for creating a variable that contains a repeating sequence of numbers (https://www.stata.com/support/faqs/d...es-of-numbers/) would also have made it easy to generate these variables directly.
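The egen seq() approach mentioned here could be sketched as follows, assuming the codings from the original question. seq() fills a variable with a repeating sequence: from() and to() set the range, and block() sets how many consecutive observations share each value, so education cycles every row, ethnicity every 4 rows, age5cat every 16, and sex every 80. The sketch starts from an empty 160-row dataset; run on the original data before keep in 1/160, the first 160 rows would line up with the svmat-generated bmiaf values, provided all 160 subpopulations were estimated.

```stata
* Build the 160 combinations with repeating sequences
clear
set obs 160
egen sex       = seq(), from(1) to(2) block(80)   // 80 x 1, then 80 x 2
egen age5cat   = seq(), from(0) to(4) block(16)   // 0-4 in blocks of 16
egen ethnicity = seq(), from(0) to(3) block(4)    // 0-3 in blocks of 4
egen education = seq(), from(0) to(3)             // 0-3, changing each row
```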
      Last edited by Bart Orzel; 16 Dec 2021, 08:42.
