  • NHANES data sets: Combining multiple categorical variables into a single indicator or categorical variable

    Hi,

    I was wondering if you have information about how to use Stata to combine the observations from multiple categorical variables into a single indicator variable, e.g., condition present: yes or no.

    I am working with NHANES data 2011-2012. http://wwwn.cdc.gov/nchs/nhanes/search/nhanes11_12.aspx

    I need to combine the categorical variables ohx02se, ohx03se, ..., ohx30se into a single indicator (binary) or even categorical variable; let's call it Sealants_Not_Present. I am trying to create a single variable that holds the count of all the "0" codes ("0" means sealant not present) across the 30 distinct variables (see the codebook below).

    Your help is deeply appreciated,
    Victor

    describe ohx02se ohx03se ohx04se seqn

                    storage   display    value
    variable name   type      format     label      variable label
    -----------------------------------------------------------------------------------------------------------------
    ohx02se         str10     %10s                  Dental Sealants: #2
    ohx03se         str10     %10s                  Dental Sealants: #3
    ohx04se         str10     %10s                  Dental Sealants: #4
    seqn            double    %10.0g                Respondent sequence number


    Codebook for ohx02se

    ohx02se Dental Sealants: #2
    -----------------------------------------------------------------------------------------------------------------

    type: string (str10), but longest is str2

    unique values: 4                 missing "": 6732/9756

    tabulation:  Freq.   Value
                  6732   ""
                  1008   "0"
                   153   "1"
                    15   "13"
                  1848   "9"

  • #2
    I don't understand what you want to do. The variables ohx*se are apparently string variables that contain codes whose meaning you do not explain. How does one decide on the basis of this information whether sealants are "not present"? Is it when all of the ohx*se variables are missing? When they are all "0"? Or is there some other rule?

    I imagine that if you can explain that it will be easy for someone to help you construct your variable.



    • #3
      I've looked at the NHANES codebook for the data set you are using (http://wwwn.cdc.gov/nchs/nhanes/2011-2012/OHXDEN_G.htm).

      As Clyde points out, the variables are strings, so the first step is to destring them (i.e. convert them to numeric variables).
      Then you can generate your new variable with the help of egen.
      From the codebook I gather that all 17 (not 30!) variables are coded:
      0: Sealant not present
      1-4: some kind and amount of sealant present
      9: Cannot be assessed (why?)
      12-13: value recorded


      Please note that there are a lot of missing values on these 17 variables.
      Of the total sample of 8956, around 5938 cases per variable have a missing value (you should look at the NHANES documentation to determine why this is).
      And even among the 3000 or so remaining respondents, about 60% were scored 'could not be assessed'.
      That raises the question of what a score on your nosealant variable means.
      At the very least you will need to exclude the 5931 cases that have no values recorded for any of the 17 variables.
      But even then a score of, e.g., 4 can mean different things depending on how many assessments could be made; i.e. 4 times no sealant out of 4 assessments is not the same as 4 times no sealant out of 17 assessments.

      Here is the code to get you started:

      // destring: convert the string codes to numeric
      // (the range ohx02se-ohx30se picks up every variable stored between these two, so check the variable order first)
      destring ohx02se-ohx30se, replace
      // count, for each case, how many of the 17 variables are missing
      egen noinfo=rowmiss(ohx02se-ohx30se)
      // nosealant counts the number of times sealant is absent (i.e. when the variable equals 0),
      // excluding cases where no data were collected (missing on all 17 variables)
      egen nosealant=anycount(ohx02se-ohx30se) if noinfo<17, values(0)
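
      To deal with the denominator problem mentioned above (4 out of 4 is not the same as 4 out of 17), one option is to also count how many teeth could be assessed and work with a proportion or an all-or-nothing indicator. The sketch below uses a small made-up data set with three of the variables so that it runs on its own. The names nassessed, propnosealant and allnosealant are hypothetical, and treating codes 0-4, 12 and 13 as "assessed" is my assumption, so check it against the codebook before using it.

```stata
// Toy data (made up) standing in for three of the sealant variables
clear
input str2 ohx02se str2 ohx03se str2 ohx04se
"0"  "0" "9"
"0"  "1" "0"
""   ""  ""
"9"  "9" ""
"13" "0" "0"
end

// Same steps as above, with 3 variables instead of 17
destring ohx02se-ohx04se, replace
egen noinfo = rowmiss(ohx02se-ohx04se)
egen nosealant = anycount(ohx02se-ohx04se) if noinfo < 3, values(0)

// Number of teeth with an assessable code
// (ASSUMPTION: 0-4, 12 and 13 count as assessed; 9 and missing do not)
egen nassessed = anycount(ohx02se-ohx04se), values(0/4 12 13)

// Share of assessed teeth without sealant; missing when nothing was assessed
generate propnosealant = nosealant / nassessed if nassessed > 0

// 1 = no sealant on any assessed tooth, 0 = sealant on at least one,
// missing when no tooth could be assessed
generate byte allnosealant = (nosealant == nassessed) if nassessed > 0

list
```

      With the real data you would replace the toy range with ohx02se-ohx30se and the 3 with 17. Note that the case coded 9 on every recorded tooth ends up with nosealant equal to 0 but a missing indicator, which keeps "nothing assessable" apart from "sealant present".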
      Last edited by Evelyn Ersanilli; 13 Sep 2014, 20:22.



      • #4
        Wow, thank you very much for your help. I really appreciate your input. I will give it a try and report back on my progress.
        Victor

