  • Adjusting for missing observations in a weighted standardized index

    Hi,

    I'm working with individual-level, pooled cross-sectional data from the National Health Interview Survey (NHIS) for the years 1981-2014. Since many of the survey questions fall under the same broader categories (e.g. physical health, mental health, health care/insurance), I'd like to group them into indices. Following Thompson (2018), for each index I standardize the components to have a mean of zero and a standard deviation of one. I then create weights equal to the inverse of the sample covariance and use them to weight the mean of the standardized components. Here's a data example of what the standardized components look like:
    Code:
    input float(aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore)
     -.5429977   .3381291   .6184267  -.4203467  -.6699646  .29255468         .
      .4963797   .3381291   .6184267 -2.4615376  .51620674  .29255468 1.0767654
     -.5429977   .3381291   .6184267   .6002487  .51620674  .29255468         .
      -3.66113   .3381291  -3.675016  -3.482133  .51620674  .29255468  .1148366
     -1.582375   .3381291   .6184267   .6002487  -.6699646  .29255468  .1148366
      .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
      .4963797   .3381291   .6184267  -.4203467  -.6699646  .29255468         .
      .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
      .4963797   .3381291 -1.5282946  -1.440942  -.6699646  .29255468 1.0767654
     -1.582375  -2.590604 -1.5282946  -1.440942  -.6699646  .29255468  .1148366
      .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
     -.5429977 -1.1262374  -.4549339  -1.440942  -1.856136  .29255468 -.8470922
     -1.582375  -2.590604   .6184267   .6002487  -.6699646  .29255468 -.8470922
    end
    and here's the code I use to create the index, starting with the command to generate the standardized components shown in the above data example:
    Code:
    *create mental health index
    
    *standardize components
    #delimit ;
    foreach var in aeffort ahopeless anervous arestless asad aworthless feelings_interfered { ;
        egen `var'zscore=std(`var') ;
    } ;
    
    matrix drop _all ;
    *calculate weights ;
    local mental_health_vars "aeffortzscore ahopelesszscore anervouszscore 
    arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore" ;
    corr `mental_health_vars' ;
    #delimit cr
    
    mat sigma=r(C)
    foreach n of numlist 1/7 {
        mat c_`n' = sigma[`n',1..7]
        mat XX = c_`n'
        svmat XX
        scalar w`n' = XX1+ XX2 + XX3 + XX4 + XX5 + XX6 + XX7
        drop XX*
    }
    
    *weight outcomes
    local num = 1
    foreach var in `mental_health_vars' {
        g tmp`num' = `var'*w`num'
        local num = `num' + 1
    }
    
    *take mean of weighted outcomes
    egen tmpcomp = rowtotal(tmp1 tmp2 tmp3 tmp4 tmp5 tmp6 tmp7), mis
    gen W=w1+w2+w3+w4+w5+w6+w7
    replace tmpcomp=tmpcomp/W
    
    *restandardize
    egen mental_health_index = std(tmpcomp)
    replace mental_health_index = round(mental_health_index,.02)
    label var mental_health_index "mental health index"
    capture drop tmp* W
    My question relates to adjusting the index for individuals with missing values for one or more of the components. By including the option ", mis" when calculating the row total of the weighted outcomes, I'm able to account for missings (i.e. Stata treats missing values as missing and not as zero). However, I'm concerned that when dividing by the sum of the weights, I include all seven weights regardless of how many components are missing for each individual. For example, an individual with a missing value for one of the questions should have six weights in the denominator, but it's unclear to me whether that is indeed the case, or whether the seventh weight is still included in the sum, so that the total of six weighted components is divided by the sum of seven weights. Since the weights are calculated as scalars, I cannot (to the best of my knowledge) examine them to determine whether they are missing when they should be.

    I would appreciate any advice for how to verify whether the above code appropriately adjusts for missing values in the denominator of the weighted average index. I'm using Stata 15.1.
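
    A minimal sketch of one possible check, assuming the scalars w1-w7 and the seven standardized components from the code above are still in memory. The names zvars, Wobs and nmiss are hypothetical and introduced only for illustration. It lists the scalars, then builds a per-observation denominator that adds a weight only when its component is nonmissing, so it can be compared with the constant W = w1+...+w7:

```stata
* Sketch only: assumes scalars w1-w7 and the *zscore variables already exist.
* zvars, Wobs and nmiss are hypothetical names used for illustration.

* scalars can be displayed directly
scalar list w1 w2 w3 w4 w5 w6 w7

local zvars aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore

* per-observation sum of weights over nonmissing components only
gen double Wobs = 0
local num = 1
foreach var of local zvars {
    replace Wobs = Wobs + scalar(w`num') if !missing(`var')
    local num = `num' + 1
}

* count missing components per observation and inspect the denominator
egen nmiss = rowmiss(`zvars')
tabulate nmiss
summarize Wobs if nmiss > 0
```

    If Wobs differs from w1+...+w7 for observations with nmiss > 0, then dividing tmpcomp by W uses all seven weights for those observations. Dividing by Wobs instead would renormalize over the available components; whether that is the appropriate adjustment for the index is a separate question.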

    Thanks,

    Keanan

