Adjusting for missing observations in a weighted standardized index

Keanan Gleason

Join Date: Nov 2018
Posts: 8

17 Nov 2019, 10:30

Hi,

I'm working with individual-level pooled cross-sectional data from the National Health Interview Survey (NHIS) for the years 1981-2014. Since many of the survey questions fall under the same broader categories (e.g. physical health, mental health, health care/insurance, etc.), I'd like to group them into indices. Following Thompson (2018), for each index I standardize the components to have a mean of zero and a standard deviation of one. I then create weights from the inverse of the sample covariance matrix and use them to take a weighted mean of the standardized components. Here's a data example of what the standardized components look like:

Code:

clear
input float(aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore)
 -.5429977   .3381291   .6184267  -.4203467  -.6699646  .29255468         .
  .4963797   .3381291   .6184267 -2.4615376  .51620674  .29255468 1.0767654
 -.5429977   .3381291   .6184267   .6002487  .51620674  .29255468         .
  -3.66113   .3381291  -3.675016  -3.482133  .51620674  .29255468  .1148366
 -1.582375   .3381291   .6184267   .6002487  -.6699646  .29255468  .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
  .4963797   .3381291   .6184267  -.4203467  -.6699646  .29255468         .
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
  .4963797   .3381291 -1.5282946  -1.440942  -.6699646  .29255468 1.0767654
 -1.582375  -2.590604 -1.5282946  -1.440942  -.6699646  .29255468  .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468         .
 -.5429977 -1.1262374  -.4549339  -1.440942  -1.856136  .29255468 -.8470922
 -1.582375  -2.590604   .6184267   .6002487  -.6699646  .29255468 -.8470922
end

and here's the code I use to create the index, starting with the command to generate the standardized components shown in the above data example:

Code:

*create mental health index

*standardize components
#delimit ;
foreach var in aeffort ahopeless anervous arestless asad aworthless feelings_interfered { ;
    egen `var'zscore=std(`var') ;
} ;

matrix drop _all ;
*calculate weights ;
local mental_health_vars "aeffortzscore ahopelesszscore anervouszscore 
arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore" ;
corr `mental_health_vars' ;
#delimit cr

*invert r(C) so that the weights come from the inverse matrix, as described above
mat sigma=invsym(r(C))
*weight for component n = sum of the entries in row n of sigma
foreach n of numlist 1/7 {
    mat c_`n' = sigma[`n',1..7]
    mat XX = c_`n'
    svmat XX
    scalar w`n' = XX1+ XX2 + XX3 + XX4 + XX5 + XX6 + XX7
    drop XX*
}

*weight outcomes
local num = 1
foreach var in `mental_health_vars' {
    g tmp`num' = `var'*w`num'
    local num = `num' + 1
}

*take mean of weighted outcomes
egen tmpcomp = rowtotal(tmp1 tmp2 tmp3 tmp4 tmp5 tmp6 tmp7), mis
gen W=w1+w2+w3+w4+w5+w6+w7
replace tmpcomp=tmpcomp/W

*restandardize
egen mental_health_index = std(tmpcomp)
replace mental_health_index = round(mental_health_index,.02)
label var mental_health_index "mental health index"
capture drop tmp* W

My question relates to adjusting the index for individuals with missing values for one or more of the components. By including the option ", mis" when calculating the row total of the weighted outcomes, the total is set to missing only when all seven components are missing; otherwise rowtotal() treats a missing component as zero. My concern is the denominator: when dividing by the sum of the weights, I include seven weights regardless of the number of missings for each individual. For example, an individual with a missing value for one of the questions should have six weights, but it's unclear to me whether that is indeed the case or whether a seventh weight is still being included in the sum, consequently dividing the total of six weighted components by the sum of seven weights. Since the weights are calculated as scalars, I cannot (to the best of my knowledge) examine them to determine whether they are missing when they should be.
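To make the concern concrete, here is a minimal sketch of a denominator that varies by observation, where a weight enters the sum only when its component is nonmissing. It is only an illustration under stated assumptions: the two components z1 and z2, the weights w1 and w2, and the variable names num, den, nobs and idx are all made up and are not taken from the NHIS data or from Thompson (2018).

```stata
* toy data: two hypothetical standardized components with some missings
clear
input float(z1 z2)
 1   2
 .   2
 1   .
 .   .
end

* hypothetical weights (in the real code these would be w1-w7)
scalar w1 = 0.25
scalar w2 = 0.75

* numerator: weighted sum over the nonmissing components only
* denominator: sum of the weights of the nonmissing components only
gen double num = 0
gen double den = 0
forvalues j = 1/2 {
    replace num = num + z`j'*scalar(w`j') if !missing(z`j')
    replace den = den + scalar(w`j')      if !missing(z`j')
}

* number of observed components per row; index is left missing if none
egen byte nobs = rownonmiss(z1 z2)
gen double idx = num/den if nobs > 0

list z1 z2 nobs den idx
```

Listing den next to the components shows directly which weights were used for each row. A constant denominator such as W=w1+...+w7 would instead be identical for every row, whatever is missing.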

I would appreciate any advice for how to verify whether the above code appropriately adjusts for missing values in the denominator of the weighted average index. I'm using Stata 15.1.

Thanks,

Keanan
