How to impute missing values for a variable that is subsequently used to create an outcome index?

Jin Russell

Join Date: Jan 2016

Posts: 15
#1


04 Apr 2016, 22:38

I am struggling to see how to obtain a pooled estimate for a variable after multiple imputation that can be worked with in further data manipulation for the original dataset.

I am using a large child health cohort dataset (5737 cases in total) to compile an index of chronic conditions. The chronic condition index is calculated for each child by summing the number of chronic conditions that the child has: for instance, 1 point for asthma, 1 point for eczema, and so on. One of the variables that I would like to include in the index is BMI-for-age expressed as a Z-score. This variable, "ZBMI", is missing for 20% of my cases. The other variables that I am using in the index are complete for all cases.

I would like to use multiple imputation to fill in the missing values for BMI, and then to go on to work with a pooled BMI variable. For instance, I would like to use the pooled BMI variable to then create a dichotomous indicator for obesity (each child will be 'obese' or 'not obese') and then to ultimately incorporate this marker of obesity into the chronic condition index.

I can't see how to obtain an imputed BMI variable after imputing that I can then use for my n = 5737 dataset.
I can see how to run a regression immediately after imputing using the "mi estimate:" prefix, but this doesn't seem to allow me to manipulate the data in the way that I would like to:

When I do this:

mi impute mvn ZBMI_Y2CO = asthmacc_y2cm eczemacc_y2cm sepi2_am male_pdl, add(20) rseed(1234)

gen obese = ZBMI_Y2CO > 2

tab obese

I get this output:

      obese |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     19,447       66.47       66.47
          1 |      9,810       33.53      100.00
------------+-----------------------------------
      Total |     29,257      100.00

But I don't want a total of 29,257; I want to get back to n = 5737 and have the pooled variable to work with in my original dataset.

I would appreciate any advice on this pickle!

Warmly

Jin

Last edited by Jin Russell; 04 Apr 2016, 22:41.
daniel klein

Join Date: Mar 2014

Posts: 3824
#2

05 Apr 2016, 00:43

I am struggling to see how to obtain a pooled estimate for a variable after multiple imputation that can be worked with in further data manipulation for the original dataset.
[...]
I would like to use multiple imputation to fill in the missing values for BMI, and then to go on to work with a pooled BMI variable. For instance, I would like to use the pooled BMI variable to then create a dichotomous indicator for obesity (each child will be 'obese' or 'not obese') and then to ultimately incorporate this marker of obesity into the chronic condition index.

Why would you want this? The whole point of multiple imputation is to have multiple imputed values, not one. By using one value you do not reflect the uncertainty that is associated with this estimate. Instead you treat this value as if you had observed it.

If you really want this, my advice is to take any of the imputed values as each will be as good as any of the others. There is no point in pooling anything here.
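A minimal sketch of taking a single imputation, in case it helps. It uses the mheart5 example data and its variables as stand-ins for the cohort data, and the seed and number of imputations are arbitrary. The command mi extract replaces the data in memory with one completed dataset that is no longer mi set, so the number of rows is back to the original number of cases.

```stata
* Sketch only: mheart5 and its variables stand in for the cohort data
webuse mheart5, clear
local n_original = _N

mi set mlong
mi register imputed age bmi
mi impute mvn age bmi = attack smokes hsgrad female, add(10) rseed(29390)

* Keep only imputation m = 1; the result is an ordinary, non-mi dataset
mi extract 1, clear

* One row per original case again, with no missing bmi left
assert _N == `n_original'
count if missing(bmi)
```

Any inference run on such an extracted dataset treats the imputed values as observed, which is the caveat raised in this post.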

I cannot say more, as I have no idea what you want to do with those indices once you have created them. If you plan on running analyses where statistical inference plays a role, then the inference will be "wrong" when you use imputed values just as if they had been observed.
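For contrast, a hedged sketch of the route that keeps all imputations: derive the indicator and the index inside every imputed dataset with mi passive, then let mi estimate pool the results. The mheart5 data, the cutoff of 30, the variable names highbmi and index, and the toy two-item index are all illustrative assumptions, not the poster's actual variables.

```stata
* Sketch only: mheart5 stands in for the cohort data
webuse mheart5, clear
mi set mlong
mi register imputed age bmi
mi impute mvn age bmi = attack smokes hsgrad female, add(10) rseed(29390)

* Derive the indicator within every imputation (cutoff of 30 is illustrative);
* the if-condition keeps missing bmi in m = 0 from being coded as 1
mi passive: generate byte highbmi = bmi > 30 if !missing(bmi)

* A toy two-item index built from the indicator and a complete variable
mi passive: generate byte index = highbmi + smokes

* Pooled estimates that carry the imputation uncertainty
mi estimate: mean highbmi
mi estimate: regress index age female
```

The pooled output refers to the original number of cases, while the stacked data in memory still holds one copy of the incomplete cases per imputation.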

Best
Daniel
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#3

05 Apr 2016, 00:49

Dear Jin, I strongly recommend reading the Stata PDF manual on MI. From your post it is unclear whether BMI, for example, is already a constructed index (based on weight and height) or a given variable. If you created this variable yourself, then maybe you should impute each of its original components first. If it is a single variable that you did not compute yourself, then you should follow the help for mi passive. The standardization part (the Z-score) is already a manipulation, so I, personally, would impute the BMI first and then standardize it. To tabulate each dataset you can use the mi xeq prefix, or remember always to tabulate a two-way table with the _mi_m variable as the first variable.
See this related example:

Code:

// use some data and register the imputed variables
webuse mheart5
mi set flong
mi register imputed age bmi
set seed 29390
mi impute mvn age bmi = attack smokes hsgrad female, add(10)

// generate the Z-score in every imputed dataset
mi xeq: egen ZBMI = std(bmi)

// generate the obese variable; the if-condition is needed because Stata
// treats missing values as larger than any number, so ZBMI > 2 alone
// would code cases with missing ZBMI (in m = 0) as obese
mi passive: gen obese = ZBMI > 2 if !missing(ZBMI)

// tabulate the results for each of the mi datasets
mi xeq: tabulate obese
// OR
tab _mi_m obese
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

05 Apr 2016, 09:45

Oded Mcdossi I can't say for sure what Jin Russell's zbmi variable is, but it is pretty standard practice in children's health data to use a z-score for body mass index that is standardized not with respect to the particular data set you are working with but to a standard reference population. There are both national and international standards compiled for these purposes, and my guess is that his variable is based on one of those. So using -egen- on his own data probably will not get him the right construct. There are also several user-written Stata programs available to calculate these from height, weight, and age data. Two that I have personally used are -zanthro- (available from SSC) and -igrowup-, which can be downloaded from the World Health Organization's website. (I like -zanthro- better, FWIW.)

Correction added: -zanthro- is not a Stata command, it is an -egen- function, and it is from Stata Journal, not SSC. Sorry about that.
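A rough sketch of what a reference-based z-score looks like with the -zanthro()- egen function, assuming the Stata Journal package is already installed. The toy data, the variable names, the gender coding, and the choice of the WHO reference are all assumptions for illustration; the chart and version codes should be checked against the package's help file before use.

```stata
* Toy data: the variable names and the coding 1 = male, 2 = female are hypothetical
clear
input byte sex float(age_years weight_kg height_cm)
1 2.0 12.5 87
2 2.1 13.0 86
end

generate bmi = weight_kg / (height_cm/100)^2

* BMI-for-age z-score against a reference population rather than the sample;
* "ba" (BMI-for-age) and "WHO" are assumed to be the right chart/version codes
egen zbmi = zanthro(bmi, ba, WHO), xvar(age_years) gender(sex) gencode(male=1, female=2)

list sex age_years bmi zbmi
```

If height and weight were imputed rather than the z-score itself, the same egen call could be run inside each imputed dataset, for example under mi passive.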

Last edited by Clyde Schechter; 05 Apr 2016, 09:53.
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#5

05 Apr 2016, 13:15

Clyde Schechter thanks for the clarification on that topic, which is not my field. In that case Jin Russell should skip the egen part in the code in #3.
Jin Russell

Join Date: Jan 2016

Posts: 15
#6

05 Apr 2016, 15:30

Oded Mcdossi Clyde Schechter Thanks for these very helpful comments, all taken on board. Will try again using your suggestions.