I am struggling to see how to obtain a pooled estimate for a variable after multiple imputation that can be worked with in further data manipulation for the original dataset.
I am using a large child health cohort dataset (5737 cases in total) to compile an index of chronic conditions. The chronic condition index is calculated for each child by summing the number of chronic conditions that the child has: for instance, 1 point for asthma, 1 point for eczema, and so on. One of the variables that I would like to include in the index is BMI-for-age expressed as a Z-score. This variable, "ZBMI", is missing for 20% of my cases. The other variables that I am using in the index are complete for all cases.
I would like to use multiple imputation to fill in the missing values for BMI, and then to go on to work with a pooled BMI variable. For instance, I would like to use the pooled BMI variable to then create a dichotomous indicator for obesity (each child will be 'obese' or 'not obese') and then to ultimately incorporate this marker of obesity into the chronic condition index.
I can't see how to obtain an imputed BMI variable after imputing that I can then use for my n = 5737 dataset.
I can see how to run a regression immediately after imputing using the command "mi estimate:", but this command doesn't seem to allow me to manipulate the data in the way that I would like.
When I do this:
mi impute mvn ZBMI_Y2CO = asthmacc_y2cm eczemacc_y2cm sepi2_am male_pdl, add(20) rseed(1234)
gen obese = ZBMI_Y2CO > 2
tab obese
I get this output:
      obese |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     19,447       66.47       66.47
          1 |      9,810       33.53      100.00
------------+-----------------------------------
      Total |     29,257      100.00
But I don't want a total of 29,257; I want to get back to n = 5737 and have the pooled variable to work with in my original dataset.
I would appreciate any advice on this pickle!
Warmly
Jin