
  • How to impute missing values for a variable that is subsequently used to create an outcome index?

    I am struggling to see how to obtain a pooled estimate for a variable after multiple imputation that can be worked with in further data manipulation for the original dataset.

    I am using a large child health cohort dataset (5737 cases in total) to compile an index of chronic conditions. The chronic condition index is calculated for each of the children by summing the total number of chronic conditions that each child has - for instance, 1 point for asthma, 1 point for eczema, so on. One of the variables that I would like to include in the index is BMI-for-age expressed as a Z-score. This variable, "ZBMI", is missing for 20% of my cases. The other variables that I am using in the index have complete cases.

    I would like to use multiple imputation to fill in the missing values for BMI, and then to go on to work with a pooled BMI variable. For instance, I would like to use the pooled BMI variable to then create a dichotomous indicator for obesity (each child will be 'obese' or 'not obese') and then to ultimately incorporate this marker of obesity into the chronic condition index.

    I can't see how to obtain an imputed BMI variable after imputing that I can then use for my n = 5737 dataset.
    I can see how to run a regression immediately after imputing by using the "mi estimate:" prefix, but that doesn't seem to let me manipulate the data in the way that I would like to:


    When I do this:

    mi impute mvn ZBMI_Y2CO = asthmacc_y2cm eczemacc_y2cm sepi2_am male_pdl, add(20) rseed(1234)

    gen obese = ZBMI_Y2CO > 2

    tab obese

    I get this output:

          obese |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     19,447       66.47       66.47
              1 |      9,810       33.53      100.00
    ------------+-----------------------------------
          Total |     29,257      100.00


    But I don't want a total of 29,257; I want to get back to n = 5737 and have the pooled variable to work with in my original dataset.

    I would appreciate any advice on this pickle!

    Warmly

    Jin
    Last edited by Jin Russell; 04 Apr 2016, 22:41.

  • #2
    I am struggling to see how to obtain a pooled estimate for a variable after multiple imputation that can be worked with in further data manipulation for the original dataset.
    [...]
    I would like to use multiple imputation to fill in the missing values for BMI, and then to go on to work with a pooled BMI variable. For instance, I would like to use the pooled BMI variable to then create a dichotomous indicator for obesity (each child will be 'obese' or 'not obese') and then to ultimately incorporate this marker of obesity into the chronic condition index.
    Why would you want this? The whole point of multiple imputation is to have multiple imputed values, not one. By using a single value you do not reflect the uncertainty associated with that estimate. Instead you treat the value as if you had observed it.

    If you really want this, my advice is to take any of the imputed values as each will be as good as any of the others. There is no point in pooling anything here.

    I cannot say more, as I have no idea what you want to do with those indices once you have created them. If you plan on running analyses where statistical inference plays a role, then that inference will be "wrong" if you use imputed values just as if they had been observed.

    Best
    Daniel
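
    A minimal sketch of what #2 suggests, assuming the data have already been mi set and imputed as in #1 and are stored in a long style (the 29,257 rows in #1 suggest the original rows and the imputed copies are stacked in one file). It counts the original rows and then keeps one completed dataset. The choice of imputation 1 is arbitrary, and the if qualifier is an addition here: Stata treats a missing value as larger than any number, so without it every child with missing ZBMI_Y2CO would be coded obese = 1.

    ```stata
    * Assumption: data are already mi set and imputed as in #1 (long style)
    mi describe                   // style, number of imputations, complete/incomplete obs
    count if _mi_m == 0           // rows belonging to the original data (m = 0)

    * Keep a single completed dataset; imputation 1 is an arbitrary choice
    mi extract 1, clear

    * Leave obese missing wherever ZBMI_Y2CO is missing instead of coding it 1
    gen obese = ZBMI_Y2CO > 2 if !missing(ZBMI_Y2CO)
    tab obese
    ```

    As #2 points out, any analysis run on this single dataset treats the imputed values as if they had been observed.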



    • #3
      Dear Jin, I strongly recommend reading the Stata PDF manual on MI. From your post it is unclear whether BMI, for example, is already a constructed index (based on weight and height) or a given variable. If you created this variable yourself, then maybe you should impute each of its original components first. If it is a variable that you did not compute yourself, then you should follow the help for mi passive. The standardization (the Z-score) is already a manipulation, so I, personally, would impute BMI first and then standardize it. To tabulate each dataset you can use the mi xeq prefix, or always remember to use a two-way table with the _mi_m variable as the first variable.
      See this related example:

      Code:
      // Load example data, declare the mi style, and register the variables to impute
      webuse mheart5
      mi set flong
      mi register imputed age bmi
      set seed 29390
      mi impute mvn age bmi = attack smokes hsgrad female, add(10)
      // Generate the Z-score within every dataset (m = 0, 1, ..., 10)
      mi xeq: egen ZBMI = std(bmi)
      // Generate the obese indicator; the if qualifier keeps it missing where ZBMI is
      // missing (m = 0), because Stata treats missing as larger than any number
      mi passive: gen obese = ZBMI > 2 if !missing(ZBMI)
      // Tabulate the results for each of the mi datasets
      mi xeq: tabulate obese
      // or
      tab _mi_m obese



      • #4
        Oded Mcdossi I can't say for sure what Jin Russell's zbmi variable is, but it is pretty standard practice in children's health data to use a z-score for body mass index that is standardized not with respect to the particular data set you are working with but with respect to a standard reference population. There are both national and international standards compiled for these purposes, and my guess is that his variable is based on one of those. So using -egen- on his own data probably will not get him the right construct. There are also several user-written Stata programs available to calculate these z-scores from height, weight, and age data. Two that I have personally used are -zanthro- (available from SSC) and -igrowup-, which can be downloaded from the World Health Organization's website. (I like -zanthro- better, FWIW.)

        Correction added: -zanthro- is not a Stata command, it is an -egen- function, and it is from Stata Journal, not SSC. Sorry about that.
        Last edited by Clyde Schechter; 05 Apr 2016, 09:53.



        • #5
          Clyde Schechter thanks for the clarification on that topic, which is not my field. In that case Jin Russell should skip the egen part in the code in #3.
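
          Putting #3 to #5 together for the data in #1, the workflow might look roughly like the sketch below. This is only an illustration under stated assumptions: the variable names are taken from #1, cc_index is a hypothetical name, the two-condition index and the regression on male_pdl are placeholders for whatever the real index and analysis are, and the code starts from the original n = 5737 data before any mi commands have been run.

          ```stata
          * Sketch only: names from #1; cc_index and the final model are hypothetical
          mi set mlong
          mi register imputed ZBMI_Y2CO
          mi impute mvn ZBMI_Y2CO = asthmacc_y2cm eczemacc_y2cm sepi2_am male_pdl, add(20) rseed(1234)

          * No egen step: ZBMI_Y2CO is assumed to be a reference-population z-score already
          * The if qualifier keeps obese missing where ZBMI_Y2CO is missing in m = 0
          mi passive: generate obese = ZBMI_Y2CO > 2 if !missing(ZBMI_Y2CO)

          * Placeholder index: only the two conditions named in #1 plus obesity
          mi passive: generate cc_index = asthmacc_y2cm + eczemacc_y2cm + obese

          * Pooled estimates that combine all 20 imputations
          mi estimate: proportion obese
          mi estimate: regress cc_index male_pdl
          ```

          With this approach obese and cc_index take one value per child in each imputation, and the pooling happens in the mi estimate step, not by collapsing ZBMI_Y2CO into a single variable.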



          • #6
            Oded Mcdossi Clyde Schechter Thanks for these very helpful comments, all taken on board. Will try again using your suggestions.
