  • Multiple Imputation with survey questions grouped by scales

    I am using Stata/IC 13.1 for a social sciences application. I have survey data with 100 questions and responses for a sample size of n = 477. The proportion of missing data ranges from 0% to 22% across the survey questions.

    I have imputed each of the 100 questions and now have 5 imputed data sets.

    As a first step in the analysis, the researcher would like to use one of the survey scales (consisting of 9 questions = Emotional Exhaustion) as the dependent variable for regression with some of the other variables (gender, race, education level, etc).

    So the dependent variable is the mean of 9 survey questions.

    I know that 5 imputed data sets for the independent variables are ready for the mi estimate (regression). But how do I obtain a mean of the 9 survey questions for the dependent variable in the model?

    I am not sure if I should do this:

    I could compute the mean of the 9 variables within each of the 5 imputations, so I would have a mean for emotional exhaustion in each of the 5 imputations. Would Stata let me register this as an imputed variable (EmotionalExhaustion)?
    -or-

    I would take the mean manually and mi register this as a regular variable for the mi estimate?

    I apologize if this has been handled in documentation previously but I could not find a reference, or possibly did not frame the question well.

    Any help is appreciated.

  • #2
    What you are groping for here is -mi passive-. That is when a variable is calculated as a deterministic function of the imputed variables. Read -help mi passive- for details.
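To make the idea of a passive variable concrete, here is a minimal sketch in Python (not Stata) of what "a deterministic function of the imputed variables" means: the scale score is recomputed from the item values separately inside every imputed data set. The toy data, the item names `q1`-`q3`, and the helper `add_passive_scale` are hypothetical, and only 3 items and 2 imputations are used instead of 9 and 5.

```python
from statistics import mean

# Hypothetical toy data: 2 imputed data sets, 2 respondents, 3 items.
imputations = [
    [{"q1": 2, "q2": 4, "q3": 3}, {"q1": 5, "q2": 5, "q3": 2}],
    [{"q1": 2, "q2": 3, "q3": 4}, {"q1": 5, "q2": 4, "q3": 3}],
]
ITEMS = ["q1", "q2", "q3"]


def add_passive_scale(datasets, items, name="scale"):
    """Return copies of the data sets with the row mean of `items` added.

    The scale is computed separately within each imputed data set, so it
    always stays consistent with that data set's imputed item values.
    """
    return [
        [{**row, name: mean(row[i] for i in items)} for row in data]
        for data in datasets
    ]


scored = add_passive_scale(imputations, ITEMS)
for m, data in enumerate(scored, start=1):
    print(m, [row["scale"] for row in data])
```

In Stata itself this is the job of -mi passive-; see -help mi passive- for the actual syntax.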

    That said, I believe that this is not the recommended approach. My understanding is that you should have imputed the scale scores as well as the items back when you did the multiple imputations. Then use the imputed scale scores in analysis, even though the imputed scale scores are not equal to the means of the 9 imputed item scores. It is important to remember that the purpose of multiple imputation is to provide adequate variation in the values of the imputed variables. It is neither necessary, nor even desirable, that the imputed data sets be realistic, nor resemble the real thing. In fact, the imputed values of the variables do not even have to be possible values of the real variables. Multiple imputation is about using data that, when fed into Rubin's rules, will produce regression estimates that are not biased due to underestimating the variance of the predictor variables. The mathematical theory underlying it makes no assumptions that imply verisimilitude of the imputed values.
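For readers unfamiliar with Rubin's rules, the following Python sketch shows the pooling step under simple assumptions: one coefficient estimated in each of m imputed data sets, each with its own squared standard error. The function name `pool_rubin` and the numbers are hypothetical and chosen only for illustration. The pooled variance adds the between-imputation variance to the average within-imputation variance, which is how the uncertainty due to missing data enters the final standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations with Rubin's rules.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)         # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from m = 5 imputed data sets.
coefs = [0.50, 0.54, 0.46, 0.52, 0.48]
squared_ses = [0.010, 0.012, 0.011, 0.009, 0.008]

pooled, total_var = pool_rubin(coefs, squared_ses)
print(pooled, total_var, total_var ** 0.5)
```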


    • #3
      This makes perfect sense. I can create the scale scores before the imputation and then impute. I did not like taking care of the scales afterward because I was worried about reduction of the variance and should have thought of this. Thank you.


      • #4
        There is a brief discussion of passive imputation on pp. 10-11 of

        https://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf

        In general, Allison and others argue against passive imputation. They prefer the "Just another variable" approach for things like interactions and squared terms. There is at least one exception. Suppose you are trying to compute a scale that is the sum of several items. In an email to me, Allison said “It's better, when possible, to impute at the item level rather than the scale level. Otherwise you lose a lot of data. This is one case where JAV doesn't apply.”

        As a sidelight, 5 imputed data sets is pretty wimpy. Most sources today recommend many more than that.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 18.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam


        • #5
          Thank you for your responses. One more question. When I insert the scale variable into the mi impute command along with all the individual survey items, I get an error:

          "vce is not positive definite the posterior distribution from which mi impute drew the imputations for gender is not proper when the vce estimated from the observed data is not positive definite. this may happen, for example, when the number of parameters exceeds the number of observations. choose an alternate imputation model."

          I assume this is caused by the relationship between the scale mean and each of the variables that contribute to it. I can get around it by omitting one of the individual variables from the imputation, but including all of them causes this error.
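The suspected cause can be illustrated with a small Python sketch, assuming the scale is the exact mean of its items in the rows where all items are observed. In that case the scale is a perfect linear combination of the items, so the covariance matrix of items plus scale is singular, which is one way a variance-covariance matrix ends up not positive definite. The toy data (3 items rather than 9) and the helper names are hypothetical, and exact fractions are used so the rank is not blurred by rounding. This does not reproduce what mi impute does internally.

```python
from fractions import Fraction


def covariance_matrix(columns):
    """Sample covariance matrix of equal-length columns, in exact arithmetic."""
    n = len(columns[0])
    means = [Fraction(sum(c), n) for c in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]


def rank(matrix):
    """Rank of a matrix by Gaussian elimination (exact with Fractions)."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# Hypothetical complete-case item responses for 6 respondents.
q1 = [1, 2, 3, 4, 5, 2]
q2 = [2, 1, 4, 3, 5, 5]
q3 = [5, 3, 1, 2, 4, 1]
scale = [Fraction(a + b + c, 3) for a, b, c in zip(q1, q2, q3)]

rank_items = rank(covariance_matrix([q1, q2, q3]))              # items only
rank_with_scale = rank(covariance_matrix([q1, q2, q3, scale]))  # items + scale
rank_drop_one = rank(covariance_matrix([q1, q2, scale]))        # one item omitted

print(rank_items, rank_with_scale, rank_drop_one)
```

With all items plus the scale there are 4 variables but the rank stays at 3, so the matrix is singular. Dropping one item removes the exact dependency, which is consistent with the workaround described above.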
