  • Descriptive statistics after multiple imputation

    Hello Statalisters,

    I'm trying to obtain descriptive statistics for variables in an imputed dataset (100 imputations, created with ice in Stata 13). The "mi xeq:" prefix produces summary statistics in each of the imputed datasets separately, but does not provide a pooled estimate across all datasets. I read in a previous post that the command "misum" provides an average over all imputed datasets. Are there other commands that could be used to obtain proportions (e.g., tab) or to examine differences between groups (e.g., using ttest or a chi-squared test)?

    I'd appreciate your help with this.
    Thanks,
    Fatima
    Last edited by Fatima Al Sayah; 20 Apr 2015, 13:42.

    • #3
      As the author of misum, four years after writing it down, I seriously doubt its results are useful.

      For one thing, when combining quantities according to Rubin's rules, we assume asymptotic normality, which probably does not hold for quantiles, standard deviations, etc. The mean is asymptotically normal, but then the mean reported by misum equals the mean reported by summarize.

      More important, multiple imputation was not designed for descriptive statistics. Actually, it is not even designed to obtain correct point estimates - although it can do this. We impute more than one value to get the standard errors right, thus multiple imputation is designed for inference. So if you are interested in comparing proportions (in a population) you should use mi estimate with proportion and not even try to get tabulate to work with mi.
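
      A minimal sketch of what this could look like. The variable names (smoker, female, bmi) are hypothetical, and I am assuming the ice output has already been converted to mi format. The logit and regress lines are just one way to get at group differences in place of a chi-squared test or ttest, not the only one.

      ```stata
      * Assumption: data imputed with ice have been imported into mi format,
      * e.g. with: mi import ice, automatic
      * smoker and female are hypothetical 0/1 variables, bmi is hypothetical

      * Pooled proportions, overall and by group
      mi estimate: proportion smoker
      mi estimate: proportion smoker, over(female)

      * Group difference in a proportion (in place of a chi-squared test)
      mi estimate: logit smoker i.female

      * Group difference in a mean (in place of ttest)
      mi estimate: regress bmi i.female
      ```

      The point estimates and standard errors reported by these commands are combined across imputations by Rubin's rules.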

      I hope this helps.

      Daniel
      Last edited by daniel klein; 20 Apr 2015, 14:16.


      • #4
        Hello Daniel,
        I found this post very interesting as I am currently dealing with a similar problem: I have imputed data, and both the imputation and the regressions I ran with it look fine. Now I was told to produce descriptive statistics (normally I would use: "tab var1 var2, col"). The problem is: if I run the descriptives, I get a different N than with the imputed regression. When I run the regression with the imputed data, I have an N of 10,220. When I do the descriptives, I get an N of 10,452. Now my supervisor says that this is a problem, as descriptives and regression should have the same N. But I wonder if that is even possible with imputed data, as I can't use the "mi" command with tab, as you mentioned.

        I am grateful for everything you can tell me about this.

        Thanks,
        Anne.


        • #5
          Kathrin, I think what your supervisor is trying to say is that the descriptive statistics should be based on the same sample as the multivariate analysis. The number of observations is an indicator that this is the case (even though you could end up with the same N but different cases). Anyway, as I have stated above, I do not believe that multiple imputation was meant for descriptives. I would tend to report the descriptive statistics for the original sample, including the proportion of missing values per variable. From my perspective, this actually represents the same sample that is used for multivariate analysis.
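
          As a rough sketch, assuming the data are mi set (so that m = 0 holds the original, unimputed data) and using var1 and var2 from the tabulate example in #4:

          ```stata
          * Cross-tabulation in the original data (m = 0), showing missing values
          mi xeq 0: tabulate var1 var2, column missing

          * Number of missing values per variable in the original data
          mi xeq 0: misstable summarize var1 var2
          ```

          Note that misstable summarize reports counts; dividing by the number of observations in m = 0 gives the proportion missing per variable.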

          You may, additionally, want to check whether the structure in the original data is preserved during the imputation. That is, in a way, another kind of descriptive result. For approaches check out Eddings and Marchenko (2012).
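
          A very simple check in that spirit could look as follows. This is only a sketch and not the diagnostics described in the article; x is a hypothetical imputed variable.

          ```stata
          * Compare the distribution of x in the original data (m = 0)
          * with its distribution in the first three imputations
          mi xeq 0 1 2 3: summarize x, detail
          ```

          In m = 0 the statistics are based on the observed values only, whereas in each imputation they are based on observed and imputed values together, so large shifts are worth a closer look.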

          As a side note: I find it a bit puzzling that your sample size after imputation is actually smaller than before. Is this because you have a lot of missing values on the outcome/dependent variable? You would usually use the respective cases during imputation and delete them afterwards. In this case one could argue that the descriptives should be reported for all cases that do not have missing values on the outcome.

          A last, sometimes frustrating, point to consider is the general rule that the supervisor is always right, even when they are not.

          Hope this helps.

          Best
          Daniel


          Eddings, W., Marchenko, Y. 2012. Diagnostics for multiple imputation in Stata. The Stata Journal, 12(3), pp. 353-367.
          Last edited by daniel klein; 27 Jun 2018, 05:36. Reason: added full reference
