Combining Multiply Imputed Datasets into One Dataset

RKrone

Join Date: Jun 2014

Posts: 7
#1

Combining Multiply Imputed Datasets into One Dataset

01 Jun 2014, 17:01

Hello All,

I'm a PhD candidate who is new to Stata and multiple imputation, and I have two questions:

I want to impute missing values in a large survey dataset that I then want to use for a more descriptive application in GIS. A Google search hasn't turned up anything on combining imputations into a single dataset, presumably because any estimation that you'd want to do with imputed data can be done with mi estimate. However, GIS software can't run estimations across the imputed datasets, so my next course of action is to combine them. My questions are: 1) is there a process for combining imputed datasets into one dataset (e.g. averaging the imputed values across imputations, [M1 + M2 + M3 + M4 + M5] / 5, or some other appropriate method), and 2) how would you do that in Stata?
Richard Williams

Join Date: Apr 2014

Posts: 4946
#2

01 Jun 2014, 17:53

See -help mi styles-, -help mi set- and -help mi convert-. It sounds like you want to use the flong (Full Long) style.
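As a rough illustration of the style conversion those help files describe, the sketch below builds a tiny made-up dataset, imputes it in mlong style, and converts it to flong. The variable names a and b, the starting style, and the seed are assumptions for the example only; -help mi convert- lists the options available in your Stata version.

```stata
* Toy data (assumed for illustration): two variables with missing values
clear
input a b
1 2
4 .
. 2
5 1
0 .
end

* Start in some other style, here mlong, and create 5 imputations
mi set mlong
mi register imputed a b
mi impute chained (regress) a b, add(5) rseed(1234)

* Convert to flong: m=0 plus one full copy of the data per imputation
mi convert flong, clear

* Report the current style and the number of imputations
mi query
```

In flong, _mi_m identifies the imputation (0 is the original data) and _mi_id identifies the case.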

I suppose if you really want an "average" set of values, i.e. only one record per case, you could play around with the -collapse- command.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
RKrone

Join Date: Jun 2014

Posts: 7
#3

02 Jun 2014, 14:43

Collapsing data across three dimensions using multiply imputed datasets.docx

Thanks for the post, Richard.

I've attached a visual schematic of what I want to try to accomplish. Visuals always help me better conceptualize. Is this an appropriate way of combining imputed estimates?

I've tried playing around with both the style and the collapse command. I think flong is what I need to use, but I don't think collapse in Stata is equipped to collapse in three dimensions. So far, I haven't found any literature that tries to do so. I've also tried to do this with the mi xeq command, but collapse won't work with it.

Any help would be greatly appreciated!

Ryan
Comment
RKrone

Join Date: Jun 2014

Posts: 7
#4

02 Jun 2014, 16:40

Collapse _example.docx

Here's my last attempt at beating this to death. I've constructed an example that uses a practice dataset, applies imputation, and shows the outcome that I hope to achieve. I'm not sure how to do this with the collapse command; all of my attempts have been unfruitful.

Thanks

Ryan
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4946
#5

02 Jun 2014, 16:47

Why don't you want to use the flong data set? Also, if you have any categorical variables, e.g. gender, taking the mean isn't going to work.
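To make the point about categorical variables concrete, here is a minimal sketch on made-up data. It assumes a binary variable female and uses id and m as stand-ins for _mi_id and _mi_m. The per-case mean is no longer a valid category, whereas a per-case mode is; the mode is shown only as one possible alternative, not as a recommended rule.

```stata
* Made-up flong-like data: 2 cases, 5 imputations each (names are assumptions)
clear
input id m female
1 1 0
1 2 0
1 3 1
1 4 0
1 5 0
2 1 1
2 2 1
2 3 1
2 4 0
2 5 1
end

* Most frequent imputed value per case; minmode picks the lowest value on ties
bysort id: egen female_mode = mode(female), minmode

* One record per case: the mean (not a valid category) next to the mode
collapse (mean) female_mean = female (first) female_mode, by(id)
list
```

Here the means come out as fractions between 0 and 1, while the modes stay in {0, 1}.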

Comment
RKrone

Join Date: Jun 2014

Posts: 7
#6

02 Jun 2014, 16:53

I think you misunderstood; I do want to use flong. I thought I said I was using it in my previous posts. With regard to categorical variables, I've found some rules in the literature for rounding binary, ordinal, and categorical variables.

Ryan
Comment

Richard Williams

Join Date: Apr 2014
Posts: 4946
#7

02 Jun 2014, 16:57

But, if you really really really want to do it this way (the first several commands are yours, the new commands are at the end)...

Code:

. input a b

             a          b
  1. 1 2
  2. 4 .
  3. . 2
  4. 5 1
  5. 0 .
  6. end

.  
. list

     +-------+
     | a   b |
     |-------|
  1. | 1   2 |
  2. | 4   . |
  3. | .   2 |
  4. | 5   1 |
  5. | 0   . |
     +-------+

. mi set flong

.         
. mi register imputed a b
(3 m=0 obs. now marked as incomplete)

.         
. mi impute chain (regress) a b, add(5) rseed(1234)

Conditional models:
                 a: regress a b
                 b: regress b a

Performing chained iterations ...

Multivariate imputation                     Imputations =        5
Chained equations                                 added =        5
Imputed: m=1 through m=5                        updated =        0

Initialization: monotone                     Iterations =       50
                                                burn-in =       10

                 a: linear regression
                 b: linear regression

------------------------------------------------------------------
                   |               Observations per m             
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
                 a |          4            1         1 |         5
                 b |          3            2         2 |         5
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. list

     +---------------------------------------------------+
     |         a           b   _mi_id   _mi_miss   _mi_m |
     |---------------------------------------------------|
  1. |         1           2        1          0       0 |
  2. |         4           .        2          1       0 |
  3. |         .           2        3          1       0 |
  4. |         5           1        4          0       0 |
  5. |         0           .        5          1       0 |
     |---------------------------------------------------|
  6. |         1           2        1          .       1 |
  7. |         4   -.3088514        2          .       1 |
  8. |  5.078469           2        3          .       1 |
  9. |         5           1        4          .       1 |
 10. |         0    8.478017        5          .       1 |
     |---------------------------------------------------|
 11. |         1           2        1          .       2 |
 12. |         4    3.034929        2          .       2 |
 13. |  4.022594           2        3          .       2 |
 14. |         5           1        4          .       2 |
 15. |         0    2.740744        5          .       2 |
     |---------------------------------------------------|
 16. |         1           2        1          .       3 |
 17. |         4    2.416472        2          .       3 |
 18. | -1.409533           2        3          .       3 |
 19. |         5           1        4          .       3 |
 20. |         0    3.483081        5          .       3 |
     |---------------------------------------------------|
 21. |         1           2        1          .       4 |
 22. |         4    .8368034        2          .       4 |
 23. | -1.494785           2        3          .       4 |
 24. |         5           1        4          .       4 |
 25. |         0    1.471483        5          .       4 |
     |---------------------------------------------------|
 26. |         1           2        1          .       5 |
 27. |         4    1.259679        2          .       5 |
 28. |  1.585542           2        3          .       5 |
 29. |         5           1        4          .       5 |
 30. |         0    2.132673        5          .       5 |
     +---------------------------------------------------+

. keep if _mi_m > 0
(5 observations deleted)

. collapse a b, by(_mi_id)

. list

     +------------------------------+
     | _mi_id          a          b |
     |------------------------------|
  1. |      1          1          2 |
  2. |      2          4   1.447806 |
  3. |      3   1.556457          2 |
  4. |      4          5          1 |
  5. |      5          0     3.6612 |
     +------------------------------+
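As a quick sanity check on the collapsed values, the averages can be reproduced by hand from the imputed values in the flong listing. The sketch below recomputes b for _mi_id 2 and a for _mi_id 3; the scalar names are made up for the example.

```stata
* b for _mi_id == 2: mean of the five imputed values (m = 1 to 5)
scalar b_id2 = (-.3088514 + 3.034929 + 2.416472 + .8368034 + 1.259679) / 5

* a for _mi_id == 3: mean of the five imputed values (m = 1 to 5)
scalar a_id3 = (5.078469 + 4.022594 - 1.409533 - 1.494785 + 1.585542) / 5

display b_id2
display a_id3
```

Both agree with the collapse output (1.447806 and 1.556457) to the displayed precision.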


Comment

Richard Williams

Join Date: Apr 2014

Posts: 4946
#8

02 Jun 2014, 16:58

I feel like I have just shown you how to do evil though. Again, why not just use the flong data?

Comment
RKrone

Join Date: Jun 2014

Posts: 7
#9

02 Jun 2014, 17:12

Wow. That did it. Thank you!
I'm not sure what you mean regarding flong. I did mi set the data to the flong style. Do you mean something different?
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4946
#10

02 Jun 2014, 17:35

Why do you want to compute averages? Why not just use the 5 imputed data sets? In other words, use all the data before the collapse.

Comment
RKrone

Join Date: Jun 2014

Posts: 7
#11

02 Jun 2014, 20:33

I'm adapting my data for a Geographical Information Systems project. ArcMap, the GIS software, wouldn't know what to make of the imputed datasets, especially in flong form. Wide form would be more appropriate for GIS, but even so, the software can't do estimation across the imputed datasets. My solution is to collapse them into a single dataset. I'm not using the data for inference in the GIS project; it's really more for factor analysis and then descriptive work in a geographic context. Does that answer your question?
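For handing a single collapsed table to GIS software, one option is a plain CSV export. This is only a sketch: the values are typed in from the toy example earlier in the thread, the variable name id and the file name imputed_means.csv are made up, and how ArcMap expects the table to be joined to spatial features is outside what this shows.

```stata
* Collapsed toy data: one record per case (values from the earlier example)
clear
input id a b
1 1 2
2 4 1.447806
3 1.556457 2
4 5 1
5 0 3.6612
end

* Write a comma-separated file with variable names in the first row
export delimited using "imputed_means.csv", replace
```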
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4946
#12

02 Jun 2014, 20:43

I don't work with GIS software so I will trust your judgment on it. I am leery of just using the means -- single imputation is inferior to multiple imputation -- but maybe for your purposes this is good enough or even the best approach possible. Indeed, maybe the old -impute- command would have been a simpler and adequate enough solution. Good luck with this.

Comment
RKrone

Join Date: Jun 2014

Posts: 7
#13

02 Jun 2014, 20:52

I appreciate it. Thanks again for your help Richard.
Comment
Julius Manda

Join Date: Feb 2015

Posts: 1
#14

17 May 2016, 07:23

I am doing time series analysis and I have problems running time series commands on the imputed data. I was therefore wondering whether using averages of the 7 imputed data sets is a good idea. When running the time series commands, I use:

mi estimate, cmdok: dfuller X, lags(3)

and I get an error:
macro e(cmd) is not set
matrix e(b) is not set
matrix e(V) is not set
r(301);
Comment
