  • Combining Multiply Imputed Datasets into One Dataset

    Hello All,

    I'm a PhD candidate who is new to Stata and multiple imputation, and I have two questions:

    I want to impute missing values in a large survey dataset that I then want to use for a more descriptive application in GIS. A Google search hasn't turned up anything on combining imputations into a single dataset, presumably because any estimation you'd want to do with the imputed data can be done with mi estimate. However, GIS software can't estimate across these imputed datasets, so my next course of action is to combine them. My questions are: 1) is there a process for combining imputed datasets into one (e.g., collapsing the imputed values into an averaged dataset, [M1 + M2 + M3 + M4 + M5] / 5, or some other appropriate method), and 2) how would you do that in Stata?


  • #2
    See -help mi styles-, -help mi set- and -help mi convert-. It sounds like you want to use the flong (Full Long) style.

    I suppose if you really want an "average" set of values, i.e., only one record per case, you could play around with the -collapse- command.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam
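
    A minimal sketch of the style conversion suggested above, assuming the multiply imputed data are already in memory and were mi set in some other style (for example wide or mlong). Nothing here is specific to the original poster's data.

    ```stata
    * Assumption: mi data are already in memory, set in any mi style
    mi query                   // report the current style and the number of imputations
    mi convert flong, clear    // switch to full long: one complete copy of the data per imputation
    mi describe                // confirm the style and the registered variables
    ```

    In flong style, the variable _mi_m identifies the imputation (0 is the original data) and _mi_id identifies the case, which is what makes a later -collapse- by case possible.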



    • #3
      Attachment: Collapsing data across three dimensions using multiply imputed datasets.docx

      Thanks for the post, Richard,

      I've attached a visual schematic of what I want to try to accomplish. Visuals always help me better conceptualize. Is this an appropriate way of combining imputed estimates?

      I've tried playing around with both the style and the collapse command. I think flong is what I need to use, but I don't think collapse in Stata is equipped to collapse in three dimensions. So far, I haven't found any literature that tries to do so. I've also tried to do this with the mi xeq command, but collapse won't work with it.

      Any help would be greatly appreciated!

      Ryan



      • #4
        Collapse _example.docx

        Here's my last attempt at beating this to death. I've constructed an example using a practice dataset, applying imputation, and showing the outcome that I hope to achieve. I'm not sure how to do this using the collapse command; all of my attempts have been unfruitful.

        Thanks

        Ryan



        • #5
          Why don't you want to use the flong data set? Also, if you have any categorical variables, e.g. gender, taking the mean isn't going to work.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            I think you misunderstood; I do want to use flong. I thought I said I was using it in my previous posts. With regard to categorical variables, I've found some rules in the literature for rounding binary, ordinal, and categorical variables.

            Ryan
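
            One simple possibility for a binary variable, sketched under assumptions: the variable names income (continuous) and female (coded 0/1) are hypothetical, and the 0.5 cutoff is only an illustration, not necessarily one of the rounding rules referred to above. The data are assumed to be in flong style with both variables registered as imputed.

            ```stata
            * Hypothetical variables: income (continuous), female (0/1)
            * Assumption: flong mi data in memory
            keep if _mi_m > 0                           // keep only the imputed copies (drop m = 0)
            collapse (mean) income female, by(_mi_id)   // per-case mean across imputations
            replace female = round(female)              // back to 0/1 using a 0.5 cutoff
            ```

            With an odd number of imputations, this rounding amounts to taking the value that most imputations agree on for each case.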



            • #7
              But, if you really really really want to do it this way (the first several commands are yours, the new commands are at the end)...

              Code:
              . input a b
              
                           a          b
                1. 1 2
                2. 4 .
                3. . 2
                4. 5 1
                5. 0 .
                6. end
              
              .  
              . list
              
                   +-------+
                   | a   b |
                   |-------|
                1. | 1   2 |
                2. | 4   . |
                3. | .   2 |
                4. | 5   1 |
                5. | 0   . |
                   +-------+
              
              . mi set flong
              
              .         
              . mi register imputed a b
              (3 m=0 obs. now marked as incomplete)
              
              .         
              . mi impute chain (regress) a b, add(5) rseed(1234)
              
              Conditional models:
                               a: regress a b
                               b: regress b a
              
              Performing chained iterations ...
              
              Multivariate imputation                     Imputations =        5
              Chained equations                                 added =        5
              Imputed: m=1 through m=5                        updated =        0
              
              Initialization: monotone                     Iterations =       50
                                                              burn-in =       10
              
                               a: linear regression
                               b: linear regression
              
              ------------------------------------------------------------------
                                 |               Observations per m             
                                 |----------------------------------------------
                        Variable |   Complete   Incomplete   Imputed |     Total
              -------------------+-----------------------------------+----------
                               a |          4            1         1 |         5
                               b |          3            2         2 |         5
              ------------------------------------------------------------------
              (complete + incomplete = total; imputed is the minimum across m
               of the number of filled-in observations.)
              
              . list
              
                   +---------------------------------------------------+
                   |         a           b   _mi_id   _mi_miss   _mi_m |
                   |---------------------------------------------------|
                1. |         1           2        1          0       0 |
                2. |         4           .        2          1       0 |
                3. |         .           2        3          1       0 |
                4. |         5           1        4          0       0 |
                5. |         0           .        5          1       0 |
                   |---------------------------------------------------|
                6. |         1           2        1          .       1 |
                7. |         4   -.3088514        2          .       1 |
                8. |  5.078469           2        3          .       1 |
                9. |         5           1        4          .       1 |
               10. |         0    8.478017        5          .       1 |
                   |---------------------------------------------------|
               11. |         1           2        1          .       2 |
               12. |         4    3.034929        2          .       2 |
               13. |  4.022594           2        3          .       2 |
               14. |         5           1        4          .       2 |
               15. |         0    2.740744        5          .       2 |
                   |---------------------------------------------------|
               16. |         1           2        1          .       3 |
               17. |         4    2.416472        2          .       3 |
               18. | -1.409533           2        3          .       3 |
               19. |         5           1        4          .       3 |
               20. |         0    3.483081        5          .       3 |
                   |---------------------------------------------------|
               21. |         1           2        1          .       4 |
               22. |         4    .8368034        2          .       4 |
               23. | -1.494785           2        3          .       4 |
               24. |         5           1        4          .       4 |
               25. |         0    1.471483        5          .       4 |
                   |---------------------------------------------------|
               26. |         1           2        1          .       5 |
               27. |         4    1.259679        2          .       5 |
               28. |  1.585542           2        3          .       5 |
               29. |         5           1        4          .       5 |
               30. |         0    2.132673        5          .       5 |
                   +---------------------------------------------------+
              
              . keep if _mi_m > 0
              (5 observations deleted)
              
              . collapse a b, by(_mi_id)
              
              . list
              
                   +------------------------------+
                   | _mi_id          a          b |
                   |------------------------------|
                1. |      1          1          2 |
                2. |      2          4   1.447806 |
                3. |      3   1.556457          2 |
                4. |      4          5          1 |
                5. |      5          0     3.6612 |
                   +------------------------------+
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #8
                I feel like I have just shown you how to do evil though. Again, why not just use the flong data?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  Wow. That did it. Thank you!
                  I'm not sure what you mean regarding flong. I did mi set it to the flong style type. Do you mean something different?



                  • #10
                    Why do you want to compute averages? Why not just use the 5 imputed data sets? In other words, use all the data before the collapse.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam



                    • #11
                      I'm adapting my data for a Geographical Information Systems project. ArcMap, the GIS software, wouldn't know what to make of the imputed datasets, especially in flong form. Wide form would be more appropriate for GIS, but even so, the software can't do estimation across the imputed datasets. My solution is to collapse them into a single dataset. I'm not using the data for inference in the GIS project; it's really for factor analysis and then descriptive work in a geographic context. Does that answer your question?
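
                      A sketch of how the collapsed data might be handed to GIS software without destroying the imputed data in memory. It assumes flong mi data with imputed variables a and b, as in the toy example earlier in the thread; the file name imputed_means.csv is hypothetical.

                      ```stata
                      * Assumption: flong mi data in memory with imputed variables a and b
                      preserve                             // keep the full mi data intact
                      keep if _mi_m > 0                    // drop the original data (m = 0)
                      collapse (mean) a b, by(_mi_id)      // one row per case, mean across imputations
                      export delimited using "imputed_means.csv", replace   // hypothetical file name
                      restore                              // bring back the full flong data
                      ```

                      In a real dataset, the geographic key used for the join in GIS (a tract or region ID, say) would also need to be listed in by() so that it survives the collapse.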



                      • #12
                        I don't work with GIS software so I will trust your judgment on it. I am leery of just using the means -- single imputation is inferior to multiple imputation -- but maybe for your purposes this is good enough or even the best approach possible. Indeed, maybe the old -impute- command would have been a simpler and adequate enough solution. Good luck with this.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam
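
                        To see what averaging discards, one can keep the spread across imputations alongside the mean. This is a minimal sketch, assuming the flong toy data from the example earlier in the thread (imputed variables a and b); the new variable names are hypothetical.

                        ```stata
                        * Assumption: flong mi data in memory with imputed variables a and b
                        preserve
                        keep if _mi_m > 0
                        * mean and standard deviation across imputations, per case
                        collapse (mean) a_mean=a b_mean=b (sd) a_sd=a b_sd=b, by(_mi_id)
                        list
                        restore
                        ```

                        Observed values are identical in every imputation, so their standard deviation is 0. A nonzero value marks an imputed cell and shows how much the imputations disagree, which is the uncertainty that a single averaged dataset no longer carries.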



                        • #13
                          I appreciate it. Thanks again for your help Richard.



                          • #14
                            I am doing time series analysis and I have problems running time series commands on the imputed data. I was therefore wondering whether using averages of the 7 imputed datasets is a good idea. To run the time series commands, I use:

                            mi estimate, cmdok: dfuller X, lags(3)

                            I get an error:
                            macro e(cmd) is not set
                            matrix e(b) is not set
                            matrix e(V) is not set
                            r(301);
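
                            The error is consistent with how dfuller returns its results: it stores them in r() rather than e(), so there is no e(b) or e(V) for mi estimate to pool. A sketch of running the test separately on each imputation instead, assuming 7 imputations as in the post and a hypothetical time variable named time:

                            ```stata
                            * Hypothetical names: X is the series, time is the time variable
                            forvalues m = 1/7 {
                                preserve
                                mi extract `m', clear     // replace memory with imputation m as an ordinary dataset
                                tsset time                // declare the time variable for this copy
                                display "Imputation `m'"
                                dfuller X, lags(3)
                                restore                   // return to the full mi data
                            }
                            ```

                            This only shows the test statistic in each imputed dataset; it does not combine them into a single pooled result.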
