  • Multiple Imputation - New Data

    Hi, I'm new to the forum and just wanted to ask a question. I'm using a data set consisting of five multiple imputations of time series data ending in 2002. Am I allowed to simply add the same new data (2002-2018) to the bottom of each imputation? This is for my undergraduate dissertation, so I would love some relevant literature to help me along. Cheers

  • #2
    Welcome to Statalist, Charlie! Be sure to read the FAQ, especially FAQ 12.
    It's not clear what you mean by the "same new data". If the new data also consists of five imputation data sets, then just append each to one of the old imputations.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Thanks Steve!

      Sorry I was unclear there. The 'new data' I'm talking about is complete (no missing values) data from 2002-2018, so it does not need multiple imputation. I was wondering whether I could simply append the new data to the bottom of each of the 5 multiple imputations, which would make the second half of each imputation exactly the same. Or would I have to run the multiple imputation again with the new data included?

      Best,

      Charlie



      • #4

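        Appending the new data to the old is still possible. You'll need to generate the MI system variables _mi_m, _mi_id, and _mi_miss.
        Suppose N is the number of observations in the original data and i is the imputation number. In copy i of the new data:
        Code:
        local N = 100                // placeholder: set to the number of observations in the original data
        local i = 1                  // placeholder: imputation number of this copy, 1 to 5
        gen _mi_id = _n + `N'
        gen _mi_miss = 0             // because there's no missing data
        gen _mi_m = `i'              // imputation number, _mi_m = 1 to _mi_m = 5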
        Then append copy i of the new data to imputation set i of the old data.
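
The steps above could be sketched as a loop over the five imputations. This is only a minimal sketch under stated assumptions, not code from the thread: the file names old_imp1 to old_imp5, new_data, and combined_imp1 to combined_imp5 are hypothetical, the first half builds toy stand-in files so the example runs on its own, and it assumes each old imputation is stored as a separate file that already carries _mi_id, _mi_miss, and _mi_m.

```stata
* Toy stand-ins for the real files (all names and values are hypothetical)
clear
set obs 4
generate year = 1998 + _n            // toy "old" period
generate y = runiform()
generate _mi_id = _n
generate _mi_miss = 0
forvalues i = 1/5 {
    generate _mi_m = `i'
    save old_imp`i', replace         // one file per old imputation
    drop _mi_m
}
clear
set obs 3
generate year = 2002 + _n            // toy "new" period, fully observed
generate y = runiform()
save new_data, replace

* Append one copy of the complete new data to each old imputation
forvalues i = 1/5 {
    use old_imp`i', clear
    local N = _N                     // observations in old imputation i
    use new_data, clear
    generate _mi_id = _n + `N'       // continue the id numbering after the old data
    generate _mi_miss = 0            // the new data have no missing values
    generate _mi_m = `i'             // imputation number of this copy
    tempfile newcopy
    save `newcopy'
    use old_imp`i', clear
    append using `newcopy'
    save combined_imp`i', replace
}
```

With real data, drop the toy block and point the loop at the actual files. Taking N from _N inside the loop avoids hard-coding the number of observations.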

        Good luck!
        Last edited by Steve Samuels; 20 Nov 2018, 12:05.
        Steve Samuels
        Statistical Consulting

        Stata 14.2



        • #5
          While Steve points to the technical possibility of appending the fully observed datasets, I am not sure this will do from a statistical point of view. If the information in the fully observed data is not used in the imputation process, then there cannot be any relationship between the imputed values and the fully observed data.

          I have no clear idea about the data structure that you are dealing with, nor about the imputation model that you are using. However, as I understand it, you have some sort of time-series data. Using information from earlier and later points in time to impute missing values for the same observation therefore seems a natural choice. If you do (or did) so, I would repeat the imputation using the information from the fully observed dataset.

          Best
          Daniel



          • #6
            Redoing the imputations on the full data would also be my first choice. Besides the reason Daniel mentions, it is a good opportunity to increase the number of imputations: five is far too few. The Stata manual recommends 20 and the default in SAS is 25. A quick Google search also turned up:

            https://statisticalhorizons.com/more-imputations

            https://www.ncbi.nlm.nih.gov/pubmed/17549635

            It's also a good idea to include design factors if it wasn't done originally. See Reiter, J. P., Raghunathan, T. E., & Kinney, S. K. (2006). The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology, 32(2), 143,

            Last edited by Steve Samuels; 20 Nov 2018, 14:12.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              On second thought, you may well decide to perform multiple imputation on the whole data set: take the "original" data up to 2002, append the second data set (which has no missing data), and then impute the combined data.
              Best regards,

              Marcos



              • #8
                Thank you, I think I will append the 'new comprehensive' data to the original old data and take new multiple imputations. Thank you all for the pointers.
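
For readers following the same route, re-imputing on the combined data might look roughly like the sketch below. Everything in it is an assumption for illustration: the variables y, x, and year and the toy data are hypothetical, and the plain regression imputation model is only a placeholder (a real time-series imputation model would likely need lags, leads, or other structure, as discussed above). The number of imputations, 20, follows the suggestion in #6.

```stata
* Toy combined data (hypothetical): old period with gaps plus complete new period
clear
set seed 12345
set obs 40
generate year = 1978 + _n                          // 1979-2018
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()
replace y = . if year <= 2002 & runiform() < 0.3   // missing values only in the old period

* Declare the data as mi data and register the variables
mi set flong
mi register imputed y
mi register regular x year

* Placeholder imputation model, 20 imputations
mi impute regress y x year, add(20) rseed(2018)

* Analysis model, combined across imputations
mi estimate: regress y x
```

Because the complete 2003-2018 observations are in memory when mi impute runs, they inform the imputation model, which is the point Daniel raises in #5.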

                Best,

                Charlie

