  • Multiple imputation (chained) using longitudinal data with unequal cycles by participant

    Hello,

    I am attempting to conduct multiple imputation on a longitudinal dataset and am having difficulty getting it to work with my data. I have multiple observations per individual (1-12 cycles), but not every person has the same number of observations. I understand that switching to wide format is required to use data from other cycles to inform imputation of the current cycle, which I definitely want to do here (for example, BMI recorded at cycle 2 may be important for filling in BMI at cycle 1 if it is missing).

    However, if I reshape my data by cycle, every individual is assigned variables for all 12 cycles, even if they only have 2 cycles in total. All of that individual's variables for cycles 3-12 are then missing. When I try to impute using mi impute chained, I receive the error message "mi impute: VCE is not positive definite", which I am guessing is due to the many variables that are entirely missing. I do not want to impute cycles in which a person has no observations, as I would not analyze those cycles anyway, and I don't want the extra missingness to influence the imputations. I am testing code now on a subset of my data, but the final dataset will have ~1 million participants.

    My question is: Is there a way to perform MI on this type of dataset without creating so many missing variables? Is there a way to impute only up to the maximum number of cycles each individual has?
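One possible first step toward imputing only the cycles a participant actually has is to record each participant's number of cycles before reshaping, so that structurally missing cells (the cycle never happened) can be told apart from genuinely missing values. Below is a minimal Stata sketch using the BMI values from the mock dataset in this post. The variable names ncycles and nocycle1-nocycle5 are hypothetical, and the sketch only flags the two kinds of missingness; it does not by itself make mi impute skip the structural cells.

```stata
* BMI values from the mock data in this post, in long format
clear
input byte(group id cycle bmi)
1 1 1 25
1 1 2  .
1 2 1 19
1 2 2 21
1 3 1 26
1 3 2 25
1 4 1 31
1 5 1  .
2 6 1 22
2 6 2 22
2 6 3 21
2 7 1 24
2 7 2 26
2 7 3  .
2 7 4  .
2 7 5  .
2 8 1 20
2 8 2 23
2 8 3 27
end

* Hypothetical helper: number of cycles each participant actually has.
* It is constant within id, so reshape wide carries it along unchanged.
bysort id (cycle): generate byte ncycles = _N

reshape wide bmi, i(id) j(cycle)

* Hypothetical flags: nocycleK = 1 when the participant never had cycle K
forvalues c = 1/5 {
    generate byte nocycle`c' = (ncycles < `c')
}

* bmi3 missing because there was no cycle 3 (structural)
count if missing(bmi3) & nocycle3
* bmi3 missing although cycle 3 exists (genuinely missing)
count if missing(bmi3) & !nocycle3
```

With this data the first count is 5 (IDs 1-5 have no cycle 3) and the second is 1 (ID 7 has a cycle 3 with BMI missing).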

    Here is a short mock dataset in long form:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(group id cycle age bmi drinks parity age_fb ya_bmi event)
    1 1 1 30 25  4 0  0  . 0
    1 1 2 32  .  5 1 31  . 0
    1 2 1 25 19  0 0  0 19 0
    1 2 2 29 21  0 0  0 19 1
    1 3 1 41 26  . 2 35 20 0
    1 3 2 46 25  . 2 35 20 0
    1 4 1 44 31  3 5 21 25 0
    1 5 1 39  . 12 3 32 24 1
    2 6 1 30 22 13 0  0 18 0
    2 6 2 31 22 12 0  0 18 0
    2 6 3 33 21  0 1 31 18 1
    2 7 1 22 24 10 0  0  . 0
    2 7 2 29 26  4 0  0  . 0
    2 7 3 35  .  3 2 28  . 0
    2 7 4 37  .  3 3 28  . 0
    2 7 5 42  .  3 3 28  . 0
    2 8 1 25 20  . 2  . 23 0
    2 8 2 29 23  0 3  . 23 0
    2 8 3 34 27  0 4  . 23 0
    end
    -In this example, ID 7 has 5 follow-up cycles. When I use this code to switch to wide format:

    reshape wide age bmi drinks parity age_fb ya_bmi event, i(id) j(cycle)

    -So now all IDs with fewer than 5 follow-up cycles have age5, bmi5, drinks5, parity5, age_fb5, ya_bmi5, and event5 set to missing (.).

    -Here is the code I am using for the MI (imputation is done separately within each level of the 'group' variable):

    mi set wide
    mi register imputed bmi1 bmi2 bmi3 bmi4 bmi5 drinks1 drinks2 drinks3 drinks4 drinks5 parity1 parity2 parity3 parity4 parity5 age_fb1 age_fb2 age_fb3 age_fb4 age_fb5 ya_bmi1 ya_bmi2 ya_bmi3 ya_bmi4 ya_bmi5
    mi register regular age1 age2 age3 age4 age5 event1 event2 event3 event4 event5

    mi impute chained (pmm,knn(1)) bmi1 bmi2 bmi3 bmi4 bmi5 drinks1 drinks2 drinks3 drinks4 drinks5 parity1 parity2 parity3 parity4 parity5 age_fb1 age_fb2 age_fb3 age_fb4 age_fb5 ya_bmi1 ya_bmi2 ya_bmi3 ya_bmi4 ya_bmi5, by(group) add(5)


    And the error:
    " group = 1
    Performing chained iterations ...
    mi impute: VCE is not positive definite
    "
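A quick check that seems consistent with the guess about completely missing variables: because the imputation runs separately by group, and no participant in group 1 (IDs 1-5) has more than 2 cycles, every cycle 3-5 variable has zero observed values within that group. The sketch below reuses the mock data and the reshape command from the post and counts observed values per imputed variable in group 1. It is a diagnostic only, not a fix.

```stata
* Mock data from the post, reshaped to wide exactly as in the post
clear
input byte(group id cycle age bmi drinks parity age_fb ya_bmi event)
1 1 1 30 25  4 0  0  . 0
1 1 2 32  .  5 1 31  . 0
1 2 1 25 19  0 0  0 19 0
1 2 2 29 21  0 0  0 19 1
1 3 1 41 26  . 2 35 20 0
1 3 2 46 25  . 2 35 20 0
1 4 1 44 31  3 5 21 25 0
1 5 1 39  . 12 3 32 24 1
2 6 1 30 22 13 0  0 18 0
2 6 2 31 22 12 0  0 18 0
2 6 3 33 21  0 1 31 18 1
2 7 1 22 24 10 0  0  . 0
2 7 2 29 26  4 0  0  . 0
2 7 3 35  .  3 2 28  . 0
2 7 4 37  .  3 3 28  . 0
2 7 5 42  .  3 3 28  . 0
2 8 1 25 20  . 2  . 23 0
2 8 2 29 23  0 3  . 23 0
2 8 3 34 27  0 4  . 23 0
end

reshape wide age bmi drinks parity age_fb ya_bmi event, i(id) j(cycle)

* Count observed (nonmissing) values of each imputed variable in group 1
foreach v in bmi drinks parity age_fb ya_bmi {
    forvalues c = 1/5 {
        quietly count if group == 1 & !missing(`v'`c')
        display "group 1: `v'`c' has " r(N) " observed values"
    }
}
```

For bmi1 this reports 4 observed values in group 1; for every variable ending in 3, 4, or 5 it reports 0. In this mock data, bmi4 and bmi5 have no observed values in group 2 either.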

    Thank you so much for your help!

    Best,
    Kristen
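The mi commands in the post can be generated with a loop instead of typing every variable, which matters once there are 12 cycles rather than 5. Also worth noting: in mi impute chained, complete predictors are used only if they are listed after the equals sign; registering age and event as regular does not by itself add them to the imputation models. Below is a sketch under those points. The local macro names maxcycle, ivars, and rvars are illustrative, and the imputation command is left commented out because, with this mock data, it would still run into the completely missing variables.

```stata
* Mock data from the post, reshaped to wide as in the post
clear
input byte(group id cycle age bmi drinks parity age_fb ya_bmi event)
1 1 1 30 25  4 0  0  . 0
1 1 2 32  .  5 1 31  . 0
1 2 1 25 19  0 0  0 19 0
1 2 2 29 21  0 0  0 19 1
1 3 1 41 26  . 2 35 20 0
1 3 2 46 25  . 2 35 20 0
1 4 1 44 31  3 5 21 25 0
1 5 1 39  . 12 3 32 24 1
2 6 1 30 22 13 0  0 18 0
2 6 2 31 22 12 0  0 18 0
2 6 3 33 21  0 1 31 18 1
2 7 1 22 24 10 0  0  . 0
2 7 2 29 26  4 0  0  . 0
2 7 3 35  .  3 2 28  . 0
2 7 4 37  .  3 3 28  . 0
2 7 5 42  .  3 3 28  . 0
2 8 1 25 20  . 2  . 23 0
2 8 2 29 23  0 3  . 23 0
2 8 3 34 27  0 4  . 23 0
end
reshape wide age bmi drinks parity age_fb ya_bmi event, i(id) j(cycle)

* Build the variable lists; change maxcycle to 12 for the full data
local maxcycle 5
local ivars
local rvars
forvalues c = 1/`maxcycle' {
    local ivars `ivars' bmi`c' drinks`c' parity`c' age_fb`c' ya_bmi`c'
    local rvars `rvars' age`c' event`c'
}
display "`ivars'"

mi set wide
mi register imputed `ivars'
mi register regular `rvars'

// Predictors go after "=", for example (not run here):
// mi impute chained (pmm, knn(1)) `ivars' = `rvars', by(group) add(5)
```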




  • #2
    Kristen, hi.

    If you have repeated measures (e.g., participants were measured multiple times over time), there is no need for multiple imputation. In fact, multiple imputation can even bias the results. Just go for mixed-effects models.
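For comparison, here is a minimal sketch of the kind of model Tiago is pointing to: a random-intercept model fitted directly to the long-format data, with no reshaping and no imputation. Each participant contributes however many BMI measurements they have, and rows with missing BMI are simply dropped. Using age as the only covariate is an illustrative assumption, not the analysis model from the thread, and this approach rests on its own missing-at-random assumption for the outcome.

```stata
* Long-format mock data from the post (only the columns used here)
clear
input byte(id cycle age bmi)
1 1 30 25
1 2 32  .
2 1 25 19
2 2 29 21
3 1 41 26
3 2 46 25
4 1 44 31
5 1 39  .
6 1 30 22
6 2 31 22
6 3 33 21
7 1 22 24
7 2 29 26
7 3 35  .
7 4 37  .
7 5 42  .
8 1 25 20
8 2 29 23
8 3 34 27
end

* Random intercept per participant; every observed BMI value is used,
* whatever the number of cycles that participant has
mixed bmi age || id:
```

With this mock data the model uses 14 observations from 7 participants (ID 5 has no observed BMI).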

    See:

    https://www.sciencedirect.com/scienc...lQmuM3GfH8mxLr



    • #3
      Hi Tiago,
      I appreciate the insight. The problem I pose is step one in an analysis that has already been developed in depth, and unfortunately, a mixed-effects model approach will not benefit us in this situation. Thanks for the response, though; I will consider this when using repeated-measures data in the future.
      Kristen



      • #4
        Perhaps I misunderstood your dataset, but mixed-effects modelling is the only approach to your analysis.

        Imputing missing data with repeated measures is much more complicated, since the matrices may not be positive definite. You will have to impute missing data assuming that observations are nested within subjects (multilevel imputation).

        As far as I remember, only REALCOM-IMPUTE does that http://www.bristol.ac.uk/cmm/softwar...mputation.html



        • #5
          Thanks Tiago, I think I understand your response better now. From my research, the only feasible option seems to be something akin to time raster imputation (http://pzs.dstu.dp.ua/DataMining/pre.../bibl/fimd.pdf). Is this essentially what you are recommending? Unfortunately, I'm limited to conducting my analysis in Stata, so if there isn't a way to set up the data like this within the software (which I believe is not possible...), I'll have to think of something else. I appreciate your time!

