Hello,
I am attempting to conduct multiple imputation on a longitudinal dataset and am having difficulty getting this to work with my data. I have multiple observations per individual (1-12 cycles), but not every person has the same number of observations. I understand switching to wide format is required to use data from other cycles to inform imputation of a current cycle, which I definitely want to do here (for example, BMI recorded at cycle 2 may be important to fill in BMI at cycle 1 if missing).
However, if I reshape my data by cycle, every individual gets a set of variables for all 12 cycles, even if they only have 2 cycles in total, so all of that individual's variables for cycles 3-12 are missing. When I try to impute using mi impute chained, I receive the error message "mi impute: VCE is not positive definite", which I am guessing is due to so many variables being completely missing. I do not want to impute cycles in which a person has no observations, as I would not analyze these cycles anyway and I don't want the extra missingness to influence the imputations. I am testing code now on a subset of my data, but the final dataset will have ~1 million participants.
My question is: Is there a way to perform MI on this type of dataset without creating so many all-missing variables? Is there a way to impute only up to the maximum number of cycles each individual actually has?
Here is a short mock dataset in long form:
In this example, ID 7 has 5 follow-up cycles. I then use the reshape command below to switch to wide format:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(group id cycle age bmi drinks parity age_fb ya_bmi event)
1 1 1 30 25  4 0  0  . 0
1 1 2 32  .  5 1 31  . 0
1 2 1 25 19  0 0  0 19 0
1 2 2 29 21  0 0  0 19 1
1 3 1 41 26  . 2 35 20 0
1 3 2 46 25  . 2 35 20 0
1 4 1 44 31  3 5 21 25 0
1 5 1 39  . 12 3 32 24 1
2 6 1 30 22 13 0  0 18 0
2 6 2 31 22 12 0  0 18 0
2 6 3 33 21  0 1 31 18 1
2 7 1 22 24 10 0  0  . 0
2 7 2 29 26  4 0  0  . 0
2 7 3 35  .  3 2 28  . 0
2 7 4 37  .  3 3 28  . 0
2 7 5 42  .  3 3 28  . 0
2 8 1 25 20  . 2  . 23 0
2 8 2 29 23  0 3  . 23 0
2 8 3 34 27  0 4  . 23 0
end
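To show how uneven the follow-up is, a quick check along these lines could be run on the long data before reshaping. This is only a sketch: ncycles and maxcycles are names I made up for illustration and are not in my real dataset.

```stata
* Number of cycles actually observed for each person (long form)
bysort id: generate ncycles = _N

* Largest cycle number observed within each imputation group
bysort group: egen maxcycles = max(cycle)

* One line per group showing its maximum observed cycle
tabulate group maxcycles
```

In the mock data, nobody in group 1 has more than 2 cycles, while ID 7 in group 2 has 5.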
reshape wide age bmi drinks parity age_fb ya_bmi event, i(id) j(cycle)
So now, for all IDs with fewer than 5 follow-up cycles, the variables age5, bmi5, drinks5, parity5, age_fb5, ya_bmi5 and event5 are all missing (.).
Here is the code I am using for the MI (imputation is done separately within each level of the 'group' variable):
mi set wide
mi register imputed bmi1 bmi2 bmi3 bmi4 bmi5 drinks1 drinks2 drinks3 drinks4 drinks5 parity1 parity2 parity3 parity4 parity5 age_fb1 age_fb2 age_fb3 age_fb4 age_fb5 ya_bmi1 ya_bmi2 ya_bmi3 ya_bmi4 ya_bmi5
mi register regular age1 age2 age3 age4 age5 event1 event2 event3 event4 event5
mi impute chained (pmm,knn(1)) bmi1 bmi2 bmi3 bmi4 bmi5 drinks1 drinks2 drinks3 drinks4 drinks5 parity1 parity2 parity3 parity4 parity5 age_fb1 age_fb2 age_fb3 age_fb4 age_fb5 ya_bmi1 ya_bmi2 ya_bmi3 ya_bmi4 ya_bmi5, by(group) add(5)
And the error:
" group = 1
Performing chained iterations ...
mi impute: VCE is not positive definite
"
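To check my guess about the cause, something like the following could be run on the wide data after the reshape and before mi set. It is a rough diagnostic sketch, not a fix: it counts the nonmissing values of each variable within each group, so any variable with a count of 0 is completely missing in that group.

```stata
* Nonmissing counts per wide variable, separately by imputation group
tabstat bmi1 bmi2 bmi3 bmi4 bmi5, by(group) statistics(n)
tabstat drinks1 drinks2 drinks3 drinks4 drinks5, by(group) statistics(n)
tabstat parity1 parity2 parity3 parity4 parity5, by(group) statistics(n)
tabstat age_fb1 age_fb2 age_fb3 age_fb4 age_fb5, by(group) statistics(n)
tabstat ya_bmi1 ya_bmi2 ya_bmi3 ya_bmi4 ya_bmi5, by(group) statistics(n)
```

In the mock data, no one in group 1 has a cycle beyond 2, so every cycle 3-5 variable would show a count of 0 for group 1, which is the group where the error appears.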
Thank you so much for your help!
Best,
Kristen