Multiple Imputation by Chained Equations (MICE)

Jeff Tree

Join Date: May 2019

Posts: 33
#1

30 May 2020, 12:42

Hi all,

I received a comment from an anonymous reviewer on my article suggesting that I may need to check my listwise-deletion results against multiple imputation. The reviewer added that a cursory MI examination would be satisfactory and that addressing it in a footnote would be sufficient for the revision.

Although they didn't specify, I believe the reviewer is concerned that two of my dichotomous variables (say x5 and x6) are missing a substantial amount of data.

This is new territory for me and I would appreciate feedback on my approach...

Code:

//Create random variables with missing data:
clear
set obs 1000
set seed 12345
gen y = runiformint(0,1)
gen x1 = runiform()
gen x2 = runiform(2, 4)
gen x3 = runiform(0, 6)
gen x4 = runiform()
gen x5 = runiformint(0,1)
gen x6 = runiformint(0,1)
replace y = . if x2 > 3
replace x1 = . if x1 > 0.6
replace x4 = . if x2 < 2.5
replace x5 = . if x3 > 3
replace x6 = . if x4 > .6

//Multiple imputation:
mi set mlong
misstable summarize
mi register imputed x5 x6
mi impute chained (logit) x5 x6 = y x1 x2 x3 x4, add(20) rseed(1234) force
mi xeq 0 1 20: summarize x5 x6
mi estimate: logit y x1 x2 x3 x4 x5 x6

I suppose I have a few questions:
Do I still need to register the other variables with missing data even if they are not of primary interest, i.e. y, x1, x4?

Do I need to register 'regular' variables, i.e.

Code:

mi register regular x2 x3

What if I have a quadratic term in my original analytical model (i.e. x1^2)? Should I include it in my MI model as well?

Last edited by Jeff Tree; 30 May 2020, 12:50.
CEdward

Join Date: Nov 2014

Posts: 131
#2

30 May 2020, 12:52

This will be, coincidentally, my first contributing response because it just so happens that I am studying about multiple imputation right now.

I think you would register the variables that have missing data because, as I understand MICE, the estimation of the missing values is conditional on the existing values in the dataset, including newly imputed values. So in each iteration the previously imputed values go into the imputation of the next set of values, and so on.
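The iteration described above can be made visible with a short do-file sketch. It reuses the simulated data from post #1 and the chained specification posted later in this thread; the `burnin(20)` value, the small `add(5)`, and the trace file name `mice_trace` are assumptions chosen only for illustration. `savetrace()` stores summaries of the imputed values at each iteration, so one can see the chain being updated cycle by cycle.

```stata
// Sketch only: simulated data as in post #1
clear
set obs 1000
set seed 12345
gen y  = runiformint(0,1)
gen x1 = runiform()
gen x2 = runiform(2, 4)
gen x3 = runiform(0, 6)
gen x4 = runiform()
gen x5 = runiformint(0,1)
gen x6 = runiformint(0,1)
replace y  = . if x2 > 3
replace x1 = . if x1 > 0.6
replace x4 = . if x2 < 2.5
replace x5 = . if x3 > 3
replace x6 = . if x4 > .6

mi set mlong
mi register imputed y x1 x4 x5 x6   // every variable that will be imputed
mi register regular x2 x3           // complete predictors

// Each incomplete variable is imputed in turn from the others, repeatedly;
// burnin() sets how many such cycles run before an imputation is kept.
// mice_trace is a hypothetical file name.
mi impute chained (logit) y x5 x6 (regress) x1 x4 = x2 x3, ///
    add(5) burnin(20) rseed(1234) savetrace(mice_trace, replace) force

// Inspect the per-iteration summaries (this replaces the data in memory)
use mice_trace, clear
describe
```

Plotting the saved summaries against the iteration number is a common, informal way to judge whether the burn-in period is long enough.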

You only register the variables with missing.

The analytic model and the multiple imputation model should be consistent and equivalent for reasons I don't fully understand, but it does prevent bias downstream.
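For the quadratic term raised in post #1, one way to keep the two models consistent is to treat the square as just another incomplete variable: create it before `mi set`, impute it alongside x1, and use it in the analysis model. The sketch below assumes the simulated data from post #1 and a hypothetical variable name `x1sq`. It is one option rather than a settled recommendation, and the imputed `x1sq` will generally not equal the square of the imputed `x1`.

```stata
// Sketch only: simulated data as in post #1
clear
set obs 1000
set seed 12345
gen y  = runiformint(0,1)
gen x1 = runiform()
gen x2 = runiform(2, 4)
gen x3 = runiform(0, 6)
gen x4 = runiform()
gen x5 = runiformint(0,1)
gen x6 = runiformint(0,1)
replace y  = . if x2 > 3
replace x1 = . if x1 > 0.6
replace x4 = . if x2 < 2.5
replace x5 = . if x3 > 3
replace x6 = . if x4 > .6

// Create the squared term BEFORE mi set; it is missing wherever x1 is
gen x1sq = x1^2          // x1sq is a hypothetical name

mi set mlong
mi register imputed y x1 x1sq x4 x5 x6
mi register regular x2 x3

// x1sq is imputed like any other continuous variable
mi impute chained (logit) y x5 x6 (regress) x1 x1sq x4 = x2 x3, ///
    add(20) rseed(1234) force

// The analysis model uses the same terms as the imputation model
mi estimate: logit y x1 x1sq x2 x3 x4 x5 x6
```

An alternative is to impute x1 only and rebuild the square afterwards with `mi passive: generate`, which keeps the square exact but leaves the quadratic relationship out of the imputation model.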

Out of curiosity - why are you using MICE as opposed to the multivariate normal distribution method?
Jeff Tree

Join Date: May 2019

Posts: 33
#3

30 May 2020, 13:01

Originally posted by Jack Chau View Post

This will be, coincidentally, my first contributing response because it just so happens that I am studying about multiple imputation right now.

Haha, you'll definitely be more familiar with MI than myself! I think the only time I learned about it was in a single lecture during grad school! :-)

I think you would register the variables that have missing data because, as I understand MICE, the estimation of the missing values is conditional on the existing values in the dataset, including newly imputed values. So in each iteration the previously imputed values go into the imputation of the next set of values, and so on.

You only register the variables with missing.

The analytic model and the multiple imputation model should be consistent and equivalent for reasons I don't fully understand, but it does prevent bias downstream.

I suspected that was the case as well. I'm assuming the following would then be the correct protocol by adding y, x1, and x4:

Code:

mi register imputed y x1 x4 x5 x6
mi impute chained (logit) x5 x6 = y x1 x2 x3 x4, add(20) rseed(1234) force
mi xeq 0 1 20: summarize x5 x6
mi estimate: logit y x1 x2 x3 x4 x5 x6

Out of curiosity - why are you using MICE as opposed to the multivariate normal distribution method?

Based on my very limited understanding, MICE is appropriate for binary dependent variables?

Jeff Tree

Join Date: May 2019
Posts: 33
#4

30 May 2020, 13:10

Slight correction to the above:

Code:

mi register imputed y x1 x4 x5 x6 
mi impute chained (logit) y x5 x6 (regress) x1 x4 = x2 x3, add(20) rseed(1234) force
mi xeq 0 1 20: summarize x5 x6

mi estimate: logit y x1 x2 x3 x4 x5 x6

I believe I needed to specify regress since x1 and x4 are continuous?


CEdward

Join Date: Nov 2014

Posts: 131
#5

30 May 2020, 20:31

Originally posted by Jeff Tree View Post

Slight correction to the above:

Code:

mi register imputed y x1 x4 x5 x6
mi impute chained (logit) y x5 x6 (regress) x1 x4 = x2 x3, add(20) rseed(1234) force
mi xeq 0 1 20: summarize x5 x6

mi estimate: logit y x1 x2 x3 x4 x5 x6

I believe I needed to specify regress since x1 and x4 are continuous?

That's correct, yes. The advantage of MICE is that it lets you model each variable with missing values in turn, using a conditional model suited to its own distribution (for example, logit for binary variables and regress for continuous ones). This is in contrast to the multivariate, or joint-specification, methods, which assume that all of the variables with missing values follow a single joint probability distribution, something that is unlikely in a dataset with many variables.
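To see the contrast in practice, here is a minimal sketch of the joint (multivariate normal) approach on the simulated data from post #1, using `mi impute mvn`. The small `add(5)` is an assumption to keep the run short. Because the joint model treats every incomplete variable as normal, the imputed values of the binary variables y, x5, and x6 are not restricted to 0 and 1, which the final summary should make apparent.

```stata
// Sketch only: simulated data as in post #1
clear
set obs 1000
set seed 12345
gen y  = runiformint(0,1)
gen x1 = runiform()
gen x2 = runiform(2, 4)
gen x3 = runiform(0, 6)
gen x4 = runiform()
gen x5 = runiformint(0,1)
gen x6 = runiformint(0,1)
replace y  = . if x2 > 3
replace x1 = . if x1 > 0.6
replace x4 = . if x2 < 2.5
replace x5 = . if x3 > 3
replace x6 = . if x4 > .6

mi set mlong
mi register imputed y x1 x4 x5 x6
mi register regular x2 x3

// Joint model: all five incomplete variables are imputed together
// under a multivariate normal distribution, given x2 and x3
mi impute mvn y x1 x4 x5 x6 = x2 x3, add(5) rseed(1234)

// Compare the original data (m=0) with the first imputation (m=1):
// check the min and max of the binary variables
mi xeq 0 1: summarize y x5 x6
```

With `mi impute chained`, the `(logit)` specification keeps those variables binary, which is the practical reason for preferring it in this thread's setting.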

Here is a paper by Azur et al. 2010 (Multiple imputation by chained equations: what is it and how does it work?) that is a useful review.

I am trying to figure out an MI problem myself: https://www.statalist.org/forums/for...analytic-model. I have a repeated measures design with clustering at multiple levels. The problem is that I need to include time in my analytic (and hence imputation) model. Time indexes the discrete points during the longitudinal study at which data are collected. However, when I reshape the dataset from long to wide form, the time variable disappears and I can no longer include it in my imputation model. Any thoughts on this?

By the way, you might find this link useful: https://stats.idre.ucla.edu/stata/fa...ata-using-ice/.
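The reshaping issue described above can be reproduced on a small simulated panel. The sketch below is illustrative only: the variable names (`id`, `time`, `x`, `y`), the three time points, and the single level of clustering are all assumptions, and it does not address the multi-level design mentioned in the post. It shows that `reshape wide` drops the time variable and carries it in the variable names instead, and that `mi reshape long` brings it back after imputation.

```stata
// Sketch only: hypothetical panel with 200 subjects and 3 time points
clear
set seed 12345
set obs 200
gen id = _n
gen x  = rnormal()                 // time-invariant covariate
expand 3
bysort id: gen time = _n
gen y = 0.5*x + 0.3*time + rnormal()
replace y = . if runiform() < 0.2  // about 20% of y set to missing

// Wide form: time is no longer a variable; y1 y2 y3 encode it by suffix
reshape wide y, i(id) j(time)
describe

mi set flong
mi register imputed y1 y2 y3
mi register regular x

// Each time point's outcome is imputed from the other time points and x
mi impute chained (regress) y1 y2 y3 = x, add(5) rseed(1234)

// Back to long form: time reappears as a variable for the analysis model
mi reshape long y, i(id) j(time)
mi estimate: regress y x time, vce(cluster id)
```

In this layout time enters the imputation model only implicitly, through a separate equation for each time point, rather than as a predictor in its own right.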
Jeff Tree

Join Date: May 2019

Posts: 33
#6

12 Jun 2020, 10:31

Originally posted by Jack Chau View Post

Wonderful, thank you for the suggested article and link -- very helpful!

Oh my, your situation sounds far more complex! I'd also be curious to know if there is a specific protocol for MI with panel or time series data?
CEdward

Join Date: Nov 2014

Posts: 131
#7

16 Jun 2020, 14:20

Originally posted by Jeff Tree View Post

Wonderful, thank you for the suggested article and link -- very helpful!

Oh my, your situation sounds far more complex! I'd also be curious to know if there is a specific protocol for MI with panel or time series data?

I did some further reading on this and it turns out that methodological advancements have yet to be made in this area (with more than three levels of clustering). It is also impossible to add time in the imputation model when you transform the data.
Jeff Tree

Join Date: May 2019

Posts: 33
#8

17 Jun 2020, 13:51

Originally posted by Jack Chau View Post

It is also impossible to add time in the imputation model when you transform the data.

Argh, too bad. I would have definitely been interested in using it in another project I'm working on!