  • multiple imputation on multiple variables

    Hello. I am trying to do multiple imputation to compare its regression output to listwise deletion.

    I was chiefly concerned with the p_educ variable because, as you can see from my data below, it has by far the most missings.

    So I started with that variable, and this code seemed to work well:

    Code:
    mi set mlong
    mi register imputed health p_educ p_income gender race age
    mi impute regress p_educ health p_income gender race age, add(20) rseed(1234) force
    mi estimate: regress health p_educ p_income gender race age convinced_level
    But, as you can also see, my other variables have a handful of missings as well. Is it possible for me to do multiple imputation for all of those other variables? (except for convinced_level, which is my primary predictor variable and, for a number of reasons, shouldn't be imputed)

    Is it as simple as just repeating the "mi impute regress var1 var2 etc." line over and over for each variable that I want to impute, before doing "mi estimate: regress"? Or is there a more efficient way to impute multiple variables all in one go?

    Here is some toy data that resembles the structure of my actual data (which contains identifiable information and cannot be shared publicly). Thank you so much!!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(health p_educ) int p_income byte(gender race age) long convinced_level
    3 3  23 1 . 13 2
    4 .   . 2 2 14 1
    2 5  45 1 3 15 .
    3 .  65 2 4 14 2
    4 .  23 1 1 14 1
    5 3  45 1 3 12 3
    3 4  88 . 2 12 1
    2 . 132 1 3 12 3
    3 5  34 2 2 14 2
    4 3  54 1 1 14 1
    3 2  23 2 3 18 3
    2 .  52 1 2 18 2
    2 4  47 1 2 18 1
    3 3   . 2 2 15 3
    4 2  78 . 1 15 2
    2 2  43 2 4  . 1
    3 3  23 1 1 14 3
    4 4  65 . 4 17 .
    5 . 143 2 1 17 3
    3 3   5 1 . 17 2
    2 4   . 2 1 19 1
    3 .  23 1 1  . 2
    4 3  34 1 1 18 1
    3 4  45 2 2 15 3
    2 3  76 1 3 15 2
    2 2   6 2 2 15 1
    3 .  33 2 2 15 3
    4 3   . 1 1 12 1
    2 4  23 2 1 12 3
    3 2  52 1 1 12 2
    2 .  47 1 2 14 1
    2 .  84 2 2 14 2
    3 .  78 1 . 18 1
    4 .  43 2 3 18 3
    2 2  23 2 4 18 2
    3 4  34 . 1 15 1
    4 5  45 2 1 15 3
    5 3  76 1 1 14 1
    3 4  61 1 2 14 3
    2 3  90 2 1 14 2
    3 2  27 1 4 18 1
    4 .  63 2 1 18 3
    3 2  34 2 4 18 2
    3 3  63 1 1 15 1
    3 2  52 2 2 13 3
    3 4  15 1 2 14 2
    2 1  62 . 1 15 2
    2 1  73 2 1 14 2
    end
    label values convinced_level label
    label def label 1 "no", modify
    label def label 2 "yes", modify
    label def label 3 "maybe", modify

  • #2
    Originally posted by Anne Todd:
    Is it possible for me to do multiple imputation for all of those other variables?
    Yes. See

    Code:
    help mi impute chained

    Originally posted by Anne Todd:
    (except for convinced_level, which is my primary predictor variable and, for a number of reasons, shouldn't be imputed)
    If this is your primary predictor, it must be included in the imputation model. You might or might not want to restrict the imputed datasets to observations with initially non-missing values on the predictor later in the analyses step. If you tell us more about the "number of reasons", we can comment on that, too.
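
    That restriction might be sketched like this (the flag variable cl_observed is made up for illustration and must be created before mi set):

    Code:
    * flag observations whose predictor is observed, before -mi set-
    generate byte cl_observed = !missing(convinced_level)
    * ... mi set mlong, mi register imputed ..., mi impute chained ...
    * then restrict only the analysis step, not the imputation
    mi estimate: regress health p_educ p_income gender race age ///
        convinced_level if cl_observed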

    Comment


    • #3
      Yes, it makes sense to impute all variables mutually in one step. Something like this should do the trick:

      Code:
      mi set mlong
      mi register imputed health p_educ p_income gender race age convinced_level
      mi impute chained (regress) p_educ health p_income race age ///
          (logit) gender ///
          (ologit) convinced_level ///
          , add(20) rseed(1234) force
      mi estimate: regress health p_educ p_income gender race age convinced_level
      But check the measurement scales carefully: you do not want to impute a binary variable like gender with regress, for example; use logit instead. See my short example and adapt it as you think fits best for your variables. Check the generated values afterwards using summarize.
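
      For example, such a check could look like this (a sketch; m = 0 holds the original data, m = 1 the first imputed dataset):

      Code:
      * compare the observed data (m = 0) with the first imputed dataset (m = 1)
      mi xeq 0: summarize health p_educ p_income gender race age
      mi xeq 1: summarize health p_educ p_income gender race age
      * for categorical variables, check that only plausible categories occur
      mi xeq 1: tabulate gender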
      Best wishes

      (Stata 16.1 MP)

      Comment


      • #4
        Thank you Felix Bittmann, it seems the "chained" option was the main thing I was missing, and this is very helpful. Your point about scaling is something I was wondering about in trying to do them all at once--since, as you say, imputing a binary variable with regress alongside continuous variables wouldn't make much sense. Thanks for your assistance!

        Comment


        • #5
          Originally posted by daniel klein:

          Yes. See

          Code:
          help mi impute chained



          If this is your primary predictor, it must be included in the imputation model. You might or might not want to restrict the imputed datasets to observations with initially non-missing values on the predictor later in the analyses step. If you tell us more about the "number of reasons", we can comment on that, too.
          Sorry, I was typing too fast there and didn't mean what I wrote--the actual data I have is restricted to non-missing values on the predictor, so there is nothing to impute!

          Comment


          • #6
            Originally posted by Anne Todd:
            Sorry, I was typing too fast there and didn't mean what I wrote--the actual data I have is restricted to non-missing values on the predictor, so there is nothing to impute!
            Your example data does not reflect that fact either. Note that restricting the sample before imputation will result in bias if the restricted sample is not a random subset, which it very likely is not. Anyway, if you have variables that do not have missing values, those go to the right-hand side of the equals sign:

            Code:
            mi impute chained ... = non_missing_variables , add(20)
            Also, get rid of the force option; you never want that.
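
            Applied to the toy data, and supposing for illustration that only p_educ and p_income had missing values, that would look like:

            Code:
            mi set mlong
            mi register imputed p_educ p_income
            mi impute chained (regress) p_educ p_income ///
                = health gender race age convinced_level, add(20) rseed(1234)
            mi estimate: regress health p_educ p_income gender race age convinced_level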

            Comment


            • #7
              Originally posted by daniel klein:

              Your example data does not reflect that fact either. Note that restricting the sample before imputation will result in bias if the restricted sample does not represent a random subset, which is very likely. Anyway, if you have variables that do not have missing values, those go to the right-hand side of the equals sign:

              Code:
              mi impute chained ... = non_missing_variables , add(20)
              Also, get rid of the force option; you never want that.
              I just made the example data manually; I must have unintentionally put in missings for that variable...thank you for the explanation on putting them on the right-hand side of the =.

              For the force option, it was my understanding that the imputation wouldn't run (when I was imputing the p_educ variable alone) while other variables had missings, unless the force option was specified. But I see now that when I impute this group of variables simultaneously, that shouldn't be an issue.

              Comment


              • #8
                Originally posted by Anne Todd:
                For the force option, it was my understanding that the impute wouldn't work (when I was just doing it on the p_educ variable alone) when other variables were missing, unless the force option was specified.
                Yes and no. Technically, force makes the imputation "work". I have yet to hear an argument that justifies this practice on theoretical/statistical grounds.

                Comment
