problem with multiple imputation

Ruowen Shen

#1

30 Jun 2016, 19:10

I am doing multiple imputation for two variables in my ordinal logistic model. One of them is a continuous variable measuring organizational capacity; the other is a Likert-scale variable measuring political ideology. I should have about 900 cases in the sample, but these two variables have about 400 missing values. I wrote the commands below, but mi estimate: ologit always fails with this error: "estimation sample varies between m=1 and m=11; click here for details
r(459)". I don't understand this error message. Another thing, which may or may not be related: all my other variables also have some missing values, but I decided to impute only these two. Any help would be appreciated.

mi set mlong
mi register imputed Ideology OrgCapacityIndex
mi impute mvn Ideology OrgCapacityIndex, add(10)
mi estimate: ologit .....
daniel klein

#2

01 Jul 2016, 03:05

It is hard to tell why you get this error from the information provided. My guess is that it has to do with the fact that you imputed 10 datasets (but see my second comment below), yet Stata reports problems with m=1 and m=11. That suggests there is an additional imputed dataset, perhaps left over from a previous mi impute ... , add(1) command.
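
One way to check that guess, sketched on simulated stand-in data: mi query stores the current number of imputations in r(M), and mi set M = 10 discards any imputations beyond the tenth. The variables y and population and the extra add(1) run are hypothetical, chosen only to produce an 11th imputation; the sketch does not claim to reproduce the r(459) error itself.

```stata
* Hypothetical stand-in data: 900 cases, about 40% missing in each imputed variable
clear
set seed 2016
set obs 900
generate double population = rnormal(10, 2)
generate double OrgCapacityIndex = rnormal(0, 1) + 0.1*population
generate byte Ideology = 1 + floor(5*runiform())
generate double latent = 0.5*OrgCapacityIndex + 0.3*Ideology + rnormal()
generate byte y = 1 + (latent > 1) + (latent > 2.5)
drop latent
replace OrgCapacityIndex = . if runiform() < 0.4
replace Ideology = . if runiform() < 0.4

mi set mlong
mi register imputed Ideology OrgCapacityIndex
mi register regular y population
mi impute mvn Ideology OrgCapacityIndex = y population, add(10) rseed(1)

* A second, forgotten run (hypothetical) silently adds imputation m=11
mi impute mvn Ideology OrgCapacityIndex = y population, add(1) rseed(2)

* mi query stores the current number of imputations in r(M)
mi query
display "M = " r(M)

* Keep only the first 10 imputations, dropping m=11
mi set M = 10
```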

Anyway, I guess you should consider yourself lucky: without the error you probably would not have posted here, and there are several potential problems with your approach.

First, your imputation model does not include any predictor variables. Thus, the relationships between the imputed values and the other variables in your analysis model, including the outcome, will not be represented properly. In any case, you must include the outcome (response, dependent variable, ...) in your imputation model. If your outcome has missing values, imputing them is only reasonable if your imputation model is larger (in the sense that it has more predictors) than the model used later for analysis. If this is not the case, mark the missing values in your outcome and delete the respective cases after imputation (cf. von Hippel 2007).

Second, 10 imputations seem way too few for the almost 50 percent missing values you report. I would go with at least 50 imputations here. See Paul Allison's blog for starters.
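
A minimal sketch of the first two points, on simulated stand-in data: every analysis-model variable, including the outcome, goes on the right-hand side of the imputation model, and 50 imputations are requested. The names y (ordinal outcome) and population (complete predictor) are hypothetical, and y is assumed fully observed here.

```stata
* Hypothetical stand-in data: 900 cases, about 40% missing in each imputed variable
clear
set seed 2016
set obs 900
generate double population = rnormal(10, 2)
generate double OrgCapacityIndex = rnormal(0, 1) + 0.1*population
generate byte Ideology = 1 + floor(5*runiform())
generate double latent = 0.5*OrgCapacityIndex + 0.3*Ideology + rnormal()
generate byte y = 1 + (latent > 1) + (latent > 2.5)
drop latent
replace OrgCapacityIndex = . if runiform() < 0.4
replace Ideology = . if runiform() < 0.4

mi set mlong
mi register imputed Ideology OrgCapacityIndex
mi register regular y population

* The outcome y and the other analysis-model predictor go after the equals sign,
* so the imputations reflect their relationships with the imputed variables
mi impute mvn Ideology OrgCapacityIndex = y population, add(50) rseed(2016)

* Fit ologit in each of the 50 imputed datasets and combine with Rubin's rules
mi estimate: ologit y Ideology OrgCapacityIndex population
```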

Third, the multivariate normal model might not be the best choice for imputing missing values in quasi-interval variables such as Likert-type items (note: items, not scales, since scales arise from combining several Likert-type items). This is a point where you might well disagree and find that the multivariate normal model suits your needs.
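
If the Likert item should keep its ordinal categories, one alternative is a chained-equations model that imputes it with an ordered logit while the continuous index is imputed by linear regression. A sketch under the same hypothetical setup (simulated y and population; y fully observed and entered linearly as a predictor for simplicity):

```stata
* Hypothetical stand-in data: 900 cases, about 40% missing in each imputed variable
clear
set seed 2016
set obs 900
generate double population = rnormal(10, 2)
generate double OrgCapacityIndex = rnormal(0, 1) + 0.1*population
generate byte Ideology = 1 + floor(5*runiform())
generate double latent = 0.5*OrgCapacityIndex + 0.3*Ideology + rnormal()
generate byte y = 1 + (latent > 1) + (latent > 2.5)
drop latent
replace OrgCapacityIndex = . if runiform() < 0.4
replace Ideology = . if runiform() < 0.4

mi set mlong
mi register imputed Ideology OrgCapacityIndex
mi register regular y population

* (regress) for the continuous index, (ologit) for the Likert item:
* imputed Ideology values are drawn from its observed categories 1-5
mi impute chained (regress) OrgCapacityIndex (ologit) Ideology = y population, add(50) rseed(2016)

mi estimate: ologit y Ideology OrgCapacityIndex population
```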

Somewhat related to the above: why did you decide to impute values for only two variables? You will probably have to explain this decision to reviewers, supervisors, or lecturers. If you decide to go with MI, why not go all the way and impute all missing values?

I hope my comments are helpful for further development of your analysis.

Best
Daniel

von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiply imputed data. Sociological Methodology, 37(1), 83-117.

Last edited by daniel klein; 01 Jul 2016, 03:11. Reason: probably identified the initial problem at last
Ruowen Shen

#3

02 Jul 2016, 19:40

Thanks a lot, Daniel. I think I mostly understand your comments. I have 9 independent variables in my ordinal logistic regression model, and only one of them, community population, is complete (903 observations). I should have imputed all the variables with missing values, but given the theory in my field, I am not sure whether some of them can be imputed or whether I should keep them as originally collected. I chose to impute OrgCapacity and Ideology because they have so many missing values that my number of observations drops from about 900 to 150. But I think imputing missing values for all variables is reasonable as well.
So I revised my command as below.
mi set mlong
mi register imputed var1 var2 ... var8
mi impute mvn var1 var2 ... var8 = population outcome variable, add(50)
mi estimate: ologit .....

But another error appears after step 3: "Iteration 0: imputed data contain missing values
This may occur when imputation variables are used as independent variables, when independent
variables contain missing values, or when variance-covariance matrix becomes not positive
definite. You can specify option force if you wish to proceed anyway.
r(498);"
Is my third step correct? That is, should I put the complete variable (population) and the outcome variable (with about 400 missing values) after the equals sign in the imputation model?
For my outcome variable, I would prefer to keep it as originally collected, or to delete the imputed cases.
daniel klein

#4

04 Jul 2016, 05:14

If the outcome has missing values, you should not put it on the right-hand side of the equals sign; include it on the left-hand side instead. As I said before, you can create a marker variable for the missing observations in your outcome before imputation, then delete these cases after imputation.

Since you are running an ordered logistic regression, imputing the outcome within a multivariate normal framework is a bit suspicious. Maybe you should consider chained equations here.
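
A sketch of that combination on simulated stand-in data: the incomplete outcome sits on the left-hand side and is imputed by ordered logit, a marker is created before imputation, and the marked cases are deleted afterwards (von Hippel's imputation-then-deletion approach). All variable names and the 20% missingness in y are hypothetical.

```stata
* Hypothetical stand-in data: 900 cases, about 40% missing in each predictor
clear
set seed 2016
set obs 900
generate double population = rnormal(10, 2)
generate double OrgCapacityIndex = rnormal(0, 1) + 0.1*population
generate byte Ideology = 1 + floor(5*runiform())
generate double latent = 0.5*OrgCapacityIndex + 0.3*Ideology + rnormal()
generate byte y = 1 + (latent > 1) + (latent > 2.5)
drop latent
replace OrgCapacityIndex = . if runiform() < 0.4
replace Ideology = . if runiform() < 0.4
* Give the outcome some missing values as well (hypothetical)
replace y = . if runiform() < 0.2

* Flag cases with a missing outcome BEFORE imputing
generate byte y_miss = missing(y)

mi set mlong
mi register imputed y Ideology OrgCapacityIndex
mi register regular population y_miss

* Outcome on the left-hand side, imputed by ordered logit with the Likert item
mi impute chained (ologit) y Ideology (regress) OrgCapacityIndex = population, add(50) rseed(2016)

* Delete the cases whose outcome was imputed, in m=0 and in every imputation
mi xeq: drop if y_miss

mi estimate: ologit y Ideology OrgCapacityIndex population
```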

Best
Daniel
Ruowen Shen

#5

14 Jul 2016, 16:46

Hi Daniel,
I followed your suggestions and made my model work! Thank you very much.
But currently I am using MI and retaining the imputed values in my DV. I have a question about the MID (multiple imputation, then deletion) approach you mentioned. First, does deletion mean I need to delete the imputed DV values by hand? Is there any Stata syntax for this technique? Second, when should I delete the imputed values in the DV? That is, should I do the deletion after the command "mi impute chained (regress) X1 X2....,add(10)", and then run "mi estimate:ologit Y X1 X2...." with the deleted (original) DV?
Thanks.
daniel klein

#6

15 Jul 2016, 01:18

I usually do

Code:

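* flag cases with a missing outcome before imputing
generate y_miss = missing(depvar)
mi impute ...
* delete those cases from m=0 and from every imputed dataset
mi xeq : drop if y_miss
mi estimate ...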

Best
Daniel