This question is mainly directed at Daniel Klein, author of mimrgns (available from SSC), but others are welcome to chime in too.
A student asked me how to estimate marginal effects when using multiple imputation. That was easy enough: I told him to use Daniel's mimrgns command.
But then another complication became apparent. The student also wants to include things like interaction effects and squared terms in his model. According to Allison and others (see http://www.stata.com/statalist/archi.../msg00613.html), interaction terms should be treated like "just another variable." Specifically, Allison says
In multiple imputation, interactions should be imputed as though they are additional variables, not constructed by multiplying imputed values. The same is true if you have x and x^2 in a model. The x^2 term should be imputed just like any other variable, not constructed by squaring the imputed values of x. While this principle may seem counterintuitive, it is easily demonstrated by simulation that the more "natural" way to do it produces biased estimates.
So, Allison is saying that if you want age squared in a model, you first do
gen agesq = age * age
and then include agesq as a variable to be imputed.
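As a rough sketch of what that "just another variable" approach could look like in Stata, the block below simulates a small data set and imputes age and agesq jointly. The variable names (y, age, x1), the simulated data, the choice of mi impute mvn, and the number of imputations are all assumptions made for illustration, not something Allison prescribes.

```stata
* Simulated data; all variable names (y, age, x1) are hypothetical
clear
set seed 12345
set obs 500
generate age = 20 + 50*runiform()
generate x1  = rnormal()
generate y   = 1 + 0.5*age - 0.005*age^2 + x1 + rnormal()
replace age  = . if runiform() < 0.2    // make roughly 20% of age missing

* Create the squared term before imputing; it is missing wherever age is
generate agesq = age * age

mi set wide
mi register imputed age agesq
mi register regular y x1

* Impute age and agesq jointly, treating agesq as just another variable
mi impute mvn age agesq = y x1, add(20) rseed(2468)

* agesq enters the analysis model as an ordinary covariate
mi estimate: regress y age agesq x1
```

Note that in the imputed rows agesq will generally not equal age squared, which is exactly the counterintuitive part of the advice.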
BUT that runs counter to the usual advice for interaction terms when you want to estimate marginal effects, which is to use factor-variable notation. If, say, you want both age and age squared in your model, then your estimation command should be something like
reg y age c.age#c.age
Otherwise margins will not know that if age = 20, age^2 has to equal 400.
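To make the difference concrete, here is a minimal sketch on complete (non-imputed) simulated data. The variable names and the data-generating process are assumptions for illustration only.

```stata
* Simulated complete data; y and age are hypothetical names
clear
set seed 12345
set obs 500
generate age = 20 + 50*runiform()
generate y   = 1 + 0.5*age - 0.005*age^2 + rnormal()

* Factor-variable notation: margins knows the squared term moves with age
regress y age c.age#c.age
margins, dydx(age)

* Hand-computed square: margins treats agesq as an unrelated variable
generate agesq = age * age
regress y age agesq
margins, dydx(age)    // holds agesq fixed, so this is just the coefficient on age
```

The two regressions produce the same coefficients, but only the first margins call accounts for the squared term when computing the marginal effect of age.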
So, if you follow Allison's advice and compute the interaction or squared term yourself, then the eventual mimrgns command will give incorrect estimates of the marginal effects, because it has no way of knowing that agesq depends on age. But if you don't follow his advice, you will get biased estimates.
I don't know if there is any established research on this. If I really want the marginal effects, then my inclination is to use factor variables and accept the bias. But maybe you should just say that marginal effects shouldn't be done if you want both multiple imputation and interaction or squared effects.
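For reference, the "use factor variables and accept the bias" route might look roughly like the sketch below: impute age only, then let factor-variable notation build the squared term at estimation time. The simulated data, the variable names, and the use of mi impute regress are assumptions; mimrgns must be installed from SSC first.

```stata
* ssc install mimrgns        // run once if mimrgns is not installed

* Simulated data; all variable names (y, age, x1) are hypothetical
clear
set seed 12345
set obs 500
generate age = 20 + 50*runiform()
generate x1  = rnormal()
generate y   = 1 + 0.5*age - 0.005*age^2 + x1 + rnormal()
replace age  = . if runiform() < 0.2    // make roughly 20% of age missing

mi set wide
mi register imputed age
mi register regular y x1

* Impute age only; the squared term is NOT imputed as a separate variable
mi impute regress age = y x1, add(20) rseed(2468)

* The squared term is built from the imputed ages via factor-variable notation
mi estimate: regress y c.age##c.age x1

* Marginal effect of age, combined across imputations
mimrgns, dydx(age)
```

Here the marginal effects respect the age/age-squared relationship, but the imputation model ignores the squared term, which is the practice the Allison quote warns against.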
My main advice to the student was to see whether the multiple imputation is actually gaining him all that much! If not, it may be better just to use listwise deletion. But as far as I can tell, something has to give somewhere. Any thoughts on how best to handle this would be appreciated.
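One way to gauge how much is at stake is to look at the amount of missingness and at what a complete-case analysis gives. The sketch below again uses simulated data with hypothetical variable names.

```stata
* Simulated data; all variable names (y, age, x1) are hypothetical
clear
set seed 12345
set obs 500
generate age = 20 + 50*runiform()
generate x1  = rnormal()
generate y   = 1 + 0.5*age - 0.005*age^2 + x1 + rnormal()
replace age  = . if runiform() < 0.2    // make roughly 20% of age missing

* How many values are missing on each variable?
misstable summarize y age x1

* regress drops incomplete cases by default (listwise deletion)
regress y c.age##c.age x1
display "Complete cases used: " e(N)

* With factor variables and no imputation, margins works as usual
margins, dydx(age)
```

If only a small share of cases is lost, the complete-case results may be a reasonable point of comparison for whatever the imputed analysis produces.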