I would like to ask, however, why you think that adjusting for the survey design is not useful. In my case the survey oversampled ethnic minorities and disadvantaged groups, and the pweights only cover attrition, not item non-response, which is what is causing my missing data.
As for your suggestion, do you mean separating by strata and PSU and then running mi impute chained?
This issue of weighting an imputation model is not one I've looked into in any depth, so I'll leave the question of whether or not to do it to others. But if you want to run separate imputation models for each combination of strata and PSU (which is one of several suggestions I've seen for how to incorporate them into an imputation model) all you need to do is add by(strata PSU) to the end of your mi impute chained command.
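For example, a minimal sketch of that command; the variable names (strata, psu, income, educ, age, female) are placeholders, not taken from the original posts:

* impute separately within each strata-by-PSU cell
* (each cell needs enough observations for the models to converge)
mi set wide
mi register imputed income educ
mi impute chained (regress) income (ologit) educ = age i.female, ///
    add(20) rseed(12345) by(strata psu)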
Russell Dimond
Statistical Computing Specialist
Social Science Computing Cooperative
University of Wisconsin-Madison
I had a theoretical reason for thinking that sample weighting of the imputation process was wrong: that prediction was intended for the particular sample, not for the population. But upon looking at the literature, I find I was wrong.
MI variance estimators can be biased if survey weights are not used in the imputation model and sampling is "informative". The situation is worst for domain analyses if the domain definition is not also a predictor in the model. (A sampling domain is a non-stratum subgroup for which separate analyses are required; the Stata term is sub-populations.) This problem was first demonstrated for the case of estimating a domain mean by Kott (1995). See the Introduction to Reist and Larsen (2012) for a brief summary.
So, weights should be incorporated into the imputation model. However, weighting the model (e.g., the weight option in mi impute), which is the solution for Kott's simplified problem, does not appear to be the best approach. Rather, the recommendation of Carpenter (2011) and others is to use the weights, first grouped, as main-effect predictors and as components of interaction terms. A preferable alternative, if available, is to incorporate into the model other variables that determine the weights. For example, in the Georgia Reproductive Health Survey (Serbanescu et al., 2011), selection probabilities differed by geographical stratum and by the number of females in the household eligible for the survey. Those factors could enter directly into an imputation model.
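A rough sketch of the grouped-weights approach described by Carpenter (2011); the variable names (pweight, income, age, female) are illustrative only, not from the original posts:

* group the sampling weights into quintiles and enter the groups as
* main effects and in an interaction in the imputation model
xtile wgtgrp = pweight, nq(5)
mi set wide
mi register imputed income
mi impute chained (regress) income = age i.female i.wgtgrp i.wgtgrp#i.female, ///
    add(20) rseed(12345)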
One approach to implementing Russell's suggestion is based on Reiter et al. (2006), who state:
In some surveys the design may be so complicated that it is impractical to include dummy variables for every cluster. In these cases, imputers can simplify the model for the design variables, for example collapsing cluster categories or including proxy variables (e.g., cluster size) that are related to the outcome of interest.
Thus, you could separately impute in subgroups formed by these variables.
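One way that might look in Stata, with hypothetical variable names (psu, income, age, female) and cluster size used to collapse the clusters into a few groups, as Reiter et al. suggest:

* collapse clusters into a small number of groups by cluster size,
* then impute separately within those groups
bysort psu: gen clustsize = _N
xtile psugrp = clustsize, nq(3)
mi set wide
mi register imputed income
mi impute chained (regress) income = age i.female, add(20) rseed(12345) by(psugrp)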
Kim, Jae Kwang, J. Michael Brick, Wayne A. Fuller, and Graham Kalton. 2006. On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(3): 509-521. http://jkim.public.iastate.edu/2006_JRSSB.pdf
Reiter, Jerome P., Trivellore E. Raghunathan, and Satkartar K. Kinney. 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology 32(2): 143.
Serbanescu, F., V. Egnatashvili, A. Ruiz, D. Suchdev, and M. Goodwin. 2011. Reproductive Health Survey Georgia, 2010: Summary Report. Atlanta, GA: Division of Reproductive Health, Centers for Disease Control and Prevention (DRH/CDC).
I need help getting a repeated-imputation inference program to work on the Survey of Consumer Finances. The code I have is
rii, imp(Y): regress X1 X2 X3 X4 X101, robust