Hi All
I'd like some advice on running a more complex multiple imputation model. Some of the variables I would like to impute include 'smoking frequency' (number of cigarettes smoked daily) and 'alcohol consumption frequency' (weekly frequency of consumption). Only individuals who answered yes to the preceding questions on 'ever smoking' (yes/no) and 'ever alcohol consumption' (yes/no) were asked the questions on smoking and alcohol frequency.
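To make the skip pattern concrete, here is a small simulated sketch of how I'm assuming the data are laid out. The variable names smoking, alcohol (ever-use, 0/1), smokefreq and alcoholfreq, the choice to code never-users' frequency as 0, and the missingness rates are all hypothetical, for illustration only:

Code:
```stata
* Hypothetical simulated data illustrating the skip pattern (all names assumed)
clear
set seed 12345
set obs 1000
generate byte smoking = runiform() < 0.3          // ever smoked (1 = yes)
generate byte alcohol = runiform() < 0.6          // ever drank alcohol (1 = yes)
* Frequency questions are only asked of "yes" respondents
generate smokefreq   = rpoisson(10) if smoking == 1   // cigarettes per day
generate alcoholfreq = rpoisson(3)  if alcohol == 1   // drinking days per week
* Assumption: code never-users as 0, so "." only marks genuine nonresponse
replace smokefreq   = 0 if smoking == 0
replace alcoholfreq = 0 if alcohol == 0
* Add some genuine item nonresponse among users
replace smokefreq   = . if smoking == 1 & runiform() < 0.1
replace alcoholfreq = . if alcohol == 1 & runiform() < 0.1
* Check the pattern: mean frequency should be exactly 0 for never-users
tabulate smoking, summarize(smokefreq)
tabulate alcohol, summarize(alcoholfreq)
```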
I've been advised that it is not ideal to impute 'smoking frequency' and 'alcohol consumption frequency' in the same way as the other variables in the model (even though, theoretically, one can assume that those who answer 'no' to ever smoking would also answer 'no' to smoking frequency). One possibility is to run the multiple imputation with the conditional option, as discussed in a similar post from some years ago:
https://www.statalist.org/forums/for...nd-conditional
If I understand that post correctly, the advice is to impute the above variables (smoking frequency and alcohol frequency) first in the imputation order, and then include all the other variables to be imputed as usual. Is this the right interpretation, and does the syntax below make sense?
Code:
mi impute chained (logit, augment) smokefreq alcoholfreq (pmm, knn(3) conditional (if smoking==1 | if alcoholfreq==1)) (logit) druguse (mlogit) education (regress) bmi bp = sex ses ethnicity, add(30) rseed (543210)
- I'm unsure whether the 'pmm' and 'knn' options above are required.
- I'd also like some explanation of what actually happens in the imputation model when one uses the conditional option (I've tried to understand it from online resources, but with limited success!)
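For comparison, below is an alternative layout I've sketched but am not sure about. It gives each frequency variable its own conditional() (rather than one condition joined with '|'), imputes the ever-smoking and ever-drinking variables themselves with logit in the same chained model, and codes never-users as 0 beforehand. It runs on simulated data only: the variable names smoking and alcohol (ever-use, 0/1), the Poisson-generated frequencies and the missingness rates are hypothetical, and druguse, education, bmi and bp are left out for brevity.

Code:
```stata
* Sketch only: simulated data, all variable names hypothetical
clear
set seed 54321
set obs 1000
generate byte sex       = runiform() < 0.5
generate byte ses       = 1 + floor(3 * runiform())     // 3 SES groups
generate byte ethnicity = 1 + floor(4 * runiform())     // 4 groups
generate byte smoking   = runiform() < 0.3              // ever smoked
generate byte alcohol   = runiform() < 0.6              // ever drank
* Never-users are coded 0 on frequency (assumption)
generate smokefreq   = cond(smoking == 1, rpoisson(10), 0)
generate alcoholfreq = cond(alcohol == 1, rpoisson(3), 0)
* Missingness: ever-status sometimes missing (then frequency is missing too),
* plus item nonresponse on frequency among users
replace smoking     = . if runiform() < 0.05
replace alcohol     = . if runiform() < 0.05
replace smokefreq   = . if missing(smoking) | (smoking == 1 & runiform() < 0.1)
replace alcoholfreq = . if missing(alcohol) | (alcohol == 1 & runiform() < 0.1)

mi set mlong
mi register imputed smoking alcohol smokefreq alcoholfreq
mi register regular sex ses ethnicity
* One conditional() per frequency variable; augment guards against the
* (near) perfect prediction of ever-use by frequency in the logit models
mi impute chained ///
    (logit, augment) smoking alcohol ///
    (pmm, knn(3) conditional(if smoking == 1)) smokefreq ///
    (pmm, knn(3) conditional(if alcohol == 1)) alcoholfreq ///
    = sex i.ses i.ethnicity, add(30) rseed(543210)
```

My tentative understanding, which I'd be grateful to have confirmed, is that with conditional(if smoking == 1) the imputation model for smokefreq is fitted and applied only among observations whose current (observed or imputed) value of smoking is 1.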
Many thanks
/Amal