Dear Stata-listers,
I am wondering if you have any wisdom to share on the following:
I am analysing data from a large case-control study of psychotic disorders, and I want to find out which factors explain the excess odds of disorder in ethnic minority groups. As expected, some data were missing. Because the imputation model included a sizeable number of categorical variables, I used the ice command (multiple imputation by chained equations) and then ran the post-imputation analyses shown below.
Since I first ran these analyses, sampling weights have become available. They reflect the probability of selection for controls only, and they are available within each setting by age group, sex and ethnic majority/minority status.
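```stata
local rhs "i.ethn c.age##sex pat_age ctq_tot can_lifetime i.pat_ses i.ed_cat rel_ever living_ever cul_dis"

* Multilevel logistic model (random intercept for setting), combined across imputations
noi mi estimate, or saving(miest, replace): melogit case `rhs' || setting:

* Average the pseudo R-squared and C statistic of a single-level logistic model
* fitted separately in each of the M imputed datasets
qui mi query
local M = r(M)
scalar r2 = 0
scalar cstat = 0
qui mi xeq 1/`M': logistic case `rhs'; scalar r2 = r2 + e(r2_p); lroc, nog; scalar cstat = cstat + r(area)
scalar r2 = r2/`M'
scalar cstat = cstat/`M'
noi di "Pseudo R-squared over imputed data = " r2
noi di "C statistic over imputed data = " cstat
```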
I have two questions:
1. Would it be appropriate to use sampling weights here? This is a case-control study, and I am not trying to estimate the 'true' prevalence in the general population. Furthermore, all of the variables used to construct the weights are (a) in the imputation model and (b) in the logistic regression model.
2. If so, how could I incorporate these weights into the post-imputation estimation model?
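On the syntax side of question 2, a minimal sketch follows, assuming the sampling weights have been merged into the mi dataset as a variable (called `wt` here, a hypothetical name). Observation-level weights go in the usual `[pweight=]` slot of the command that `mi estimate` runs. The example uses the shipped auto data as a stand-in: the weights, the missing values and the imputation model are all invented for illustration, and it says nothing about whether weighting is appropriate (question 1).

```stata
* Sketch only: auto data as a stand-in for the case-control data.
* wt, the missing values and the imputation model are hypothetical.
sysuse auto, clear
set seed 12345

* Hypothetical sampling weight: 1 for "cases" (foreign == 1), 2 for "controls"
generate double wt = cond(foreign == 1, 1, 2)

* Create some missing covariate values so there is something to impute
replace mpg = . in 1/10

mi set wide
mi register imputed mpg
mi impute regress mpg length foreign, add(5)

* Sampling weights are passed to the estimation command in the [pweight=] slot
mi estimate, or: logistic foreign mpg length [pweight=wt]

* The multilevel model would take the same form, e.g.
* mi estimate, or: melogit case `rhs' [pweight=wt] || setting:
```

For `melogit`, weights at the setting level (rather than the observation level) would instead go in the `pweight()` option of the random-effects equation; the [ME] melogit manual entry describes how the two levels of weights interact.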
Thank you very much for any help.
Best wishes,
Hannah