I want to see whether cases differ from controls on several variables. For example, do cases have a higher BMI than controls? Do cases have higher rates of smoking than controls? I have both continuous and categorical variables. I'm not sure whether to examine each variable individually, as I have done above for BMI using xtreg, or to put them all into one model (using xtreg or clogit?). Either way, I will need to control for a categorical cluster variable and will probably need to control for variables such as age, gender and SES. Based on this, can you advise which approach sounds most sensible?
I should also add that some of these predictors have about 20-30% missing data, almost all from the same people (i.e. for 20-30% of participants, data are missing on every variable except the one that defines whether they are cases or controls). I am not sure whether to remove these people entirely and redo the matching, or to keep them in, since they can still be identified as a case or control even though there is no data on their predictor variables.
Many thanks for your continued help.