I have a dataset with a binary variable (healthscreenanyq) reflecting whether or not patients got a health screening at each of 5 timepoints over the course of 12 months. Participants in this dataset are randomized to three conditions (cond). About 27% of observations of healthscreenanyq are missing. Using this variable, I calculated another variable (healthscreenanyy) that summarizes healthscreenanyq over the course of a year, assuming conservatively that missing data in healthscreenanyq reflected not having been screened. So, healthscreenanyy reflects whether participants ever reported being screened during the study.
I fit a logistic regression model for healthscreenanyy, and it showed that the adjusted probabilities of getting screened at least once were 57% in group 0, 91% in group 1, and 89% in group 2. Since there is quite a bit of missing data in healthscreenanyq, though, I was hoping to see what impact different scenarios for the missing data might have on these results. Specifically, is there a way to determine what the screening rates in the missing data would need to be in order to close the gap between group 0 and groups 1 and 2? That is, what would the rate of screening need to be in the 27% of missing data (overall) in order for group 0 to increase by 20%? Or is there a way to create new variables with different rates of screening in the missing data, so I can see the impact manually? Basically, I'm looking for a version of tipping point analysis, and it seems like it should be a lot simpler than many of the tutorials I've seen, given that this is a single binary variable. Maybe there's an even simpler solution I'm missing?
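To make the "what rate would be needed" question concrete, here is a rough back-of-the-envelope version. It works at the participant level with raw proportions rather than the adjusted probabilities, so it is only an approximation. The symbols are assumptions introduced for illustration: p_0 is the share of group 0 counted as ever screened under the conservative coding, m_0 is the share of group 0 who are counted as never screened but have at least one missing timepoint (the only participants whose status could change), r is the fraction of those participants who were in fact screened, and p* is the target share.

```latex
p_0' = p_0 + r\,m_0
\quad\Longrightarrow\quad
r = \frac{p^{\ast} - p_0}{m_0},
\qquad \text{feasible only if } 0 \le p^{\ast} - p_0 \le m_0 .
```

Reading "increase by 20%" as 20 percentage points (from 0.57 to 0.77), this gives r = 0.20 / m_0. The value of m_0 is not stated above and would have to be computed from the data; if it is below 0.20, no assumption about the missing data can produce that increase.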
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long id byte(qmonth cond) float(healthscreenanyq healthscreenanyy)
100005  1 1 1 1
100005  4 1 1 1
100005  7 1 1 1
100005 10 1 1 1
100005 12 1 0 1
100006  1 2 1 1
100006  4 2 1 1
100006  7 2 1 1
100006 10 2 1 1
100006 12 2 0 1
100007  1 0 . 1
100007  4 0 1 1
100007  7 0 0 1
100007 10 0 0 1
100007 12 0 0 1
100008  1 1 1 1
100008  4 1 1 1
100008  7 1 . 1
100008 10 1 1 1
100008 12 1 . 1
end
label values cond cond
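The "create new variables with different rates of screening in the missing data" idea could be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the names onerow, hsq_* and hsy_* are made up for illustration, the seed is arbitrary, the same assumed rate is applied to missing timepoints in all three arms, and the logit here has no covariates because the adjustment variables in the original model are not shown.

```stata
* Assumes the long-format data are in memory with
* id, qmonth, cond, and healthscreenanyq (0/1/missing).
set seed 20240101                      // arbitrary seed, for reproducibility
egen byte onerow = tag(id)             // flags one row per participant

forvalues pct = 0(25)100 {
    * Fill each missing timepoint with a Bernoulli(pct/100) draw
    generate byte hsq_`pct' = healthscreenanyq
    replace hsq_`pct' = (runiform() < `pct'/100) if missing(healthscreenanyq)

    * Ever screened during the year under this scenario
    egen byte hsy_`pct' = max(hsq_`pct'), by(id)

    display as text _n "Assumed screening rate in missing timepoints: `pct'%"
    logit hsy_`pct' i.cond if onerow   // add the original covariates here
    margins cond                       // predicted probability by condition
}
```

The 0% scenario reproduces the conservative coding already used for healthscreenanyy. To vary the rate in group 0 only, the replace condition could be extended with `& cond == 0`. Each scenario is a single random draw, so results for intermediate rates will vary somewhat from seed to seed; if an arm is perfectly predicted in some scenario, logit will drop it and margins may not be estimable for that arm.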