  • Tipping point analysis with a single binary panel variable

    I have a dataset with a binary variable (healthscreenanyq) reflecting whether or not patients got a health screening across 5 timepoints over the course of 12 months. Participants in this dataset are randomized to three conditions (cond). About 27% of observations of healthscreenanyq are missing. Using this variable, I calculated another variable that summarizes healthscreenanyq over the course of a year, assuming conservatively that missing data in healthscreenanyq reflected not having gotten screened. So, healthscreenanyy reflects whether participants ever reported being screened during the study.

    I fit a logistic regression model for healthscreenanyy, and it showed that the adjusted probabilities of getting screened at least once were 57% in group 0, 91% in group 1, and 89% in group 2. Since there is quite a bit of missing data in healthscreenanyq, though, I was hoping to see what impact different scenarios for the missing data might have on these results. Specifically, is there a way to determine what screening rates the missing data would need to have in order to close the gap between group 0 and groups 1 and 2? That is, what would the rate of screening need to be in the 27% of missing data (overall) in order for group 0 to increase by 20%? Or is there a way to create new variables with different rates of screening in the missing data, so I can see the impact manually? Basically, I'm looking for a version of tipping point analysis, and it seems like it should be a lot simpler than many of the tutorials I've seen, given that this is a single binary variable. Maybe there's an even simpler solution I'm missing?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id byte(qmonth cond) float(healthscreenanyq healthscreenanyy)
    100005  1 1 1 1
    100005  4 1 1 1
    100005  7 1 1 1
    100005 10 1 1 1
    100005 12 1 0 1
    100006  1 2 1 1
    100006  4 2 1 1
    100006  7 2 1 1
    100006 10 2 1 1
    100006 12 2 0 1
    100007  1 0 . 1
    100007  4 0 1 1
    100007  7 0 0 1
    100007 10 0 0 1
    100007 12 0 0 1
    100008  1 1 1 1
    100008  4 1 1 1
    100008  7 1 . 1
    100008 10 1 1 1
    100008 12 1 . 1
    end
    label values cond cond

  • #2
    At the extremes, you can just assign 0 to missing or 1 to missing. It will give you the full range.

    You could loop through various levels (for the random version, I'd probably repeat each level several times, since every run gives a different draw):

    * Extremes: treat every missing value as 0, then as 1
    forv i = 0/1 {
        capture drop alt
        g alt = healthscreenanyq
        replace alt = `i' if mi(alt)
        * --do stuff--
    }

    * Intermediate rates: each missing value becomes 1 with probability i/100, else 0
    forv i = 5(5)95 {
        capture drop alt
        g alt = healthscreenanyq
        replace alt = runiform() < `i'/100 if mi(alt)
        * --do stuff--
    }
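    As a rough sketch of what the "do stuff" step could look like, the do-file below rebuilds the person-level "ever screened" indicator for each assumed rate, repeats the random fill 20 times per rate, and averages the resulting proportions by group. Everything other than id, cond, and healthscreenanyq is an assumption for illustration: the simulated data, the names alt_y, onerow, p0-p2, the 25-point steps, and the 20 repetitions. It reports raw proportions rather than the adjusted probabilities from the logistic model; on the real data you could swap in your logit and margins calls at that point.

```stata
clear all
set seed 12345

* Hypothetical data shaped like the example: 300 people, 3 arms,
* 5 timepoints each, roughly 27% of observations missing
set obs 300
gen long id = _n
gen byte cond = mod(_n, 3)
expand 5
sort id
gen float healthscreenanyq = runiform() < 0.15 + 0.25*(cond > 0)
replace healthscreenanyq = . if runiform() < 0.27

* Flag one row per person so proportions are person-level
egen byte onerow = tag(id)

tempname results
tempfile sims
postfile `results' rate rep p0 p1 p2 using `sims'

forvalues i = 0(25)100 {
    forvalues r = 1/20 {
        capture drop alt alt_y
        gen byte alt = healthscreenanyq
        * Each missing value becomes 1 with probability i/100, else 0
        replace alt = runiform() < `i'/100 if missing(alt)
        * Ever screened during the study, under this scenario
        bysort id: egen byte alt_y = max(alt)
        * Proportion ever screened in each group
        forvalues g = 0/2 {
            quietly summarize alt_y if onerow & cond == `g', meanonly
            local p`g' = r(mean)
        }
        post `results' (`i') (`r') (`p0') (`p1') (`p2')
    }
}
postclose `results'

* Average over repetitions: one row per assumed screening rate
use `sims', clear
collapse (mean) p0 p1 p2, by(rate)
list, noobs
```

    Reading down the p0 column against p1 and p2 shows roughly which assumed rate, if any, closes the gap. This applies the same rate to the missing values in every group; to stress the comparison harder you could add `& cond == 0` to the replace so that only group 0's missing values are filled in, and leave the other groups at 0.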

