
  • Statistical insignificance in 'rare event' regression


    Hi all,
    I am currently undertaking a project in which I am trying to establish low income as a predictor of problem gambling. After some research, I found that the firthlogit command (followed by margins) is supposedly the optimal command, as it uses penalized maximum likelihood estimation. I have taken data from a government dataset, in which there are 19 observations of problem gamblers (where PROBLEMGAMBLER = 1). Is my methodology (including using firthlogit) flawed for finding this link, or are the majority of my explanatory variables simply insignificant in predicting the likelihood of being a problem gambler? Apologies if this is a ridiculous question; I am fairly new to Stata.
    Thanks


    Stata Command: firthlogit PROBLEMGAMBLER Lowincome ib2.Sex ib7.ag16g10 i.origin2 i.Educ2 marriedcohabiting
    [Attached image: Screenshot 2024-02-12 151740.png]

  • #2
    This is not a Stata difficulty; rather, you have too few of the events of interest to get precise estimates of the relationship of your predictors to the outcome. There is a continuing literature giving guidance on how many predictors it is reasonable to include in a logistic regression model relative to the number of events observed on the outcome variable. Some sources suggest "at least 10 events per predictor," although not everyone would agree with that. (Try searching the literature on something like /number of events predictors logistic regression/ to learn more.) You just aren't going to get precise or "significant" estimates with a small number of events, and while better statistical methods like Firth may reduce the bias in your estimates, they won't magically solve the problem of imprecise estimates stemming from too little data. Note here that the number of events (problem gamblers) is the issue, not the total sample size.
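
    As a rough illustration of the events-per-predictor arithmetic, here is a minimal Stata sketch using the variable names from the command in #1. It assumes that each factor variable contributes one coefficient per non-base level observed among the complete cases, and that Lowincome and marriedcohabiting each enter as a single coefficient (they have no i. prefix in that command). The helper variable insample and the local macros events and k are names made up for this sketch.

```stata
* Flag complete cases on every variable in the model from #1.
* "insample" is a hypothetical helper variable created for this sketch.
generate byte insample = !missing(PROBLEMGAMBLER, Lowincome, Sex, ag16g10, origin2, Educ2, marriedcohabiting)

* Count events (problem gamblers) among the complete cases.
quietly count if PROBLEMGAMBLER == 1 & insample
local events = r(N)

* Lowincome and marriedcohabiting each add one coefficient (assumption).
local k = 2

* Each factor variable adds (observed levels - 1) coefficients.
foreach v in Sex ag16g10 origin2 Educ2 {
    quietly tabulate `v' if insample
    local k = `k' + r(r) - 1
}

display "Events: `events'   Coefficients (excluding constant): `k'"
display "Events per coefficient: " %5.2f `events'/`k'
```

    With the 19 events reported in #1, the 10-events rule of thumb would support only one or two coefficients, so the ratio is best read as a rough guide to how imprecise the estimates are likely to be, not as a pass/fail test.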



    • #3
      Benjamin:
      as an aside to Mike's helpful reply, I'd also investigate whether your regression specification suffers from reverse-causation endogeneity.
      To an absolute amateur in this kind of research like me, it would seem that being a problem gambler can explain low income, other things being equal.
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Thank you both. I think I will still proceed with it as my research, making sure to highlight the problematically small number of events and the possibility that low income is endogenous to problem gambling. I appreciate the words of advice.
