  • Logistic Regression with Small Dependent Variable

    I have a binary dependent variable where "yes" makes up 1% of a sample of 490k (around 4,900 individuals). Is it possible to run a logistic regression when only this small a percentage of the sample has "yes" as the outcome? If not, do you know of any adjustments or programs to use within Stata?

    TIA

  • #2
    You could try firthlogit from SSC, but with only about 49 people who said yes, there is very little information in the data. No statistical procedure can extract information that wasn't in the data to begin with. So I would not get my hopes up.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Probably Maarten Buis misread the figures. With 4900 events, you should be fine. The issue isn't really the relative number of events, but the absolute number of events. See https://statisticalhorizons.com/logi...r-rare-events/. But yes, try out both regular logit and penalized logit.
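The arithmetic behind #3 is easy to check. This minimal Python sketch uses a hypothetical helper, `expected_events`, to turn a sample size and a share of "yes" outcomes into an event count: 1% of 490,000 is 4,900 events, while 49 is what 1% of 4,900 would give, which is consistent with the misreading suggested above.

```python
def expected_events(n_obs: int, share_yes: float) -> int:
    """Number of 'yes' outcomes implied by a sample size and the share of yeses.

    Hypothetical helper for illustration only.
    """
    return round(n_obs * share_yes)


if __name__ == "__main__":
    print(expected_events(490_000, 0.01))  # 1% of the full sample -> 4900
    print(expected_events(4_900, 0.01))    # 1% of 4,900 -> 49, the count used in #2
```

What matters for estimation is the absolute count of the rarer outcome, and 4,900 is a substantial number of events.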

      • #4
        Andrew Musau and Maarten Buis - Thank you both! I will report both models.

        • #5
          With 4900 events, you could use up to 245 df for explanatory variables by Frank Harrell's 20:1 rule of thumb. But I suspect you don't need that many. Personally, I would be very comfortable with the ordinary logit model.
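Harrell's rule of thumb reduces to a one-line calculation. In this sketch the hypothetical function `max_predictor_df` divides the limiting sample size, taken here as the smaller of the event and non-event counts (assumed: 4,900 events and 485,100 non-events), by the events-per-df ratio. With a ratio of 20 it reproduces the 245 df mentioned above. Treat the result as a rough guide, not a hard limit.

```python
def max_predictor_df(events: int, non_events: int, events_per_df: int = 20) -> int:
    """Rough cap on predictor degrees of freedom for a binary outcome.

    The limiting sample size is the smaller of the two outcome counts;
    dividing it by events_per_df gives the rule-of-thumb ceiling.
    Hypothetical helper for illustration only.
    """
    limiting = min(events, non_events)
    return limiting // events_per_df


if __name__ == "__main__":
    events = 4_900
    non_events = 490_000 - events  # 485,100
    print(max_predictor_df(events, non_events))      # 20:1 ratio -> 245
    print(max_predictor_df(events, non_events, 10))  # looser 10:1 ratio, for comparison -> 490
```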
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          • #6
            I agree fully with Bruce Weaver in #5. Either exp(xb)/(1+exp(xb)) is a defensible functional form for P(y=1|x) or it is not. If it is then whether P(y=1|x) is .01 or .50 or .99 seems immaterial.
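To make the point in #6 concrete: under the logit link, a 1% probability is simply a linear index xb of about -4.6, no more special than 0 (for 50%) or +4.6 (for 99%). The helper names `inv_logit` and `logit` below are illustrative and do not come from any Stata command.

```python
import math


def inv_logit(xb: float) -> float:
    """Logistic link: P(y = 1 | x) = exp(xb) / (1 + exp(xb))."""
    return math.exp(xb) / (1.0 + math.exp(xb))


def logit(p: float) -> float:
    """Inverse of inv_logit: the linear index xb that yields probability p."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # The same functional form covers rare, balanced, and common outcomes;
    # only the value of the linear index changes.
    for p in (0.01, 0.50, 0.99):
        xb = logit(p)
        print(f"P(y=1|x) = {p:.2f}  <->  xb = {xb:+.3f}  (round trip: {inv_logit(xb):.2f})")
```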

            • #7
              I’m with Bruce and John. firthlogit is a way to impose restrictions in estimation. The functional form is the same, and with 4,900 successes I don’t see why firthlogit is necessary.
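One way to see why the Firth penalty matters little here is an intercept-only model. In that simplified case every hat value equals 1/n, and Firth's modified score has a closed-form root that amounts to adding half an event and half a non-event before taking the logit. The Python sketch below is an illustration under that simplifying assumption, not the firthlogit implementation. With covariates the hat values vary by observation and a full matrix version is needed. The function names are hypothetical.

```python
import math


def ml_intercept(k: int, n: int) -> float:
    """Ordinary ML intercept of an intercept-only logit: logit(k / n)."""
    p = k / n
    return math.log(p / (1.0 - p))


def firth_intercept(k: int, n: int, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Firth-penalized intercept, found by Newton steps on the modified score.

    Intercept-only case: every hat value equals 1/n, so the modified score
    U*(b) = sum(y - pi) + sum(h * (0.5 - pi)) simplifies to
    k + 0.5 - (n + 1) * pi, where pi = 1 / (1 + exp(-b)).
    """
    b = 0.0
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + math.exp(-b))
        score = k + 0.5 - (n + 1) * pi
        info = (n + 1) * pi * (1.0 - pi)  # minus the derivative of the score
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b


if __name__ == "__main__":
    k, n = 4_900, 490_000  # events and sample size from the question
    b_ml = ml_intercept(k, n)
    b_firth = firth_intercept(k, n)
    closed_form = math.log((k + 0.5) / (n - k + 0.5))  # logit((k + 0.5) / (n + 1))
    print(f"ML intercept:    {b_ml:.6f}")
    print(f"Firth intercept: {b_firth:.6f} (closed form {closed_form:.6f})")
    print(f"Difference:      {b_firth - b_ml:.2e}")
```

With 4,900 events in 490,000 observations the two intercepts differ by roughly 1e-4 on the logit scale, which is negligible next to ordinary sampling error.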
