
  • Stata command for rare events logit estimation

    Hi fellow Stata users:

    I am working with a model where the dependent variable (y = 0 or 1) is a so-called rare event variable: n = 40,000, of which y = 1 in about 300 cases and y = 0 in the rest. I have searched online and found a few commands that were developed as substitutes for standard logit estimation, namely:

    1. ReLogit by King et al.
    2. firthlogit by Coveney.

    I use Stata 13. ReLogit is nowhere to be found. firthlogit is available through findit; however, given its short description, it is hard to tell whether its purpose is to properly estimate models where the dependent variable is a so-called rare event.

    I would appreciate any feedback or leads you can provide.

    Best wishes,
    Vahé
    Last edited by vaheheboyan; 16 Oct 2014, 20:34.

  • #2
    I keep getting asked things like this, so here is my canned response:

    Paul Allison has a nice blog entry on this:

    http://www.statisticalhorizons.com/l...or-rare-events

    He says "The problem is not specifically the rarity of events, but rather the possibility of a small number of cases on the rarer of the two outcomes. If you have a sample size of 1000 but only 20 events, you have a problem. If you have a sample size of 10,000 with 200 events, you may be OK. If your sample has 100,000 cases with 2000 events, you’re golden."
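    To make Allison's point concrete: his three example samples all have the same 2% event rate, so the rate alone cannot be what separates "a problem" from "golden". What differs is the absolute number of cases on the rarer outcome. The short Python sketch below simply tabulates both quantities for his examples and for the sample in the original question (n = 40,000 with about 300 events); the helper name summarize is hypothetical and only for illustration.

```python
# Allison's three hypothetical samples (from the quote above), plus the
# sample described in the original question (n = 40,000, about 300 events).
samples = {
    "Allison: 1,000 cases, 20 events": (1_000, 20),
    "Allison: 10,000 cases, 200 events": (10_000, 200),
    "Allison: 100,000 cases, 2,000 events": (100_000, 2_000),
    "Question: 40,000 cases, ~300 events": (40_000, 300),
}


def summarize(n, events):
    """Return (event rate, number of cases on the rarer outcome) for a binary y."""
    rarer = min(events, n - events)
    return events / n, rarer


for label, (n, events) in samples.items():
    rate, rarer = summarize(n, events)
    print(f"{label}: event rate = {rate:.2%}, rarer-outcome cases = {rarer}")
```

    The question's sample has a lower event rate (0.75%) than any of Allison's examples, but its 300 events exceed the 200 in his "you may be OK" case. This is consistent with the later remark in this thread that 300 events may be fine for regular logit.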

    If you do have a problem he suggests using penalized maximum likelihood, which you can do with the -firthlogit- command that you can download from SSC.
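    For readers curious about what penalized maximum likelihood does, here is a minimal pure-Python sketch of Firth's method for logistic regression, the technique that -firthlogit- fits. It is not firthlogit's code. It assumes the textbook formulation: the usual score is modified to U*(b) = X'(y - p + h(1/2 - p)), where h holds the leverages (the diagonal of the weighted hat matrix), and the update is b <- b + I(b)^-1 U*(b) with Fisher information I = X'WX. The function names solve, sigmoid and firth_logit are hypothetical, and there is no step-halving or other safeguard that production code would include.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_logit(X, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression (sketch, no step-halving).

    X: list of rows, including a column of 1s for the intercept.
    y: list of 0/1 outcomes.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X' W X
        info = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n))
                 for c in range(k)] for r in range(k)]
        # Leverages h_i = w_i * x_i' I^-1 x_i
        h = [w[i] * sum(a * b for a, b in zip(X[i], solve(info, X[i])))
             for i in range(n)]
        # Firth-modified score: y - p + h * (1/2 - p), projected onto X
        score = [sum(X[i][j] * (y[i] - p[i] + h[i] * (0.5 - p[i]))
                     for i in range(n)) for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Usage: intercept-only model with 3 events out of 40 cases. Here Firth's
# estimated event probability has the closed form (k + 1/2) / (n + 1).
beta = firth_logit([[1.0]] * 40, [1] * 3 + [0] * 37)
print(beta[0], math.log(3.5 / 37.5))  # the two values should agree
```

    The intercept-only check shows the characteristic effect of the penalty: the estimate is pulled away from the raw proportion 3/40 toward 1/2, which is what reduces small-sample bias when events are scarce.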

    There was also a paper on rare events ("The Problem of Rare Events in Maximum Likelihood Logistic Regression - Assessing Potential Remedies") at the 2013 European Survey Research Association Meetings. See the last paper in the session at

    http://www.europeansurveyresearch.or...?sess=68&day=4

    Finally, political scientist Gary King has some papers on this, and also a very old Stata program called relogit (I might read the papers but would probably not use his software). See http://gking.harvard.edu/relogit

    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      If you search for firthlogit on this board, you will see it has come up a few other times. With 300 events, you may be fine with regular logit, but if not, firthlogit would seem to be the way to go.


      • #4
        I agree with Richard Williams's argument above. Although I am not sure that relogit is outdated (in the sense of yielding wrong estimates), it is certainly a bit old and has not been updated. If you want to run ReLogit in Stata 13, you will have to store it in your personal directory (for Windows, c:\ado\personal\) rather than in the STBPLUS directory (c:\ado\stbplus\) to make it work. The readme file that comes with the ReLogit program files names the STBPLUS directory, but Stata 13 no longer recognizes it.



        • #5
          I now have a handout (heavily plagiarized, with attribution, from other sources) that summarizes a lot of the points I am aware of on this issue:

          http://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf

          With relogit, I wouldn't say it is wrong, but the sources I cite seem to think that other methods are better.


          • #6
            Many thanks for the notes.

            r
