Logistic Regression with Small Dependent Variable

Mitch Lingo

Join Date: Jul 2018

Posts: 30
#1


14 Dec 2023, 12:52

I have a binary dependent variable where "yes" makes up 1% of a sample of 490k (around 4,900 individuals). Is it possible to fit a logistic regression when such a small percentage of the sample has "yes" as the outcome? If not, do you know of any adjustments or programs to use within Stata?

TIA
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#2

14 Dec 2023, 13:14

You could try firthlogit from SSC, but with only about 49 people who said yes, there is very little information present in that data. No statistical procedure can extract information that wasn't in the data to begin with. So I would not get my hopes up.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#3

14 Dec 2023, 13:39

Probably Maarten Buis misread the figures. With 4900 events, you should be fine. The issue isn't really the relative number of events, but the absolute number of events. See https://statisticalhorizons.com/logi...r-rare-events/. But yes, try out both regular logit and penalized logit.
2 likes
Mitch Lingo

Join Date: Jul 2018

Posts: 30
#4

14 Dec 2023, 14:21

Andrew Musau and Maarten Buis - Thank you both! I will report both models.
Bruce Weaver

Join Date: May 2014

Posts: 1119
#5

14 Dec 2023, 19:29

With 4900 events, you could use up to 245 df for explanatory variables by Frank Harrell's 20:1 rule of thumb. But I suspect you don't need that many. Personally, I would be very comfortable with the ordinary logit model.
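The arithmetic behind that figure can be sketched as follows. This is only an illustration of the 20:1 rule of thumb as quoted above. Taking the less frequent outcome as the limiting count is an assumption of this sketch, and the variable names are made up for the example.

```python
# Figures from the thread: 490,000 observations, 1% with outcome "yes".
n_obs = 490_000
events = n_obs // 100            # 1% of the sample -> 4,900 events
non_events = n_obs - events

# Assumption of this sketch: the rule is applied to the less frequent outcome.
limiting_n = min(events, non_events)

EVENTS_PER_DF = 20               # the 20:1 ratio quoted above
max_df = limiting_n // EVENTS_PER_DF

print(f"events = {events}, max df at 20:1 = {max_df}")
# prints: events = 4900, max df at 20:1 = 245
```

The budget depends on the absolute count of 4,900 events, not on the 1% share of the sample.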

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
2 likes
John Mullahy

Join Date: Dec 2016

Posts: 743
#6

15 Dec 2023, 06:01

I agree fully with Bruce Weaver in #5. Either exp(xb)/(1+exp(xb)) is a defensible functional form for P(y=1|x) or it is not. If it is, then whether P(y=1|x) is .01 or .50 or .99 seems immaterial.
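A minimal sketch of that functional form, to show that probabilities of .01, .50 and .99 are simply different values of the index xb under the same function. The helper names inv_logit and logit are hypothetical and chosen only for this illustration.

```python
import math

def inv_logit(xb):
    """P(y=1|x) = exp(xb) / (1 + exp(xb))."""
    return math.exp(xb) / (1.0 + math.exp(xb))

def logit(p):
    """Inverse of inv_logit: the index xb that yields probability p."""
    return math.log(p / (1.0 - p))

# The three probabilities mentioned above map to three index values;
# the functional form itself does not change.
for p in (0.01, 0.50, 0.99):
    xb = logit(p)
    print(f"P = {p:.2f}  <->  xb = {xb:+.3f}  (round trip: {inv_logit(xb):.2f})")
```

A probability of .01 corresponds to an index of about -4.6, the mirror image of the index for .99, so a rare outcome sits in the tail of the same curve rather than outside the model.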
1 like
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#7

15 Dec 2023, 06:58

I’m with Bruce and John. firthlogit is a way to impose restrictions in estimation. The functional form is the same, and with 4,900 successes I don’t see why firthlogit is necessary.
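A rough way to see how little the penalty matters at this scale is the intercept-only case, where both estimators have closed forms. This is a sketch under that simplifying assumption, not a description of what firthlogit does with covariates. The second data set (5 events in 500 observations) is hypothetical and was added only for contrast, and the function names are made up for the example.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def ml_intercept_only(events, n):
    # Ordinary maximum likelihood: the sample proportion.
    return events / n

def firth_intercept_only(events, n):
    # Firth (Jeffreys-prior) penalty for an intercept-only logit.
    # Penalized score: events - n*p + 0.5*(1 - 2*p) = 0
    # which solves to p = (events + 0.5) / (n + 1).
    return (events + 0.5) / (n + 1.0)

def log_odds_gap(events, n):
    """Difference between the Firth and ML intercepts on the log-odds scale."""
    return logit(firth_intercept_only(events, n)) - logit(ml_intercept_only(events, n))

# Same 1% event rate, very different absolute numbers of events.
gap_thread = log_odds_gap(4_900, 490_000)   # figures from the thread
gap_small = log_odds_gap(5, 500)            # hypothetical small sample

print(f"4,900 events: gap in intercept = {gap_thread:.6f}")
print(f"    5 events: gap in intercept = {gap_small:.6f}")
```

With 4,900 events the two intercepts differ by roughly 0.0001 on the log-odds scale, while with 5 events at the same 1% rate the gap is roughly 0.09. Under these assumptions the penalty only matters when the absolute number of events is small.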