Logit Regression

Anuja Tandon

Join Date: Jan 2017

Posts: 17
#1

07 Jul 2017, 04:27

Hi,

I have a sample of 1400 companies and a binary dependent variable (1 or 0). In that sample only 11 observations are '1', i.e. the event happens; for the rest the event doesn't happen. Will that give me any useful output?
What should be an ideal distribution?

Thanks!

Regards,
Anuja
Maarten Buis

Join Date: Mar 2014

Posts: 3416
#2

07 Jul 2017, 05:00

The problem with your data is that your dependent variable has very low variance, and thus there is very little variance that can be explained by the explanatory variables. With only 11 events you may get something with only one explanatory variable, but I would be skeptical about the results (if you get any), because there is just too little information present in your data. More than one explanatory variable does not make sense to me in that situation.

The ideal distribution would be 700 events and 700 non-events, i.e. 50% events and 50% non-events. That way you maximize the variance in the dependent variable.
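
To put rough numbers on this (a small worked calculation using only the counts given in this thread): for a binary variable with event proportion p, the variance is p(1 - p), which is largest at p = 0.5.

```latex
\[
\operatorname{Var}(y) = p(1-p), \qquad
\max_{p}\, p(1-p) = 0.25 \ \text{at}\ p = 0.5
\]
\[
p = \frac{11}{1400} \approx 0.0079
\quad\Longrightarrow\quad
\operatorname{Var}(y) \approx 0.0079 \times 0.9921 \approx 0.0078
\]
```

So with 11 events out of 1400 the dependent variable has only about 3% of the variance it would have with a 50/50 split.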

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

07 Jul 2017, 05:14

I haven't used it so far, but I suggest you take a look at penalized maximum likelihood estimation (i.e., the Firth method), available in Stata by installing the user-written program - firthlogit -, whose author is Joseph Coveney (by the way, a very active member of this Forum).

You may also wish to read this excellent text on rare events in logistic regression, written by Richard Williams, also a very active member of this Forum.

Hopefully that helps!

Best regards,

Marcos
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17630
#4

08 Jul 2017, 00:16

Anuja:
two asides to previous helpful comments:
- first, I would check for any data entry error in your dependent variable;
- provided that no error is detected, with such a wide difference between 1s and 0s in the regressand, I would focus on descriptive statistics only.

Kind regards,
Carlo
(StataNow 18.5)
Anuja Tandon

Join Date: Jan 2017

Posts: 17
#5

11 Jul 2017, 20:10

Thanks a lot! This helped a lot! I will try firthlogit; hopefully it will work. I will keep you posted.
Joseph Coveney

Join Date: Apr 2014

Posts: 4368
#6

11 Jul 2017, 21:34

There is another user-written command that I believe should also be considered in any context where you're contemplating using firthlogit:

Code:

search penlogit

will show from where you can install it.

And not least:

Code:

bayes, <considered priors>: logit
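
As an illustration only of what that prefix might look like when filled in, here is a minimal sketch. It assumes Stata 15 or later (where the bayes prefix is available) and uses the auto dataset shipped with Stata, not the data discussed in this thread. The normal(0, 4) prior and the seed are arbitrary placeholders, not recommendations; choosing the priors is the "considered" part.

```stata
* Example data shipped with Stata (not the poster's data)
sysuse auto, clear

* Bayesian logit with a normal prior (mean 0, variance 4) on all coefficients.
* The prior values and the seed are placeholders chosen for illustration.
bayes, prior({foreign: mpg weight _cons}, normal(0, 4)) rseed(12345): logit foreign mpg weight
```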
Anuja Tandon

Join Date: Jan 2017

Posts: 17
#7

17 Jul 2017, 22:09

Hi, I tried penlogit, but the predicted values (with option pr, i.e. the option to get the probability of a positive outcome) are negative. What would that mean?
Joseph Coveney

Join Date: Apr 2014

Posts: 4368
#8

18 Jul 2017, 05:02

Originally posted by Anuja Tandon

I tried penlogit, but the predicted values (with option pr, i.e. the option to get the probability of a positive outcome) are negative. What would that mean?

My guess is that you got linear predictions (log odds).
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#9

18 Jul 2017, 06:29

Hello, author of -penlogit- here.

Joseph is right. Even by specifying the option pr, you'll get linear predictions (log odds). You can take the invlogit of the linear predictions to obtain probabilities.
I'll look into why the option pr doesn't work (even if -penlogit- has e(predict)=logit_p in ereturn list).
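
A minimal sketch of that workaround, assuming a -penlogit- model has just been fitted and that -predict, xb- returns the linear predictor as described above. The variable names xb_hat and p_hat are placeholders.

```stata
* Linear predictor (log odds) from the model just fitted
predict double xb_hat, xb

* Convert log odds to probabilities: invlogit(x) = exp(x) / (1 + exp(x))
generate double p_hat = invlogit(xb_hat)

* Probabilities should now lie strictly between 0 and 1
summarize p_hat
```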

Andrea Discacciati

Join Date: Feb 2016
Posts: 194

#10

18 Jul 2017, 08:57

Anuja Tandon

Thanks to Anuja for spotting an issue with -predict-'s behavior after -penlogit- (the fault is of course mine and does not depend on -predict-).

This has been addressed in version 1.1.0 of -penlogit- available on GitHub (https://github.com/anddis/penlogit). See below for more info.

Since -penlogit- calls -glm- under the hood, it's probably more logical for -penlogit- to use -glim_p- as opposed to -logit_p-. See help glm postestimation##predict

Code:

// Update penlogit to version 1.1.0 from GitHub
net install penlogit, from("https://raw.githubusercontent.com/anddis/penlogit/master/") replace

// Load the full dataset on neonatal mortality (Neutra et al. 1978)
use http://www.imm.ki.se/biostatistics/data/neutra1978.dta, clear

// Estimate penalized maximum-likelihood odds-ratio for "no monitoring" status and age at delivery
xi: penlogit death nomonit i.teenages, lfprior(nomonit log(2) 2000 2 0.5) nprior(_Iteenages_1 log(2) 0.5 _Iteenages_2 log(4) 0.5)

// Calculates the linear predictor (log odds)
predict logodds, xb

// Calculates the probability of a positive outcome
predict prob, mu

// Display results
table nomonit teenages, c(mean logodds mean prob)

Last edited by Andrea Discacciati; 18 Jul 2017, 09:07.

Joseph Coveney

Join Date: Apr 2014

Posts: 4368
#11

18 Jul 2017, 17:19

Originally posted by Andrea Discacciati

This has been addressed in version 1.1.0 of -penlogit- available on GitHub (https://github.com/anddis/penlogit).

Code:

search penlogit

doesn't show that location as a place from which to install the command. You might want to notify StataCorp that you've got the most recent version at that URL so that they can update the search locations to include it.
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#12

19 Jul 2017, 01:49

Joseph Coveney I'll run some more checks to make sure that everything's ok and then I'll submit a "Software update" to the Stata Journal.
At any rate, thank you for pointing out the possibility to make one's own website available through Stata (http://www.stata.com/support/faqs/re...ing-a-command/ – see 2). I honestly didn't know it was an option.
Anuja Tandon

Join Date: Jan 2017

Posts: 17
#13

24 Jul 2017, 22:32

Thanks a lot Andrea and Joseph!
Anuja Tandon

Join Date: Jan 2017

Posts: 17
#14

25 Jul 2017, 01:46

Hi! In my dataset only about 3% of the observations are 1, i.e. 47 out of 1500. I tried penlogit, logit and firthlogit, and all of them give similar results. Does this mean that my data is robust, or is it garbage in, garbage out? Thanks!