
  • predicts failure perfectly

    When I tried to run a multivariate logistic regression, Stata said that one of my variables predicted failure perfectly, that some of the observations were dropped and not used, and that one of the variable's categories, 5.edu_2 (which I think corresponds to the 5th category of this variable, but maybe not), was omitted because of collinearity.

    When I take the edu_2 variable completely out of the model, the inference changes. Can someone explain? Should I keep edu_2 in the model or take it out completely? Can I trust the inferences in the attached screenshot, which shows the odds ratios with edu_2 in the model?

    Thanks!
    Attached Files

  • #2
    The message tells you that whenever edu_2 = 1, recur_nr_HDP_bin is always equal to 0 (at least this is true in the estimation sample where observations with missing values on any other variables in the model are excluded). You can verify this directly by running:
    Code:
    tab recur_nr_HDP_bin edu_2 if !missing(recur_nr_HDP_bin, bmi_change_cat, race_update_cat, parity_cat_2, edu_2)
    The reason this is a problem is that the maximum likelihood estimate for the coefficient of 1.edu_2 in this circumstance is negative infinity. Otherwise put, the model cannot converge with 1.edu_2 in it. So Stata removes 1.edu_2 and those observations from the model. With 1.edu_2 gone, you are left with an indicator for every remaining level of edu_2, so one of those has to go in order to break the collinearity, and Stata chose to eliminate the indicator for edu_2 == 5. This is not really a change: it is just the usual omission of one indicator from the set of levels.
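    The divergence is easy to demonstrate outside Stata. Here is a toy Python sketch (made-up data and plain gradient ascent, nothing to do with -logit-'s actual internals): with one group in which the outcome is always 0, every iteration improves the likelihood by pushing that group's coefficient further toward negative infinity, so there is no finite maximum.

```python
import math

# Toy illustration of perfect prediction (made-up data, not the poster's):
# whenever x == 1, y is always 0, mimicking the edu_2 = 1 situation.
x = [1, 1, 0, 0, 0, 0]
y = [0, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik(a, b):
    # ordinary logistic log likelihood for the model logit(p) = a + b*x
    return sum(yi * math.log(sigmoid(a + b * xi)) +
               (1 - yi) * math.log(1 - sigmoid(a + b * xi))
               for xi, yi in zip(x, y))

# Plain gradient ascent on the likelihood: it keeps improving as the
# coefficient on x heads toward negative infinity, so it never converges.
a, b = 0.0, 0.0
history = []
for _ in range(2000):
    ga = sum(yi - sigmoid(a + b * xi) for xi, yi in zip(x, y))
    gb = sum(xi * (yi - sigmoid(a + b * xi)) for xi, yi in zip(x, y))
    a += 0.5 * ga
    b += 0.5 * gb
    history.append(loglik(a, b))

print(round(a, 2), round(b, 2))
```

    After 2,000 iterations the slope is still drifting downward while the intercept has settled at a finite value, which is exactly the behavior that forces Stata to drop the offending indicator instead.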

    You still have all of the levels of edu_2 in the model except for edu_2 = 1. So you can interpret the results as follows: if edu_2 = 1, the predicted value of recur_nr_HDP_bin is 0. If edu_2 != 1, the predicted value of recur_nr_HDP_bin is given by the logistic regression output (or, more properly, by what -predict- gives when run after the logistic regression). The model's domain of applicability is restricted to cases where edu_2 != 1; the edu_2 = 1 cases simply always have outcome 0.


    This type of situation is not uncommon when you have a small sample, as here. You have only 2 observations with edu_2 = 1, so the probability that by chance alone both of those observations have outcome = 0 is 1/4, assuming a 50/50 base probability overall. So your data set simply does not have enough information to reasonably estimate the effect of edu_2 = 1 on this outcome. (Even if it had worked out that one of the outcomes was 0 and the other was 1, that still isn't much information, and your coefficient of 1.edu_2, whatever it might have been, would have had a very wide confidence interval.)
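    For the record, the 1/4 figure is just enumeration of the equally likely outcome patterns for the two observations (a trivial Python check, assuming the 50/50 base rate stated above):

```python
from itertools import product

# With only 2 observations in the edu_2 == 1 group and an assumed 50/50
# base rate, exactly 1 of the 4 equally likely outcome patterns has both
# outcomes equal to 0.
patterns = list(product([0, 1], repeat=2))
p_both_zero = sum(1 for pat in patterns if pat == (0, 0)) / len(patterns)
print(p_both_zero)  # 0.25
```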

    With 9 predictors and 68 cases you really are skating on thin ice here. Even the most lenient of statisticians would consider that sample size way too small for that many predictors.

    The bottom line here is that you are trying to squeeze more information from your data than it has on offer. The results you show are quite consistent with that: look how wide the confidence intervals on the odds ratios for both levels of bmi_change_cat are: your data are telling you next to nothing about these predictors. It's just that with edu_2 the situation was so extreme that it couldn't even do that much.
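    To put a number on how little information small cells carry, here is the standard Wald calculation for an odds ratio from a 2x2 table, in Python; the cell counts below are invented for illustration and are not taken from the posted output:

```python
import math

# The Wald standard error of a log odds ratio from a 2x2 table is
# sqrt(1/a + 1/b + 1/c + 1/d), so any small cell inflates it badly.
# These counts are made up for illustration only.
a, b, c, d = 2, 8, 14, 44       # exposed events, exposed non-events, etc.
log_or = math.log((a * d) / (b * c))
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(log_or - 1.96 * se)
hi = math.exp(log_or + 1.96 * se)
print(round(math.exp(log_or), 2), (round(lo, 2), round(hi, 2)))
```

    With a cell of 2, the interval spans more than a twenty-fold range and comfortably includes 1: the data are compatible with almost any effect, which is the same message the wide intervals in the posted output are sending.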

    The best solution to this problem is to get more data. If that is not feasible, just interpret your model separately for edu_2 = 1 (always 0 outcome) as I mentioned earlier. If that is unsatisfactory (say because the estimate for the group with edu_2 = 1 is a primary goal of your research), you can try fitting the logistic model with Joseph Coveney's -firthlogit- (available from SSC) which estimates logistic models using penalized maximum likelihood and can obtain finite coefficients for situations like this. (But, be prepared, the CI around that estimate will be extremely wide.) -exlogistic- is another possibility here, and it is specifically designed to be used with very small data sets. It runs very slowly because it is very computationally intensive, and also uses a great deal of memory. But if your computer has the heft to deal with it, it will give you answers. But again, it doesn't draw blood from stones: you will get answers that are very imprecise.
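    For intuition about what the Firth penalty does, here is a deliberately minimal one-parameter Python toy (an intercept-only model with both outcomes 0, which is essentially the edu_2 = 1 situation). It is not the -firthlogit- algorithm, just the same penalty idea in its simplest possible case:

```python
import math

# Firth's method maximizes loglik + 0.5 * log det(Fisher information).
# Under perfect prediction the ordinary score never reaches 0, but the
# penalty term pulls the estimate back to a finite value.
y = [0, 0]          # two observations, both failures (made-up data)
n = len(y)

def penalized_score(b):
    p = 1.0 / (1.0 + math.exp(-b))
    score = sum(yi - p for yi in y)      # ordinary score: never 0 here
    # derivative of 0.5 * log(n * p * (1 - p)) with respect to b
    return score + 0.5 * (1.0 - 2.0 * p)

# simple gradient ascent on the penalized likelihood
b = 0.0
for _ in range(5000):
    b += 0.1 * penalized_score(b)

p_hat = 1.0 / (1.0 + math.exp(-b))
print(round(b, 3), round(p_hat, 4))
```

    The estimate settles at a finite (though strongly negative) coefficient; in this intercept-only case the penalized estimate of p works out to (events + 0.5) / (n + 1). As noted above, the price is a very wide confidence interval around it.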

    Comment


    • #3
      Hello Dr. Schechter @Clyde Schechter

      I have learned a lot from your postings and replies so far. I want to express my gratitude to you first. Thank you, sir!!

      I am also facing a similar problem. I have two IPWRA models, the same treatments and covariates, but different Ys. One model is linear; the other is logit.
      My linear model works well, but my logit model does not.

      Code:
      teffects ipwra (y x1 x2 i.indus ..., logit) (treatment x1 i.indus ...), atet tlevel(3)

      And the result says,
      outcome model: perfect predictions detected; the model, as specified, is not identified
      1.treatment#7.indus != 0 (n=28) predicts failure perfectly

      So I dropped i.indus from the model, but then:

      Iteration 0: EE criterion = 2.302e-10
      Iteration 1: EE criterion = 4.535e-17
      The Gauss-Newton stopping criterion has been met but missing standard errors indicate some of the parameters are not identified.

      Treatment-effects estimation Number of obs = 1,953
      Estimator : IPW regression adjustment
      Outcome model : logit
      Treatment model: (multinomial) logit

      <Result>

      Warning: Convergence not achieved.


      Could you explain to me what the problem might be?
      Also, I am wondering how to delete those 28 cases ("1.treatment#7.indus != 0 (n=28)").

      Your advice will be much appreciated!!

      Last edited by Sarah Lee; 30 Jan 2024, 20:25.

      Comment


      • #4
        I'm afraid I can't help you. I don't use -teffects- myself and I do not know how it handles this situation.

        In an ordinary -logit- regression, you don't have to do anything in response to a message like "1.treatment#7.indus != 0 (n=28) predicts failure perfectly": Stata removes those 28 observations automatically and then proceeds, and there is no inherent difficulty in fitting the model once they are gone. But, in some way, -teffects ipwra- is different, perhaps because of the second equation for the propensity score calculation. I'm just speculating about that, though, and even if I'm right, I don't know what the solution to that problem is.

        FYI, if you need to manually remove those observations it would be:
        Code:
        drop if treatment == 1 & indus == 7
        I suppose you can try doing that and re-running the model to see if that helps. But if it doesn't, I suggest you post back, and, hopefully, somebody else will be able to help.

        Comment
