  • Why does my logistic regression model fail the Homer and Lemeshow's test?

    Hello everyone, I apologize in advance if my questions seem incredibly dull. I'm trying to teach myself Stata before I get to grad school, as someone whose brain is very much not made for handling statistics.

    So I'm trying to understand how to perform a logistic regression using Stata, using this data set from Pew: https://www.pewresearch.org/social-t...panel-wave-68/.

    I thought it might be interesting to see whether mask wearing can be explained by partisanship; knowledge of health risk (using living in a metropolitan area and age for risk level, and educational level and how closely respondents follow the news for knowledge); and income (because I'm thinking this would affect access to resources/masks?).

    I recoded my variables of interest into dichotomous variables. When I run my models, the pseudo-R2 value seems to imply that the explanatory power increases with each model, but when I check Homer and Lemeshow's fit, my model is not a good fit at all. Any tips? Am I setting up this model correctly? Here is my code: https://docs.google.com/document/d/1...iZwftf5VE/edit

  • #2
    Please see the detailed remarks on cross-posting at #6 of https://www.statalist.org/forums/for...hey-be-dropped. This was also posted on Reddit in at least two places, and perhaps elsewhere.

    Here is your code copied from your last link. Many people could have helpful comments on it, but some will stop at the first line as clearly they have no access to your C: drive and they will draw the line at logging in to the Pew site and downloading the file themselves. That is what it is, and doesn't rule out other comments.

    As a very broad-brush comment, I don't think you're doing anything obviously wrong. It's one of the shocks of first working with this kind of data that results are often disappointing. In fact, that should seem less surprising as you reflect on what predictors you would like but don't have. Human attitudes and behaviour aren't strongly determined by a few factors, and in fact society would be even more disturbing than it often is if that were so. Sorry if that doesn't help.

    Even more pedantically, it's Hosmer not Homer. I met Hosmer once and Lemeshow once, but not together.


    Code:
    import spss using "C:\Users\NAME\Desktop\ATP W68.sav"
    
    * age: 18-64 vs 65+
    recode F_AGECAT (1/3 = 0) (4 = 1), gen(age)
    label variable age "Age"
    label define abracket 0 "18-64" 1 "65+"
    label values age abracket
    replace age = . if age == 99
    
    * income: under vs over 30,000
    recode F_INCOME_RECODE (3 = 0) (1/2 = 1), gen(income)
    label variable income "Income Level"
    label define irange 0 "<30,000" 1 "30,000+"
    label values income irange
    replace income = . if income == 99
    
    * how closely respondents follow COVID news
    recode COVIDFOL_W68 (3/4 = 0) (1/2 = 1), gen(news)
    label variable news "News Following"
    label define closely 0 "Not too closely/Not at all" 1 "Very/Fairly closely"
    label values news closely
    replace news = . if news == 99
    
    * educational attainment: HS or less vs college+
    recode F_EDUCCAT (3 = 0) (1/2 = 1), gen(edu)
    label variable edu "College Degree?/Educational Attainment"
    label define elevel 0 "HS or less" 1 "College+"
    label values edu elevel
    replace edu = . if edu == 99
    
    * mask-wearing behaviour (outcome)
    recode COVIDMASK1_W68 (3/4 = 0) (1/2 = 1), gen(mask)
    label variable mask "Mask Wearing Behaviour"
    label define often 0 "Hardly/Never" 1 "All/Some of the Time"
    label values mask often
    drop if mask == 5
    drop if mask == 99
    
    * party identification
    recode F_PARTYSUM_FINAL (1 = 0) (2 = 1), gen(party)
    label variable party "Party"
    label define demrep 0 "Republican" 1 "Democrat"
    label values party demrep
    replace party = . if party == 9
    
    * metropolitan vs non-metropolitan
    recode F_METRO (1 = 1) (2 = 0), gen(metro)
    label variable metro "Environment"
    label define area 0 "Non-Metropolitan" 1 "Metropolitan"
    label values metro area
    
    * one-way tabulations, then two-way tabulations with Cramér's V
    tab1 metro party mask edu news income age
    tabulate mask party, column V
    tabulate mask metro, column V
    tabulate mask edu, column V
    tabulate mask news, column V
    tabulate mask income, column V
    tabulate mask age, column V
    
    * logit model with risk factors and knowledge
    logit mask metro age edu news
    logit mask metro age edu news, or
    
    * logit model with risk factors and knowledge, and income
    logit mask metro age edu news income
    logit mask metro age edu news income, or
    
    * logit model with risk factors and knowledge, and income and party
    logit mask metro age news income party
    logit mask metro age news income party, or
    
    * classification table, ROC curve, and Hosmer-Lemeshow fit for the last model
    estat classification
    lroc
    lfit, group(10) table

    • #3
      To Nick's remarks, with which, as usual, I agree completely, I will add this. If your sample size is very large, and if you are judging your goodness of fit by the p-value of the Hosmer-Lemeshow calculation, you may just be seeing meaningless, small departures from good fit. When the sample size gets much above 15,000, the Hosmer-Lemeshow chi-squared test can find a lack of fit simply because the true data-generating process is more like a probit than a logistic one, or because of some other unimportant departure from the exact logistic model. Of course, that translates to more or less nothing in real-world terms, but the difference can be "statistically significant." I will spare you my rant about why statistical significance simply shouldn't be used in general, but you do have to be particularly cautious interpreting p-values in large samples with this test.
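
      As a purely hypothetical illustration of this point (not part of the original thread, and not the poster's data): the sketch below simulates a probit data-generating process with 50,000 observations, fits a logit to it, and runs -estat gof-. The mis-specification is practically trivial, yet at this sample size the Hosmer-Lemeshow test may well report a small p-value.

      Code:
      * hypothetical sketch: probit data-generating process, logit fit
      clear
      set seed 2021
      set obs 50000
      generate x = rnormal()
      generate y = rbinomial(1, normal(-0.5 + 0.8*x))   // true model is probit, not logit
      logit y x
      estat gof, group(10) table                        // Hosmer-Lemeshow test by decile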

      I am one of those who were deterred from answering your question by not wanting to download a data set, so this comment may not be at all applicable to your situation, but I thought it worth bringing up as a general point about Hosmer-Lemeshow goodness of fit. It is a good idea when using -estat gof- after a logistic (or probit) regression to specify the -table- option, as you did, and to actually look carefully at the differences between observed and expected numbers in each decile, as you may or may not have. In fact, it is a good idea to graph them as a calibration curve of observed vs expected successes by decile (a sketch is below) to see a) whether the fit actually looks good or not, and b) if not, whether the problem is general, localized to one end of the predicted risk spectrum, or localized to the middle. These observations are sometimes helpful for deciding how to improve the model.
      Last edited by Clyde Schechter; 13 May 2021, 11:31.
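
      A minimal sketch of such a calibration plot, assuming the final model from the code in #2 (the variable names and the 10-group split are illustrative, not prescriptive):

      Code:
      * sketch only: decile-based calibration plot after the last model in #2
      logit mask metro age news income party
      predict phat, pr                           // predicted probability that mask == 1
      xtile risk_decile = phat, nq(10)           // deciles of predicted risk
      preserve
      collapse (sum) observed=mask expected=phat, by(risk_decile)
      * observed vs expected successes per decile; the 45-degree line is perfect calibration
      twoway (scatter observed expected) (line expected expected, sort), ///
          ytitle("Observed successes") xtitle("Expected successes") legend(off)
      restore

      If the points hug the 45-degree line, a "significant" Hosmer-Lemeshow result is unlikely to matter in practical terms.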
