Calibration of logistic regression on large dataset.

ashraf abugroun

Join Date: Nov 2018

Posts: 37
#1

31 Jan 2019, 17:03

Evaluating goodness-of-fit for a logistic regression model using the Hosmer-Lemeshow test is not reliable in large datasets.
Which method would work best instead? What are the alternatives?
Also, while there are many user-written commands for calibration plots, I wonder how to build a calibration plot manually (predicted vs. observed frequencies).
Regards
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

01 Feb 2019, 08:58

It's not that the H-L test is unreliable in large datasets; it's that it has too much power. You don't say how large your data set is, but you might want to look at

Paul, P., Pennell, M. L. and Lemeshow, S. (2013), "Standardizing the power of the H-L goodness of fit test in large data sets", Statistics in Medicine, 32:67-80. Their recommendations appear on p. 75; further comments, for even larger data sets, are on p. 77.
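To illustrate the "too much power" point, here is a minimal sketch in Python (the logic carries over to Stata). It computes the H-L statistic for ten groups in which the true risk exceeds the predicted risk by a fixed 0.01. The function names, the 0.01 shift, and the group layout are hypothetical choices for illustration, and the observed counts are set to their expected values, so sampling noise is ignored. This is not the procedure from Paul et al.

```python
def hosmer_lemeshow_statistic(groups):
    """Sum of (O - E)^2 / (n * p_bar * (1 - p_bar)) over groups.

    groups: iterable of (n_g, observed_events_g, mean_predicted_g).
    """
    total = 0.0
    for n_g, observed, p_bar in groups:
        expected = n_g * p_bar
        total += (observed - expected) ** 2 / (expected * (1.0 - p_bar))
    return total


def miscalibrated_groups(n_per_group, shift=0.01):
    """Hypothetical data: ten groups with mean predicted risk 0.05, 0.15, ..., 0.95.

    The true risk is the predicted risk plus `shift`; observed events are set
    to their expected count under the true risk (no sampling noise).
    """
    rows = []
    for k in range(10):
        p_bar = 0.05 + 0.10 * k
        rows.append((n_per_group, n_per_group * (p_bar + shift), p_bar))
    return rows


# 95th percentile of the chi-square distribution with 10 - 2 = 8 df.
CHI2_CRIT_8DF = 15.507

small = hosmer_lemeshow_statistic(miscalibrated_groups(100))      # n = 1,000
large = hosmer_lemeshow_statistic(miscalibrated_groups(100_000))  # n = 1,000,000

print(f"n = 1,000:     H-L = {small:.3f}, reject = {small > CHI2_CRIT_8DF}")
print(f"n = 1,000,000: H-L = {large:.3f}, reject = {large > CHI2_CRIT_8DF}")
```

For a fixed miscalibration the statistic grows in proportion to the sample size, so the same 0.01 shift that goes unnoticed at n = 1,000 is rejected at n = 1,000,000.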

Re: calibration plots, take a look at #2 in https://www.statalist.org/forums/for...libration-plot
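For the "manual" part of the question, the table behind a calibration plot can be sketched as follows. This is an illustrative Python outline, not the code from the linked post; `calibration_table` and the toy data are hypothetical. It sorts observations by predicted probability, splits them into equal-sized groups, and returns the mean predicted probability and the observed event fraction per group.

```python
from statistics import mean


def calibration_table(y, p, groups=10):
    """Return (mean predicted, observed fraction, group size) per quantile group.

    y: 0/1 outcomes; p: predicted probabilities from the fitted model.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    # Sort observations by predicted probability.
    pairs = sorted(zip(p, y))
    n = len(pairs)
    rows = []
    for g in range(groups):
        # Integer cut points give groups whose sizes differ by at most one.
        lo, hi = g * n // groups, (g + 1) * n // groups
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # more groups requested than observations
        rows.append((
            mean(pi for pi, _ in chunk),  # mean predicted probability
            mean(yi for _, yi in chunk),  # observed event fraction
            len(chunk),
        ))
    return rows


# Toy data (hypothetical): two risk strata that happen to be well calibrated.
p_demo = [0.1] * 10 + [0.9] * 10
y_demo = [0] * 9 + [1] + [1] * 9 + [0]
table = calibration_table(y_demo, p_demo, groups=2)
for pred, obs, size in table:
    print(f"predicted = {pred:.2f}  observed = {obs:.2f}  n = {size}")
```

Plotting the observed fraction against the mean predicted probability for each group, together with the 45-degree line, gives the calibration plot; points near the line indicate good calibration.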
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#3

01 Feb 2019, 10:18

There is a relatively new user-contributed package on SSC called -pmcalplot- which produces calibration plots automatically. It includes the decile groups used for the H-L test as well as a lowess-smoothed curve and a spike plot of events, along with the typical fit statistics one is used to seeing with these graphs. I have only just found this package, prompted by your post, so I have not used it extensively, but the example code for logistic regression in the help file yields sensible results.

Nowadays, the H-L test is less commonly used to evaluate goodness-of-fit, and as a test statistic it has several drawbacks. One notable issue is that the H-L test is for "overall calibration error, not for any particular lack of fit such as quadratic effects." This and other limitations are described elsewhere by Frank Harrell Jr, who has written extensively on the topic of prediction.

A better test may be found here: D. W. Hosmer, T. Hosmer, S. Le Cessie, S. Lemeshow. A comparison of goodness-of-fit tests for the logistic regression model. Stat in Med (1997) 16(9):965-80.
istvan mucsi

Join Date: Oct 2016

Posts: 1
#4

22 Feb 2019, 09:45

I think -pmcalplot- only runs on Stata 14 or higher. Where could one find the syntax guide for the package?