
  • Calibration of logistic regression on large dataset.

    Evaluating goodness-of-fit for a logistic regression model using the Hosmer-Lemeshow test is not reliable in large datasets.
    Which method would work best instead? What are the alternatives?
    Also, while there are many user-written programs for calibration plots, I wonder how to construct a calibration plot manually (predicted vs. observed frequencies).
    Regards

  • #2
    It's not that the H-L test is unreliable in large datasets; it's that it has too much power, so even trivial departures from perfect fit come out as statistically significant. You don't say how large your data set is, but you might want to look at:
    Paul, P., Pennell, M. L. and Lemeshow, S. (2013), "Standardizing the power of the H-L goodness of fit test in large data sets", Statistics in Medicine, 32:67-80. Their recommendations appear on p. 75; further comments, for even larger data sets, are on p. 77.
    Re: calibration plots, take a look at #2 in https://www.statalist.org/forums/for...libration-plot
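For anyone who wants to see what a hand-built calibration plot involves, here is a minimal sketch. It is written in Python rather than Stata so that it is self-contained, and it is not taken from the linked post. The function name `calibration_table` and the toy data are hypothetical. The idea is to sort observations by predicted probability, cut them into equal-sized groups (deciles by default), and compare the mean predicted probability with the observed event frequency in each group. In Stata the same steps should map onto `predict`, `xtile` and `collapse`.

```python
from statistics import mean


def calibration_table(y, p, groups=10):
    """Return one (n, mean predicted, observed frequency) row per group.

    y: 0/1 outcomes; p: predicted probabilities from the fitted model.
    Groups are equal-sized slices of the data sorted by p.
    """
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer observations than groups
        mean_pred = mean(pi for pi, _ in chunk)
        obs_freq = mean(yi for _, yi in chunk)
        table.append((len(chunk), mean_pred, obs_freq))
    return table


# Hypothetical toy data: 8 observations split into 2 groups.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
for n_g, pred, obs in calibration_table(y, p, groups=2):
    print(f"n = {n_g}  mean predicted = {pred:.3f}  observed = {obs:.3f}")
```

Plotting the observed frequency against the mean predicted probability for each row, together with the 45-degree line, gives the calibration plot; points near the line indicate good calibration.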



    • #3
      There is a relatively new user-contributed package on SSC called -pmcalplot- which produces calibration plots automatically. It includes the decile groups used for the H-L test, a lowess-smoothed curve, and a spike plot of events, as well as the typical fit statistics one is used to seeing with these graphs. I have only just found this package, prompted by your post, so I have not used it extensively, but the example code for logistic regression in the help file yields sensible results.

      Nowadays the H-L test is less commonly used to evaluate goodness of fit, and as a test statistic it has several drawbacks. One notable issue is that the H-L test is for "overall calibration error, not for any particular lack of fit such as quadratic effects." This and other limitations are described elsewhere by Frank Harrell Jr, who has written extensively on the topic of prediction.
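To make the "overall calibration error" point concrete, here is a rough sketch of the H-L statistic, which sums (observed - expected)^2 / (n * pbar * (1 - pbar)) over groups of sorted predictions and is referred to a chi-square distribution with groups - 2 degrees of freedom. It is written in Python for self-containment. The function names and the toy data are hypothetical: the data have ten risk levels with the observed event rate a constant 2 percentage points above the predicted rate, and the scale factor `m` is an illustrative device for growing the sample.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square statistic, degrees of freedom) for the H-L test."""
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        if n_g == 0:
            continue
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        p_bar = expected / n_g
        chi2 += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    return chi2, groups - 2


def toy_data(m):
    """Hypothetical data: 10 risk levels with 100*m observations each, and an
    observed event rate 2 percentage points above the predicted rate."""
    y, p = [], []
    for k in range(10):
        p_k = 0.05 + 0.1 * k
        n_k = 100 * m
        events = m * (round(100 * p_k) + 2)
        y += [1] * events + [0] * (n_k - events)
        p += [p_k] * n_k
    return y, p


# Same miscalibration, two sample sizes: the statistic grows with n.
for m in (1, 100):
    y, p = toy_data(m)
    chi2, df = hosmer_lemeshow(y, p)
    print(f"n = {len(y):>6}: H-L chi2 = {chi2:.2f} on {df} df")
```

With the miscalibration held fixed, the statistic grows in proportion to the sample size, so a discrepancy that goes undetected at n = 1,000 is overwhelmingly "significant" at n = 100,000. The single number also does not say where or how the model misfits.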

      A better test may be found here: Hosmer, D. W., Hosmer, T., Le Cessie, S. and Lemeshow, S. (1997), "A comparison of goodness-of-fit tests for the logistic regression model", Statistics in Medicine, 16(9):965-80.



      • #4
        I think -pmcalplot- only runs on Stata 14 or higher. Where could one find the syntax guide for the package?
