  • low pseudo-R2

    Hi,

    after running a logistic regression, my final model (after checking for interactions and confounding) has a low pseudo-R2. However, I think it is clinically important to know that the model explains only this low percentage of the variability in the data. How important do you consider it to obtain a high pseudo-R2? Thanks

  • #2
    Lorena:
    welcome to this forum.
    I'd take a look at the average pseudo-R2 reported in similar research.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      in addition to Carlo's helpful reply, for non-linear models there are many other criteria that may be of more interest; e.g., how well calibrated is your model (i.e., are the predicted probabilities close to the observed probabilities); you might want to see:

      Austin, PC and Steyerberg, EW (2013), "Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers", Statistics in Medicine, 33:517-535

      Austin, PC and Steyerberg, EW (2019), "The integrated calibration index (ICI) and related metrics for quantifying the calibration of logistic regression models", Statistics in Medicine, 38:4051-4065

      for a general overview of various performance measures: Steyerberg, EW, et al. (2010), "Assessing the performance of prediction models", Epidemiology, 21(1):128-138

      finally, note that there are a number of different pseudo-R2's in the literature and none are, in my opinion, good analogues to R2 in linear regression and I personally pay little attention to that measure
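The calibration idea mentioned above (are the predicted probabilities close to the observed ones?) can be illustrated with a very simple grouped comparison. This is only a rough sketch in Python, not the loess-based method of Austin and Steyerberg: it sorts observations by predicted probability, splits them into equal-sized groups, and compares the mean prediction with the observed event rate in each group. The function name `calibration_table` and the data are made up for illustration.

```python
def calibration_table(p_hat, y, n_groups=5):
    """Compare mean predicted probability with the observed event rate
    within groups formed by sorting on the predicted probability."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # can happen when n_groups exceeds the sample size
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((len(chunk), mean_pred, obs_rate))
    return rows


# Invented predictions and 0/1 outcomes, not taken from any real model.
p_hat = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.7, 0.8, 0.9, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

rows = calibration_table(p_hat, y, n_groups=5)
for size, mean_pred, obs_rate in rows:
    print(f"n={size}  mean predicted={mean_pred:.2f}  observed rate={obs_rate:.2f}")
```

In a well-calibrated model the two columns should be close in every group; large gaps in particular groups suggest the model over- or under-predicts in that range of risk. With real data you would want many more observations per group than in this toy example.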


      • #4
        A relatively comprehensive article examining and comparing pseudo-R2 measures for binary logistic regression is:

        DeMaris, A., 2002. Explained variance in logistic regression: A Monte Carlo study of proposed measures. Sociological Methods & Research, 31(1), pp.27-74.

        This article found that the McKelvey-Zavoina pseudo-R2 was the best of the various pseudo-R2 measures for reproducing the R2 of an underlying continuous-variable regression model with the response collapsed to a binary variable. The pseudo-R2 reported by Stata here is the McFadden measure, which to my recollection was one of the *worst* performers in DeMaris' study. Some other measures are available in the -fitstat- package; see -ssc describe fitstat-
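To make the difference between the two measures concrete, here is a minimal Python sketch of their usual definitions for a binary logit: McFadden's measure is 1 - lnL(model)/lnL(null), and the McKelvey-Zavoina measure is Var(xb) / (Var(xb) + pi^2/3), where xb is the fitted linear predictor and pi^2/3 is the variance of the standard logistic error. The function names and the data are invented for illustration, the linear predictor is not from an actual fit, and the variance here divides by n, which may differ slightly from what a given package does.

```python
import math


def log_likelihood(p_hat, y):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p) if o == 1 else math.log(1 - p)
               for p, o in zip(p_hat, y))


def mcfadden_r2(p_hat, y):
    """1 - lnL(model) / lnL(null), where the null model predicts the mean of y."""
    ybar = sum(y) / len(y)
    ll_model = log_likelihood(p_hat, y)
    ll_null = log_likelihood([ybar] * len(y), y)
    return 1 - ll_model / ll_null


def mckelvey_zavoina_r2(xb):
    """Var(xb) / (Var(xb) + pi^2/3) for a logit model.
    Assumption: variance computed with divisor n."""
    n = len(xb)
    mean = sum(xb) / n
    var = sum((v - mean) ** 2 for v in xb) / n
    return var / (var + math.pi ** 2 / 3)


# Invented linear predictor and outcomes, for illustration only.
xb = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0, 0, 1, 0, 1]
p_hat = [1 / (1 + math.exp(-v)) for v in xb]

mcf = mcfadden_r2(p_hat, y)
mz = mckelvey_zavoina_r2(xb)
print(f"McFadden: {mcf:.3f}   McKelvey-Zavoina: {mz:.3f}")
```

The two numbers are on different scales and generally will not agree for the same model, which is one reason a single pseudo-R2 value is hard to judge in absolute terms.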

        My experience is that these pseudo-R2 values for categorical outcomes always tend to be low relative to what you think they should be, and I would use them to compare models rather than to make an absolute judgment about a single model. In a study I published on using these measures for ordinal logistic regression, I found that, if used to make relative comparisons, all of them gave quite similar guidance about models.
