low pseudo-R2

Lorena pe

Join Date: Oct 2023

Posts: 1
#1

04 Oct 2023, 01:36

Hi,

After running a logistic regression, my final model (after checking for interactions and confounding) has a low pseudo-R2. However, I think it is clinically important to know that the model explains even this low percentage of the variability in the data. How important do you consider it to obtain a high pseudo-R2? Thanks.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17675
#2

04 Oct 2023, 04:21

Lorena:
welcome to this forum.
I'd take a look at the average pseudo-R2 reported in similar studies.

Kind regards,
Carlo
(Stata 19.0)
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#3

04 Oct 2023, 05:54

in addition to Carlo's helpful reply, for non-linear models there are many other criteria that may be of more interest; e.g., how well calibrated is your model (i.e., are the predicted probabilities close to the observed probabilities); you might want to see:

Austin, PC and Steyerberg, EW (2013), "Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers", Statistics in Medicine, 33:517-535

Austin, PC and Steyerberg, EW (2019), "The integrated calibration index (ICI) and related metrics for quantifying the calibration of logistic regression models", Statistics in Medicine, 38:4051-4065

for a general overview of various performance measures: Steyerberg, EW, et al. (2010), "Assessing the performance of prediction models", Epidemiology, 21(1):128-138

finally, note that there are a number of different pseudo-R2's in the literature and none are, in my opinion, good analogues to R2 in linear regression and I personally pay little attention to that measure
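
To make the calibration idea above concrete (are the predicted probabilities close to the observed ones?), here is a minimal sketch in Python. It is only a crude binned comparison, not the loess-based approach of Austin and Steyerberg, and the data, the function name `calibration_table`, and the choice of five equal-width bins are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(y, p, n_bins=5):
    """Bin observations by predicted probability (equal-width bins) and
    return (bin index, n, mean predicted probability, observed event rate)."""
    bins = defaultdict(list)
    for yi, pi in zip(y, p):
        idx = min(int(pi * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx].append((yi, pi))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        mean_pred = sum(pi for _, pi in pairs) / n
        obs_rate = sum(yi for yi, _ in pairs) / n
        rows.append((idx, n, mean_pred, obs_rate))
    return rows


# Hypothetical outcomes (0/1) and predicted probabilities, NOT from a real model.
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
p = [0.05, 0.10, 0.15, 0.30, 0.35, 0.55, 0.65, 0.70, 0.85, 0.90]

table = calibration_table(y, p, n_bins=5)
for idx, n, mean_pred, obs_rate in table:
    print(f"bin {idx}: n={n}  mean predicted={mean_pred:.2f}  observed={obs_rate:.2f}")
```

In a well-calibrated model the two columns track each other across bins. With real data you would use the fitted probabilities from your own model and far more observations per bin than in this toy example.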
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#4

04 Oct 2023, 08:23

A relatively comprehensive article examining and comparing pseudo-R2 measures for binary logistic regression is:

DeMaris, A., 2002. Explained variance in logistic regression: A Monte Carlo study of proposed measures. Sociological Methods & Research, 31(1), pp.27-74.

This article found that the McKelvey-Zavoina pseudo-R2 was the best of the various pseudo-R2 measures at reproducing the R2 of an underlying continuous-variable regression model whose response has been collapsed to a binary variable. The pseudo-R2 reported by Stata here is the McFadden measure, which to my recollection was one of the *worst* performers in DeMaris' study. Some other measures are available in the -fitstat- package; see -ssc describe fitstat-

My experience is that these categorical variable pseudo-R2 values tend always to be low relative to what you think they should be, and I would use them to compare models, rather than make an absolute judgment about a single model. In a study I published on using these measures for ordinal logistic regression, I found that if used to make relative comparisons, all of them gave quite similar guidance about models.
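
As a rough illustration of using these measures for relative comparison, the sketch below computes the McFadden and McKelvey-Zavoina pseudo-R2 for two sets of predicted probabilities on the same outcomes. The formulas are the usual textbook ones for a binary logit (1 - lnL_model/lnL_null, and var(xb)/(var(xb) + pi^2/3)). The probabilities are made-up numbers rather than maximum-likelihood fitted values, and the function names are my own.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """1 - lnL(model) / lnL(null); the null model predicts mean(y) for everyone."""
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


def mckelvey_zavoina_r2(p):
    """var(xb) / (var(xb) + pi^2/3), with xb recovered as the logit of p."""
    xb = [math.log(pi / (1.0 - pi)) for pi in p]
    mean = sum(xb) / len(xb)
    var = sum((v - mean) ** 2 for v in xb) / len(xb)
    return var / (var + math.pi ** 2 / 3)


# Hypothetical data: same outcomes, two candidate sets of predictions.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_a = [0.3, 0.4, 0.4, 0.5, 0.6, 0.4, 0.5, 0.6, 0.6, 0.7]  # weaker model
p_b = [0.1, 0.2, 0.2, 0.4, 0.6, 0.4, 0.6, 0.8, 0.8, 0.9]  # sharper model

for name, p in (("A", p_a), ("B", p_b)):
    print(f"model {name}: McFadden={mcfadden_r2(y, p):.3f}  "
          f"McKelvey-Zavoina={mckelvey_zavoina_r2(p):.3f}")
```

Both measures rank model B above model A here, even though neither value looks large in absolute terms, which is the sense in which I would use them.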