R-squared in Probit Modelling

Ellinor Hjelvik

Join Date: Sep 2014

Posts: 10
#1

03 Dec 2016, 13:11

Dear Statalisters,

I have a couple of questions about the R-squared in the probit model. First of all, is it the McFadden Pseudo R2 that is directly reported? I know I can find the Adjusted McFadden R-squared by running 'fitstat' after the logit command, but these two are different.

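According to http://www.ats.ucla.edu/stat/mult_pk..._RSquareds.htm, the formula for McFadden's pseudo R2 is 1 - ln(Lur)/ln(Lr), where Lur is the likelihood of the unrestricted (full) model and Lr is the likelihood of the restricted (intercept-only) model. Is this, then, the same as a "normal" McFadden R-squared?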

I read in some forums that a rule of thumb for a good McFadden’s fit (pseudo or adjusted?) is usually set at 0.2 to 0.4. Does anyone know where I could find this in the literature?

Further, even if it is based on the log-likelihoods, is it fair to say that McFadden R-squared explains the variation of the data?

Last edited by Ellinor Hjelvik; 03 Dec 2016, 13:17.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

03 Dec 2016, 14:30

Further, even if it is based on the log-likelihoods, is it fair to say that McFadden R-squared explains the variation of the data?

No. Having a single statistic, R², that captures both the proportion of variance explained and the goodness of model fit is a distinctive property of ordinary linear regression. The various pseudo-R² statistics that have been developed for other models may do one or the other, but, to my knowledge, none does both.

I read in some forums that a rule of thumb for a good McFadden’s fit (pseudo or adjusted?) is usually set at 0.2 to 0.4. Does anyone know where I could find this in the literature?

I doubt you will find serious literature supporting this or any other rule of thumb about a "good" McFadden's R² value. (If anyone else knows of some, do post it!) I say this because there isn't even agreement on a "good" value of ordinary R² in regular linear regression. In the social sciences, one is often thrilled to get a value of 0.3, whereas in the physical sciences values of 0.8 would normally be considered laughably low. So it really all depends on what is a reasonable expectation in your domain, given the quality of the measures available and the strength of prediction that a theory can reasonably be expected to give.

Actually, I would discourage you from relying on any single summary statistic to assess the adequacy of your model. The adequacy of your model depends, first and foremost, on what the purpose of your model is. If all you need from your model is a way to discriminate successes (outcome = 1) from failures (outcome = 0), then that is well assessed by the area under the ROC curve (-help roctab-), and you don't need anything else. But a model can do this kind of discrimination well and still produce predicted probabilities that are wildly inaccurate.

So if good quantitative estimation is needed, then you should explore that directly. I like the Hosmer-Lemeshow approach (-estat gof, group(10) table-) after a logit or probit model, but I generally ignore the p-value. Instead, I focus on the actual counts of the predicted and observed outcomes in the table. Are they close enough for practical purposes (whatever practical means in the context of your particular domain and problem)? Is the model predicting accurately at the low end, but poorly at the high end? Or well in the middle, but poorly at the extremes? Or vice versa?

The nice thing is that pondering those questions can sometimes suggest ways of improving the model so that it fits well throughout the range of predicted values (for example, by adding interaction terms, new predictor variables, or quadratic terms).
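As a rough sketch of the two checks just described (discrimination and calibration), something like the following could be run. The dataset and predictors are assumptions chosen only for illustration (the nhanes2f example data that ships with Stata), not the original poster's model, and -lroc- is used here as the postestimation route to the area under the ROC curve.

```stata
* Illustrative only: the dataset and variable list are assumptions
webuse nhanes2f, clear
probit diabetes weight height i.female i.race, nolog

* Discrimination: area under the ROC curve for the fitted model
lroc, nograph

* Calibration: Hosmer-Lemeshow table with 10 groups. Compare the observed
* and expected counts row by row instead of relying on the p-value.
estat gof, group(10) table
```

A high area under the curve alongside large observed-versus-expected gaps in some rows of the table would be the "discriminates well but predicts probabilities poorly" situation mentioned above.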
Richard Williams

Join Date: Apr 2014

Posts: 4944
#3

03 Dec 2016, 15:13

I don't understand your confusion over whether or not Stata is reporting McFadden R2. What makes you think it isn't? If you show your code and output we can help better.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#4

04 Dec 2016, 07:51

Here's something of use regarding "explained variation" in binary response models:

DeMaris, A., 2002. Explained variance in logistic regression: A Monte Carlo study of proposed measures. Sociological Methods & Research, 31(1), pp. 27-74.
Richard Williams

Join Date: Apr 2014
Posts: 4944

04 Dec 2016, 08:29

With regard to the pseudo R2, the following shows that probit and fitstat give the same value for McFadden's R2:

Code:

. webuse nhanes2f

. probit diabetes weight height i.female i.race, nolog

Probit regression                               Number of obs     =     10,335
                                                LR chi2(5)        =     152.49
                                                Prob > chi2       =     0.0000
Log likelihood = -1922.8202                     Pseudo R2         =     0.0381

------------------------------------------------------------------------------
    diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   .0141682   .0014373     9.86   0.000     .0113512    .0169853
      height |  -.0279346   .0033596    -8.31   0.000    -.0345192   -.0213499
    1.female |  -.1555635   .0613226    -2.54   0.011    -.2757537   -.0353734
             |
        race |
      Black  |   .2385539   .0618449     3.86   0.000       .11734    .3597677
      Other  |  -.0267251   .1592437    -0.17   0.867    -.3388371    .2853869
             |
       _cons |   2.007309   .5586607     3.59   0.000     .9123541    3.102264
------------------------------------------------------------------------------

. fitstat

                         |      probit 
-------------------------+-------------
Log-likelihood           |             
                   Model |   -1922.820 
          Intercept-only |   -1999.067 
-------------------------+-------------
Chi-square               |             
     Deviance (df=10329) |    3845.640 
               LR (df=5) |     152.493 
                 p-value |       0.000 
-------------------------+-------------
R2                       |             
                McFadden |       0.038 
     McFadden (adjusted) |       0.035 
      McKelvey & Zavoina |       0.057 
            Cox-Snell/ML |       0.015 
  Cragg-Uhler/Nagelkerke |       0.046 
                   Efron |       0.016 
                Tjur's D |       0.017 
                   Count |       0.952 
        Count (adjusted) |       0.000 
-------------------------+-------------
IC                       |             
                     AIC |    3857.640 
        AIC divided by N |       0.373 
              BIC (df=6) |    3901.100 
-------------------------+-------------
Variance of              |             
                       e |       1.000 
                  y-star |       1.061
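The match can also be checked by hand from the two log likelihoods in the fitstat table: 1 - (-1922.820)/(-1999.067) ≈ 0.0381. Penalizing the model log likelihood by the 6 estimated parameters gives 1 - (-1922.820 - 6)/(-1999.067) ≈ 0.0351, which agrees with the adjusted figure shown. A minimal sketch of the same calculation from the stored estimation results follows. It assumes the probit results are still in e(), and it uses e(rank) as the parameter count, which is an assumption that holds when no coefficients are dropped.

```stata
* Run after the probit command above, while its results are still in e()
* McFadden's R2 from the model and intercept-only log likelihoods
display "McFadden R2 (by hand)     = " %6.4f (1 - e(ll)/e(ll_0))
display "Pseudo R2 stored by Stata = " %6.4f e(r2_p)

* Adjusted McFadden: subtract the number of estimated parameters
* (5 slopes + 1 intercept here) from the model log likelihood.
* Using e(rank) as that count is an assumption, not fitstat's own code.
display "McFadden R2 (adjusted)    = " %6.4f (1 - (e(ll) - e(rank))/e(ll_0))
```

Note that fitstat is a user-written command (part of Long and Freese's SPost package), so it has to be installed before the output above can be reproduced.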

