  • R-squared in Probit Modelling

    Dear Statalisters,


    I have a couple of questions about the R-squared in the probit model. First of all, is it the McFadden Pseudo R2 that is directly reported? I know I can find the Adjusted McFadden R-squared by running 'fitstat' after the logit command, but these two are different.

    According to http://www.ats.ucla.edu/stat/mult_pk..._RSquareds.htm, the formula for McFadden's Pseudo R2 is 1-Lur/Lr. Is it, hence, the same as a "normal" McFadden R-squared?


    I read in some forums that a rule of thumb for a good McFadden's fit (pseudo or adjusted?) is usually set at 0.2 to 0.4. Does anyone know where I could find this in the literature?

    Further, even if it is based on the log-likelihoods, is it fair to say that McFadden R-squared explains the variation of the data?
    Last edited by Ellinor Hjelvik; 03 Dec 2016, 13:17.

  • #2
    Further, even if it is based on the log-likelihoods, is it fair to say that McFadden R-squared explains the variation of the data?
    No. Having a statistic, R2, that captures both the proportion of variance explained and the goodness of model fit is a distinctive property of ordinary linear regression. The various pseudo-R2 statistics that have been developed for other models may do one or the other, but, to my knowledge, none does both.

    I read in some forums that a rule of thumb for a good McFadden's fit (pseudo or adjusted?) is usually set at 0.2 to 0.4. Does anyone know where I could find this in the literature?
    I doubt you will find serious literature supporting this or any other rule of thumb about a "good" McFadden's R2 value. (If anyone else knows of some, do post it!) I say this because there isn't even agreement on a "good" value of ordinary R2 in regular linear regression. In the social sciences, one is often thrilled to get a value of 0.3, whereas in the physical sciences values of 0.8 would normally be considered laughably low. So it really all depends on what is a reasonable expectation in your domain, given the quality of measures available and the strength of prediction that a theory can reasonably be expected to give.

    Actually, I would discourage you from relying on any single summary statistic to assess the adequacy of your model. The adequacy of your model depends, first and foremost, on what the purpose of your model is. If all you need from your model is a way to discriminate successes (outcome = 1) from failures (outcome = 0), then that is well assessed by the area under the ROC curve (-help roctab-), and you don't need anything else. But a model can do this kind of discrimination well and still produce predicted probabilities that are wildly inaccurate. So if a good quantitative estimation is needed, then you should explore that directly. I like the Hosmer-Lemeshow approach (-estat gof, group(10) table-) after a logit or probit model, but I generally ignore the p-value. Instead, I focus on the actual counts of the predicted and observed outcomes in the table. Are they close enough for practical purposes (whatever practical means in the context of your particular domain and problem)? Is the model predicting accurately at the low end, but poorly at the high end? Or well in the middle, but poorly at the extremes? Or vice versa? The nice thing is that pondering those questions can sometimes suggest ways of improving the model so that it fits well throughout the range of predicted values. (Adding interaction terms, or adding new predictor variables, or quadratic terms, etc.)
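
    As a rough sketch of the two checks described above, the commands might be run as follows. The dataset and model are borrowed from the example in #5 of this thread purely for illustration, and the variable name phat is a hypothetical choice, not something from the original post.

    ```stata
    * Illustrative only: same data and model as the example in #5
    webuse nhanes2f, clear
    probit diabetes weight height i.female i.race, nolog

    * Discrimination: area under the ROC curve, using predicted probabilities
    * (phat is a hypothetical variable name)
    predict phat, pr
    roctab diabetes phat

    * Calibration: Hosmer-Lemeshow table of observed vs. expected counts by decile
    estat gof, group(10) table
    ```

    The point of the last command is the table itself: compare the observed and expected counts group by group rather than reading only the p-value.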



    • #3
      I don't understand your confusion over whether or not Stata is reporting McFadden R2. What makes you think it isn't? If you show your code and output we can help better.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Here's something of use regarding "explained variation" in binary response models:

        DeMaris, A., 2002. Explained variance in logistic regression: A Monte Carlo study of proposed measures. Sociological Methods & Research, 31(1), pp.27-74.



        • #5
          With regard to pseudo R2, this shows that probit and fitstat give the same value for McFadden's R2:

          Code:
          . webuse nhanes2f
          
          . probit diabetes weight height i.female i.race, nolog
          
          Probit regression                               Number of obs     =     10,335
                                                          LR chi2(5)        =     152.49
                                                          Prob > chi2       =     0.0000
          Log likelihood = -1922.8202                     Pseudo R2         =     0.0381
          
          ------------------------------------------------------------------------------
              diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                weight |   .0141682   .0014373     9.86   0.000     .0113512    .0169853
                height |  -.0279346   .0033596    -8.31   0.000    -.0345192   -.0213499
              1.female |  -.1555635   .0613226    -2.54   0.011    -.2757537   -.0353734
                       |
                  race |
                Black  |   .2385539   .0618449     3.86   0.000       .11734    .3597677
                Other  |  -.0267251   .1592437    -0.17   0.867    -.3388371    .2853869
                       |
                 _cons |   2.007309   .5586607     3.59   0.000     .9123541    3.102264
          ------------------------------------------------------------------------------
          
          . fitstat
          
                                   |      probit 
          -------------------------+-------------
          Log-likelihood           |             
                             Model |   -1922.820 
                    Intercept-only |   -1999.067 
          -------------------------+-------------
          Chi-square               |             
               Deviance (df=10329) |    3845.640 
                         LR (df=5) |     152.493 
                           p-value |       0.000 
          -------------------------+-------------
          R2                       |             
                          McFadden |       0.038 
               McFadden (adjusted) |       0.035 
                McKelvey & Zavoina |       0.057 
                      Cox-Snell/ML |       0.015 
            Cragg-Uhler/Nagelkerke |       0.046 
                             Efron |       0.016 
                          Tjur's D |       0.017 
                             Count |       0.952 
                  Count (adjusted) |       0.000 
          -------------------------+-------------
          IC                       |             
                               AIC |    3857.640 
                  AIC divided by N |       0.373 
                        BIC (df=6) |    3901.100 
          -------------------------+-------------
          Variance of              |             
                                 e |       1.000 
                            y-star |       1.061
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam
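
          The R2 figures in the output above can be checked by hand from the two log-likelihoods that fitstat lists. This is a worked sketch, not fitstat's documented formula: it assumes the adjusted measure subtracts K = 6 estimated parameters (five slopes plus the constant) from the model log-likelihood, which is the value that reproduces the reported 0.035.

          ```latex
          \[
          R^2_{\text{McF}} = 1 - \frac{\ln L_M}{\ln L_0}
                           = 1 - \frac{-1922.820}{-1999.067} \approx 0.0381
          \]
          \[
          R^2_{\text{McF,adj}} = 1 - \frac{\ln L_M - K}{\ln L_0}
                               = 1 - \frac{-1922.820 - 6}{-1999.067} \approx 0.0351
          \]
          \[
          \text{LR } \chi^2 = 2\,(\ln L_M - \ln L_0)
                            = 2\,(-1922.820 + 1999.067) \approx 152.49
          \]
          ```

          Here ln L_M is the log-likelihood of the fitted model and ln L_0 that of the intercept-only model. The first line matches the Pseudo R2 of 0.0381 in the probit header, so the statistic Stata reports directly is the unadjusted McFadden measure.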
