
  • What pseudo-R-squared does Stata store in e(r2_p) after -xtlogit , fe- ?

    Dear Statalisters,


    I am running conditional fixed-effects models using -xtlogit, fe- in Stata 17.0 SE and would like to report a pseudo-R-squared. I know there are issues with interpreting a pseudo-R-squared and that it may tell me little about my model, but I would like to report it.
    I know I can get Stata to display the pseudo-R-squared with -di e(r2_p)- .

    What I can’t figure out from the Stata help file for -xtlogit- is which pseudo-R-squared Stata is reporting.

    1) Is it McFadden’s R-squared?
    (This seems to be the default; see https://www.statalist.org/forums/for...averaged-model . Sorry, I’m not sure how to link posts correctly.)

    2) Is it referring to the overall model or just the within variation?


    I have read “Do it yourself R-squared” (https://www.stata.com/support/faqs/s...ics/r-squared/) by Nick Cox (so thank you). It says that “the formula for pseudo–R-squared is documented in [R] maximize”. So I should probably be able to work out the answers to my questions myself, but I haven’t managed to, and I apologize if they are basic.


    Thanks in advance!
    Judith

  • #2
    It is always McFadden's pseudo-\(R^2\) in official Stata commands. It measures the relative improvement in the log-likelihood from adding regressors to the model. Below, \(L_0\) is the log-likelihood of the model without regressors (the intercept-only model in most commands; the conditional logit has no intercept, so here it is simply the model with no covariates) and \(L_1\) is the maximized log-likelihood of the estimated model.

    $$\text{McFadden's Pseudo}\; R^{2}= 1 - \frac{L_{1}}{L_{0}}$$

    Code:
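    webuse union, clear
    * Null model: conditional logit with no regressors (id abbreviates idcode)
    clogit union, group(id)
    local L_0= e(ll)
    * Full conditional fixed-effects logit
    xtlogit union age grade i.not_smsa south##c.year, fe
    local L_1= e(ll)
    * McFadden's pseudo-R2 by hand, then the value Stata stores
    display "Pseudo R2= `:di %9.8f `=1-(`L_1'/`L_0')''"
    display e(r2_p)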
    Results:

    Code:
    . clogit union, group(id)
    note: multiple positive outcomes within groups encountered.
    note: 2,744 groups (14,165 obs) dropped because of all positive or
          all negative outcomes.
    
    Iteration 0:   log likelihood = -4550.1859
    
    Conditional (fixed-effects) logistic regression   Number of obs   =      12035
                                                      LR chi2(0)      =       0.00
                                                      Prob > chi2     =          .
    Log likelihood = -4550.1859                       Pseudo R2       =     0.0000
    
    ------------------------------------------------------------------------------
           union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ------------------------------------------------------------------------------
    
    .
    . local L_0= e(ll)
    
    .
    . xtlogit union age grade i.not_smsa south##c.year, fe
    note: multiple positive outcomes within groups encountered.
    note: 2,744 groups (14,165 obs) dropped because of all positive or
          all negative outcomes.
    
    Iteration 0:   log likelihood = -4516.5881  
    Iteration 1:   log likelihood = -4510.8906  
    Iteration 2:   log likelihood =  -4510.888  
    Iteration 3:   log likelihood =  -4510.888  
    
    Conditional fixed-effects logistic regression   Number of obs     =     12,035
    Group variable: idcode                          Number of groups  =      1,690
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        7.1
                                                                  max =         12
    
                                                    LR chi2(6)        =      78.60
    Log likelihood  =  -4510.888                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .0710973   .0960536     0.74   0.459    -.1171643    .2593589
           grade |   .0816111   .0419074     1.95   0.051    -.0005259     .163748
      1.not_smsa |   .0224809   .1131786     0.20   0.843     -.199345    .2443069
         1.south |  -2.856488   .6765694    -4.22   0.000    -4.182539   -1.530436
            year |  -.0636853   .0967747    -0.66   0.510    -.2533602    .1259896
                 |
    south#c.year |
              1  |   .0264136   .0083216     3.17   0.002     .0101036    .0427235
    ------------------------------------------------------------------------------
    
    .
    . local L_1= e(ll)
    
    .
    . display "Pseudo R2= `:di %9.8f `=1-(`L_1'/`L_0')''"
    Pseudo R2= 0.00863656
    
    .
    . display e(r2_p)
    .00863656
    
    .
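    As a quick check of the arithmetic, plugging the two (rounded) log-likelihoods from the output above into the formula reproduces the displayed value, and twice their difference reproduces the reported LR chi2(6):

    $$1 - \frac{-4510.888}{-4550.1859} \approx 0.008637, \qquad 2\,(L_1 - L_0) = 2\,(-4510.888 + 4550.1859) \approx 78.60$$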
    Last edited by Andrew Musau; 17 Aug 2023, 04:26.



    • #3
      Thank you!

      Just to be sure I am understanding you correctly: you are showing me that if I calculate McFadden’s R-squared myself, I get the same result as with -di e(r2_p)- .

      So, that’s a yes to my first question.

      With regard to my second question, I’m still struggling with it (which doesn’t mean you didn’t answer the question, just that I’m working on understanding your answer):
      From how McFadden’s R-squared is calculated, I would guess that it refers to the overall model, since it is based on maximized log-likelihoods and not on the sum of squared residuals, as it would be in a linear model.
      Is that correct?



      • #4
        Originally posted by Judith Kaiser

        Just to be sure I am understanding you correctly: you are showing me that if I calculate McFadden’s R-squared myself, I get the same result as with -di e(r2_p)- .
        Yes.
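
        A shorter route to the same check, as a sketch only: it assumes that -xtlogit, fe- leaves the log-likelihood of the model without regressors in e(ll_0) and the LR statistic in e(chi2), as the stored-results list in the [XT] xtlogit entry suggests, so the separate -clogit- run is not needed. If e(ll_0) turns out to be missing in your version, the two-step approach in #2 still applies.

        Code:
        webuse union, clear
        xtlogit union age grade i.not_smsa south##c.year, fe
        * McFadden's pseudo-R2 from the stored log-likelihoods (e(ll_0) assumed available)
        display %9.8f 1 - e(ll)/e(ll_0)
        * Equivalent form via the LR statistic, since chi2 = 2*(ll - ll_0)
        display %9.8f e(chi2)/(e(chi2) - 2*e(ll))
        * Value stored by Stata, for comparison
        display %9.8f e(r2_p)

        All three lines should display the same number if the assumption about e(ll_0) holds.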

        With regard to my second question, I’m still struggling with it (which doesn’t mean you didn’t answer the question, just that I’m working on understanding your answer):
        From how McFadden’s R-squared is calculated, I would guess that it refers to the overall model, since it is based on maximized log-likelihoods and not on the sum of squared residuals, as it would be in a linear model.
        Is that correct?
        Yes, it is a pseudo-\(R^2\) and, as stated, it represents the improvement in the log-likelihood that results from adding regressors. As with the \(R^2\) statistic in linear regression, if the additional regressors add nothing, then \(L_0 = L_1\) in the formula in #2 and the pseudo-\(R^2\) is 0. Because log-likelihoods for binary dependent variable models are never positive (each observation contributes the log of a probability), the statistic is positive whenever the log-likelihood of the estimated model is larger than that of the model without regressors. In this sense, it reproduces the property \(0\leq R^2 \leq 1\) for binary choice models.
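
        Spelled out, the bound follows in two steps, assuming a discrete outcome (so every likelihood contribution is a probability and \(L_0 < 0\)) and that the model without regressors is nested in the estimated model:

        $$L_0 \leq L_1 \leq 0 \quad\Longrightarrow\quad 0 \leq \frac{L_1}{L_0} \leq 1 \quad\Longrightarrow\quad 0 \leq 1 - \frac{L_1}{L_0} \leq 1$$

        Dividing by the negative quantity \(L_0\) reverses the inequalities, which is why the ratio lands between 0 and 1.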



        • #5
          Thank you!
