What pseudo-R-squared does Stata store in e(r2_p) after -xtlogit , fe- ?

Judith Kaiser

Join Date: Aug 2023

Posts: 3
#1

17 Aug 2023, 03:54

Dear Statalisters,

I am running conditional fixed-effects models using -xtlogit , fe- and would like to report a pseudo-R-squared (I am using Stata 17.0 SE). I know there are issues with interpreting pseudo-R-squared; it may not tell me much about my model at all, but I would like to report it.
I know I can get Stata to display the pseudo-R-squared using -di e(r2_p)- .

What I can’t figure out from the Stata help file for -xtlogit- is which pseudo-R-squared Stata is reporting.

1) Is it McFadden’s R-squared?
(I ask because this seems to be the default (https://www.statalist.org/forums/for...averaged-model). Sorry, I’m not sure how to link posts correctly.)

2) Does it refer to the overall model or just to the within variation?

I have read “Do it yourself R-squared” (https://www.stata.com/support/faqs/s...ics/r-squared/) by Nick Cox (so thank you). It says that “the formula for pseudo–R-squared is documented in [R] maximize”. So I should probably be able to work out the answers to my questions from there, but I haven’t managed to. Apologies for asking questions I should be able to answer on my own.

Thanks in advance!
Judith

Andrew Musau

Join Date: Oct 2014
Posts: 10194

17 Aug 2023, 04:23

It is always McFadden's pseudo-$R^2$ in official Stata commands. It measures the relative improvement in the log-likelihood that results from adding regressors to the model. Below, $L_0$ is the log-likelihood of the null model (intercept only; for the conditional fixed-effects logit, which has no intercept, the model with no regressors) and $L_1$ is the maximized log-likelihood of the estimated model.

$$\text{McFadden's Pseudo}\; R^{2}= 1 - \frac{L_{1}}{L_{0}}$$

Code:

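* Load the example panel data set
webuse union, clear
* Null model: conditional logit with no regressors, same groups as below
clogit union, group(id)
local L_0= e(ll)
* Full model: conditional fixed-effects logit
xtlogit union age grade i.not_smsa south##c.year, fe
local L_1= e(ll)
* McFadden's pseudo-R2 computed by hand, then the value Stata stores
display "Pseudo R2= `:di %9.8f `=1-(`L_1'/`L_0')''"
display e(r2_p)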

Results:

Code:

. clogit union, group(id)
note: multiple positive outcomes within groups encountered.
note: 2,744 groups (14,165 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log likelihood = -4550.1859

Conditional (fixed-effects) logistic regression   Number of obs   =      12035
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -4550.1859                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
       union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
------------------------------------------------------------------------------

.
. local L_0= e(ll)

.
. xtlogit union age grade i.not_smsa south##c.year, fe
note: multiple positive outcomes within groups encountered.
note: 2,744 groups (14,165 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log likelihood = -4516.5881  
Iteration 1:   log likelihood = -4510.8906  
Iteration 2:   log likelihood =  -4510.888  
Iteration 3:   log likelihood =  -4510.888  

Conditional fixed-effects logistic regression   Number of obs     =     12,035
Group variable: idcode                          Number of groups  =      1,690

                                                Obs per group:
                                                              min =          2
                                                              avg =        7.1
                                                              max =         12

                                                LR chi2(6)        =      78.60
Log likelihood  =  -4510.888                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0710973   .0960536     0.74   0.459    -.1171643    .2593589
       grade |   .0816111   .0419074     1.95   0.051    -.0005259     .163748
  1.not_smsa |   .0224809   .1131786     0.20   0.843     -.199345    .2443069
     1.south |  -2.856488   .6765694    -4.22   0.000    -4.182539   -1.530436
        year |  -.0636853   .0967747    -0.66   0.510    -.2533602    .1259896
             |
south#c.year |
          1  |   .0264136   .0083216     3.17   0.002     .0101036    .0427235
------------------------------------------------------------------------------

.
. local L_1= e(ll)

.
. display "Pseudo R2= `:di %9.8f `=1-(`L_1'/`L_0')''"
Pseudo R2= 0.00863656

.
. display e(r2_p)
.00863656

.
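As a quick arithmetic check, the same number can be reproduced from the rounded log-likelihoods printed in the output above, so the last digits may differ slightly from e(r2_p). The second line is only a cross-check against the reported LR chi2(6) statistic, using the standard likelihood-ratio formula $2(L_1 - L_0)$:

$$1 - \frac{L_{1}}{L_{0}} = 1 - \frac{-4510.888}{-4550.1859} \approx 0.00864$$

$$2\,(L_{1} - L_{0}) = 2\,(-4510.888 + 4550.1859) \approx 78.60$$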

Last edited by Andrew Musau; 17 Aug 2023, 04:26.

Judith Kaiser

Join Date: Aug 2023

Posts: 3
#3

17 Aug 2023, 07:09

Thank you!

Just to be sure I understand you correctly: you are showing me that if I calculated McFadden’s R-squared myself, I would get the same result as with -di e(r2_p)- .

So, that’s a yes to my first question.

With regard to my second question, I’m still struggling with it (which doesn’t mean you didn’t answer it, just that I’m still working on understanding your answer):
From how McFadden’s R-squared is calculated, I would guess that it refers to the overall model, since it is based on maximized log-likelihoods and not on the residual sum of squares, as it would be in a linear model.
Am I getting that correct?
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#4

17 Aug 2023, 08:43

Originally posted by Judith Kaiser

“Just to be sure I understand you correctly: you are showing me that if I calculated McFadden’s R-squared myself, I would get the same result as with -di e(r2_p)- .”

Yes.

“With regard to my second question, I’m still struggling with it (which doesn’t mean you didn’t answer it, just that I’m still working on understanding your answer):
From how McFadden’s R-squared is calculated, I would guess that it refers to the overall model, since it is based on maximized log-likelihoods and not on the residual sum of squares, as it would be in a linear model.
Am I getting that correct?”

Yes, it is a pseudo-$R^2$ and, as stated, it represents the improvement in the log-likelihood that results from adding regressors. As with the $R^2$ statistic in linear regression, if the additional regressors add nothing, then $L_0 = L_1$ in the formula in #2 and the pseudo-$R^2$ = 0. Because log-likelihoods for binary dependent variable models are never positive, the statistic is positive whenever the log-likelihood of the estimated model is larger than that of the model without regressors, and it cannot exceed 1. In this sense, it replicates the property $0\leq R^2 \leq 1$ for binary choice.
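One way to spell out the bounds, as a sketch under two assumptions: the estimated model nests the model without regressors (so $L_1 \geq L_0$), and the outcome is discrete, so each log-likelihood is at most zero and $L_0 < 0$. Dividing by the negative number $L_0$ reverses the inequalities:

$$L_{0} \leq L_{1} \leq 0 \;\Longrightarrow\; 0 \leq \frac{L_{1}}{L_{0}} \leq 1 \;\Longrightarrow\; 0 \leq 1 - \frac{L_{1}}{L_{0}} \leq 1$$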
Judith Kaiser

Join Date: Aug 2023

Posts: 3
#5

17 Aug 2023, 09:28

Thank you!