  • Why does the negative adjusted R-squared become positive after clustering by firm in xtreg, fe?

    Hello everyone!

    I'd like to ask about the intuition behind an -xtreg, fe- regression. When I did not cluster by firm id, I obtained a negative adjusted R2 in the OLS regression. However, when I added clustering by firm id, the same regression gave me a positive adjusted R2. Do you know why?

    Many thanks for your time in advance!

    Best regards,
    Jae

  • #2
    Does anyone have a clue, please?

    • #3
      Strange. Did the coefficients change (i.e., are the observations identical)?

      • #4
        Jae:
        without sharing what you typed and what Stata gave you back (as per the FAQ), it is really unlikely that you will receive a helpful reply.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          I'm a little confused: -xtreg- does not report an "adjusted" R2, so what exactly are you talking about? Maybe show the output by copying and pasting it within a CODE block.

          • #6
            Jae:
            I can replicate your issue. Although -xtreg,fe- does not display an adjusted R-squared, it does store one as e(r2_a), which you can see via -ereturn list-:
            Code:
            . use "https://www.stata-press.com/data/r17/nlswork.dta", clear
            . xtreg ln_wage age, fe
            
            Fixed-effects (within) regression               Number of obs     =     28,510
            Group variable: idcode                          Number of groups  =      4,710
            
            R-squared:                                      Obs per group:
                 Within  = 0.1026                                         min =          1
                 Between = 0.0877                                         avg =        6.1
                 Overall = 0.0774                                         max =         15
            
                                                            F(1,23799)        =    2720.20
            corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
                   _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
            -------------+----------------------------------------------------------------
                 sigma_u |  .40635023
                 sigma_e |  .30349389
                     rho |  .64192015   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
            
            . di e(r2_a)
            -.07503239
            
            . xtreg ln_wage age, fe vce(cluster idcode)
            
            Fixed-effects (within) regression               Number of obs     =     28,510
            Group variable: idcode                          Number of groups  =      4,710
            
            R-squared:                                      Obs per group:
                 Within  = 0.1026                                         min =          1
                 Between = 0.0877                                         avg =        6.1
                 Overall = 0.0774                                         max =         15
            
                                                            F(1,4709)         =     884.05
            corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000
            
                                         (Std. err. adjusted for 4,710 clusters in idcode)
            ------------------------------------------------------------------------------
                         |               Robust
                 ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
                   _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
            -------------+----------------------------------------------------------------
                 sigma_u |  .40635023
                 sigma_e |  .30349389
                     rho |  .64192015   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            . di e(r2_a)
            .10254329
            
            .
            An old Statalist thread (https://www.stata.com/statalist/arch.../msg00201.html) explains how to calculate the adjusted R-squared manually after -xtreg,fe-.
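            As a rough check on the negative figure, assume the standard adjustment applied to the within R-squared, using the residual degrees of freedom from the unclustered F test above, 23,799 = N - N_g - k (N = 28,510 observations, N_g = 4,710 groups, k = 1 regressor). Because the absorbed group effects use up so many degrees of freedom, the penalty factor is large enough to push the result below zero:

            $$
            \bar{R}^2_w = 1 - \left(1 - R^2_w\right)\frac{N - 1}{N - N_g - k}
                        = 1 - (1 - 0.1026)\,\frac{28{,}509}{23{,}799}
                        \approx 1 - 0.8974 \times 1.1979
                        \approx -0.0750
            $$

            This agrees with the displayed e(r2_a) of -.07503239 to four decimal places.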
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              If you do it manually, you get the same result in both cases (the R2 is the same), so Stata must be calculating it another way.

              If you switch to -reghdfe-, you get the same r2_a, but it is much larger.

              • #8
                See #9 at https://www.statalist.org/forums/for...ted-as-missing. With clustering, observations are treated as dependent within clusters but independent across clusters, so the degrees of freedom change. However, as I and others argue in the linked thread, the adjusted within-R2 as calculated is not very useful; there, I propose an alternative way to calculate it.
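                With the numbers from #6, the clustered e(r2_a) of .10254329 is consistent with the textbook adjustment when the 4,710 absorbed group effects are not subtracted, so that the denominator becomes N - k - 1 = 28,508 instead of 23,799 (N = 28,510, k = 1). This is an inference from the reported values, not something taken from Stata's documentation:

                $$
                \bar{R}^2_w = 1 - \left(1 - R^2_w\right)\frac{N - 1}{N - k - 1}
                            = 1 - \left(1 - R^2_w\right)\frac{28{,}509}{28{,}508}
                            \approx R^2_w - 0.00003
                $$

                With a penalty factor that close to one, the "adjusted" value is essentially the within R-squared of 0.1026, which is why it comes out positive.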
