  • Getting residuals from reghdfe - wrong or right?

    Dear Stata Users,
    Please help me to understand whether I am wrong. I am using the following regression to estimate residuals:

    Code:
    use http://www.stata-press.com/data/r15/nlswork.dta
     
    sort idcode year
    
    reghdfe ln_wage hours, absorb(idcode year) vce(cluster idcode) residuals
    (dropped 550 singleton observations)
    (MWFE estimator converged in 7 iterations)
    
    HDFE Linear regression                            Number of obs   =     27,917
    Absorbing 2 HDFE groups                           F(   1,   4159) =       2.35
    Statistics robust to heteroskedasticity           Prob > F        =     0.1257
                                                      R-squared       =     0.6555
                                                      Adj R-squared   =     0.5949
                                                      Within R-sq.    =     0.0004
    Number of clusters (idcode)  =      4,160         Root MSE        =     0.3030
    
                                 (Std. Err. adjusted for 4,160 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           hours |    .000749    .000489     1.53   0.126    -.0002097    .0017076
           _cons |   1.650541   .0178933    92.24   0.000     1.615461    1.685621
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
          idcode |      4160        4160           0    *|
            year |        15           0          15     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    The value of the residual for idcode == 1 and year == 70 is -0.40762286. However, if I plug in the estimates that I got: Y_hat = 0.000749*20 + 1.650541 = 1.665521. Thus, the residual should equal Y - Y_hat = 1.451214 - 1.665521 = -0.214307. This is different from what the program gives. Am I wrong in my calculations, or is there something wrong with the code?

  • #2
    Alberto:
    the point here is to use -xbd- instead of -xb- to predict the fitted values, because -xb- leaves out the absorbed fixed effects:
    Code:
    . reghdfe ln_wage hours, absorb(idcode year) vce(cluster idcode) residuals
    (dropped 550 singleton observations)
    (MWFE estimator converged in 7 iterations)
    
    HDFE Linear regression                            Number of obs   =     27,917
    Absorbing 2 HDFE groups                           F(   1,   4159) =       2.35
    Statistics robust to heteroskedasticity           Prob > F        =     0.1257
                                                      R-squared       =     0.6555
                                                      Adj R-squared   =     0.5949
                                                      Within R-sq.    =     0.0004
    Number of clusters (idcode)  =      4,160         Root MSE        =     0.3030
    
                                 (Std. err. adjusted for 4,160 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           hours |    .000749    .000489     1.53   0.126    -.0002097    .0017076
           _cons |   1.650541   .0178933    92.24   0.000     1.615461    1.685621
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
          idcode |      4160        4160           0    *|
            year |        15           0          15     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    . predict fitted, xb
    
    . predict fitted_abs, xbd
    
    
    . list idcode year ln_wage hours _reghdfe_resid fitted fitted_abs in 1
    
         +---------------------------------------------------------------------+
         | idcode   year    ln_wage   hours   _reghdfe~d    fitted   fitted_~s |
         |---------------------------------------------------------------------|
      1. |      1     70   1.451214      20   -.40762286   1.66552   1.8588368 |
         +---------------------------------------------------------------------+
    
    . di -1.8588368+1.451214
    -.4076228
    
    .
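    A minimal sketch of the same check over the whole estimation sample rather than the first row only. The variable name resid_gap is an illustrative choice, not part of the code above; the sketch assumes that the -reghdfe- call above (with the -residuals- option) is still the active estimation result and that fitted_abs has already been created:
    Code:
    * absolute gap between the hand-made residual (y - xbd) and the one stored by -reghdfe-
    generate double resid_gap = abs((ln_wage - fitted_abs) - _reghdfe_resid) if e(sample)
    
    * the maximum should be tiny (of the order of float rounding error)
    summarize resid_gap
    Singleton observations dropped by -reghdfe- are outside e(sample), so they are left out of the comparison.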
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you for your reply. So the correct way to estimate residuals is to run "predict fitted, xb" and take the residuals as the difference y - fitted. Am I right?


      • #4
        Alberto:
        that is the usual way with OLS.
        The community-contributed module -reghdfe- offers two options for calculating predicted values (from its help file):
        Code:
        xb                    xb fitted values; the default
        xbd                   xb + d_absorbvars
        If you go with the latter, in your code, you'll obtain the right residual value.
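        As a rough sketch, the two pieces can also be inspected separately: to my knowledge the -reghdfe- help file lists a -d- option of -predict- that returns the absorbed part alone (d_absorbvars). The variable names below are illustrative, and the sketch assumes the -reghdfe- regression above was run with the -residuals- option and is still the active estimation result:
        Code:
        * linear prediction without the absorbed fixed effects
        predict xb_only, xb
        * sum of the absorbed fixed effects (idcode and year)
        predict fe_sum, d
        * linear prediction including the absorbed fixed effects
        predict xbd_full, xbd
        
        * xbd should equal xb + d, up to rounding error
        generate double xbd_gap = abs((xb_only + fe_sum) - xbd_full) if e(sample)
        summarize xbd_gap
        The residual then follows as ln_wage - xbd_full, which is what -reghdfe- stores in _reghdfe_resid.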
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Thank you!


          • #6
            Alberto:
            as you can see, the very same issue crops up when we compare the fitted values obtained via the two possible approaches to running a fixed-effects regression on a panel dataset:
            Code:
            . use "https://www.stata-press.com/data/r17/nlswork.dta"
            (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
            
            . regress ln_wage i.idcode i.year age if idcode<=3
            
                  Source |       SS           df       MS      Number of obs   =        39
            -------------+----------------------------------   F(17, 21)       =      2.68
                   Model |  3.54194923        17  .208349955   Prob > F        =    0.0171
                Residual |  1.63378973        21  .077799511   R-squared       =    0.6843
            -------------+----------------------------------   Adj R-squared   =    0.4288
                   Total |  5.17573896        38  .136203657   Root MSE        =    .27893
            
            ------------------------------------------------------------------------------
                 ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                  idcode |
                      2  |  -.3898423     .11632    -3.35   0.003     -.631743   -.1479415
                      3  |  -2.247118   2.111457    -1.06   0.299    -6.638133    2.143897
                         |
                    year |
                     69  |  -.0920902   .5314565    -0.17   0.864    -1.197315    1.013134
                     70  |  -.8648493    .779214    -1.11   0.280    -2.485314    .7556149
                     71  |  -1.248506    1.09967    -1.14   0.269    -3.535396    1.038383
                     72  |   -1.39387   1.443494    -0.97   0.345    -4.395779     1.60804
                     73  |  -1.520276    1.79214    -0.85   0.406    -5.247236    2.206684
                     75  |  -2.049717   2.495803    -0.82   0.421    -7.240024     3.14059
                     77  |  -2.657565   3.203292    -0.83   0.416    -9.319175    4.004045
                     78  |  -2.751196   3.557758    -0.77   0.448    -10.14996    4.647567
                     80  |  -3.324016   4.267534    -0.78   0.445    -12.19884    5.550808
                     82  |  -4.027975   4.983977    -0.81   0.428    -14.39272    6.336774
                     83  |  -4.207353   5.333467    -0.79   0.439     -15.2989    6.884199
                     85  |  -4.730657   6.044586    -0.78   0.443    -17.30106    7.839747
                     87  |  -5.407995   6.755956    -0.80   0.432    -19.45777    8.641785
                     88  |  -5.901929   7.348904    -0.80   0.431    -21.18481    9.380954
                         |
                     age |   .3010572   .3561559     0.85   0.407    -.4396095    1.041724
                   _cons |  -2.882579   5.734884    -0.50   0.620    -14.80892    9.043766
            ------------------------------------------------------------------------------
            
            . predict fitted, xb
            
            . list ln_wage idcode year age fitted if _n==1
            
                   +-------------------------------------------+
                   |  ln_wage   idcode   year   age     fitted |
                   |-------------------------------------------|
                1. | 1.451214        1     70    18   1.671601 |
                   +-------------------------------------------+
            
            . xtset idcode year
            
            Panel variable: idcode (unbalanced)
             Time variable: year, 68 to 88, but with gaps
                     Delta: 1 unit
            
            . xtreg ln_wage i.year age if idcode<=3, fe
            
            Fixed-effects (within) regression               Number of obs     =         39
            Group variable: idcode                          Number of groups  =          3
            
            R-squared:                                      Obs per group:
                 Within  = 0.5596                                         min =         12
                 Between = 0.4744                                         avg =       13.0
                 Overall = 0.0413                                         max =         15
            
                                                            F(15,21)          =       1.78
            corr(u_i, Xb) = -0.9573                         Prob > F          =     0.1102
            
            ------------------------------------------------------------------------------
                 ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    year |
                     69  |  -.0920902   .5314565    -0.17   0.864    -1.197315    1.013134
                     70  |  -.8648493    .779214    -1.11   0.280    -2.485314    .7556149
                     71  |  -1.248506    1.09967    -1.14   0.269    -3.535396    1.038383
                     72  |   -1.39387   1.443494    -0.97   0.345    -4.395779     1.60804
                     73  |  -1.520276    1.79214    -0.85   0.406    -5.247236    2.206684
                     75  |  -2.049717   2.495803    -0.82   0.421    -7.240024     3.14059
                     77  |  -2.657565   3.203292    -0.83   0.416    -9.319175    4.004045
                     78  |  -2.751196   3.557758    -0.77   0.448    -10.14996    4.647567
                     80  |  -3.324016   4.267534    -0.78   0.445    -12.19884    5.550808
                     82  |  -4.027975   4.983977    -0.81   0.428    -14.39272    6.336774
                     83  |  -4.207353   5.333467    -0.79   0.439     -15.2989    6.884199
                     85  |  -4.730657   6.044586    -0.78   0.443    -17.30106    7.839747
                     87  |  -5.407995   6.755956    -0.80   0.432    -19.45777    8.641785
                     88  |  -5.901929   7.348904    -0.80   0.431    -21.18481    9.380954
                         |
                     age |   .3010572   .3561559     0.85   0.407    -.4396095    1.041724
                   _cons |  -3.866807   6.544144    -0.59   0.561     -17.4761    9.742485
            -------------+----------------------------------------------------------------
                 sigma_u |  1.2007631
                 sigma_e |  .27892564
                     rho |   .9488037   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(2, 21) = 6.09                       Prob > F = 0.0082
            
            . predict fitted_fe, xb
            
            . predict fitted_fe_ue, xbu
            
            . g u= fitted_fe_ue- fitted_fe
            
            . list ln_wage idcode year age fitted_fe fitted_fe_ue u if _n==1
            
                   +-----------------------------------------------------------------+
                   |  ln_wage   idcode   year   age   fitte~fe   fitte~ue          u |
                   |-----------------------------------------------------------------|
                1. | 1.451214        1     70    18   .6873738   1.671601   .9842277 |
                   +-----------------------------------------------------------------+
            
            .
            To obtain after -xtreg, fe- the same fitted values retrieved from -regress-, we have to add the panel-level term of the composite error (i.e., -u-) to the -xb- fitted value.
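            A sketch of the corresponding check on the residuals, assuming the session above is still open and -xtreg, fe- is the active estimation result. The names e_fe, e_gap and fit_gap are illustrative. After -xtreg, fe-, -predict, e- returns the idiosyncratic error component, which should match ln_wage minus the -xbu- fitted values:
            Code:
            * idiosyncratic residual e_it from -xtreg, fe-
            predict e_fe, e
            
            * gap between e_it and y - (xb + u_i)
            generate double e_gap = abs(e_fe - (ln_wage - fitted_fe_ue)) if e(sample)
            
            * gap between the -regress- fitted values and the -xbu- fitted values
            generate double fit_gap = abs(fitted - fitted_fe_ue) if e(sample)
            
            * both maxima should be no larger than rounding error
            summarize e_gap fit_gap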
            Last edited by Carlo Lazzaro; 16 Oct 2022, 03:56.
            Kind regards,
            Carlo
            (Stata 19.0)
