  • Standard errors with two way fixed-effects when demeaning

    Hi Statalisters,

    I have a panel of 375 regions over 120 months, and am carrying out some fixed effects regressions with the regions as panel units. Rather than including 119 dummy variables to control for "month effects" I opted to demean my variables along the cross-sectional dimension and use "xtreg, fe". However, the standard errors returned when doing this differ from the case in which month dummies are included and "xtreg, fe" is used. The r-squared in each case is also different, presumably because I haven't corrected for the fact I estimated 120 cross section means. Is there a way to correct for this?

    By way of illustration, here is an example that draws on one from this post:

    webuse grunfeld, clear
    xi i.time, pre(D)
    xtreg inv mval kstock D*, fe
    Fixed-effects (within) regression               Number of obs     =        200
    Group variable: company                         Number of groups  =         10
    R-sq:                                           Obs per group:
         within  = 0.7985                                         min =         20
         between = 0.8143                                         avg =       20.0
         overall = 0.8068                                         max =         20
                                                    F(21,169)         =      31.90
    corr(u_i, Xb)  = -0.3250                        Prob > F          =     0.0000
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          mvalue |   .1177158   .0137513     8.56   0.000     .0905694    .1448623
          kstock |   .3579163    .022719    15.75   0.000     .3130667    .4027659
         Dtime_2 |  -19.19741   23.67586    -0.81   0.419    -65.93593    27.54112
         Dtime_3 |  -40.69001   24.69541    -1.65   0.101    -89.44122    8.061213
         Dtime_4 |   -39.2264   23.23594    -1.69   0.093    -85.09647    6.643667
         Dtime_5 |  -69.47029   23.65607    -2.94   0.004    -116.1698   -22.77083
         Dtime_6 |  -44.23507   23.80979    -1.86   0.065      -91.238     2.76785
         Dtime_7 |  -18.80446     23.694    -0.79   0.429     -65.5788    27.96987
         Dtime_8 |  -21.13979   23.38163    -0.90   0.367    -67.29748    25.01789
         Dtime_9 |  -42.97762   23.55287    -1.82   0.070    -89.47334    3.518104
        Dtime_10 |  -43.09876    23.6102    -1.83   0.070    -89.70766    3.510134
        Dtime_11 |  -55.68303   23.89561    -2.33   0.021    -102.8554   -8.510689
        Dtime_12 |  -31.16928   24.11598    -1.29   0.198    -78.77665    16.43809
        Dtime_13 |  -39.39223   23.78368    -1.66   0.100    -86.34361    7.559141
        Dtime_14 |  -43.71651   23.96965    -1.82   0.070    -91.03501    3.601991
        Dtime_15 |   -73.4951   24.18292    -3.04   0.003    -121.2346   -25.75559
        Dtime_16 |  -75.89611   24.34553    -3.12   0.002    -123.9566    -27.8356
        Dtime_17 |   -62.4809   24.86425    -2.51   0.013    -111.5654   -13.39637
        Dtime_18 |  -64.63233    25.3495    -2.55   0.012    -114.6748   -14.58987
        Dtime_19 |  -67.71796   26.61108    -2.54   0.012    -120.2509   -15.18501
        Dtime_20 |  -93.52622   27.10786    -3.45   0.001    -147.0399   -40.01257
           _cons |  -32.83631   18.87533    -1.74   0.084     -70.0981    4.425483
         sigma_u |  91.798268
         sigma_e |  51.724523
             rho |  .75902159   (fraction of variance due to u_i)
    F test that all u_i=0: F(9, 169) = 52.36                     Prob > F = 0.0000

    *Demean in time dimension
    foreach var of varlist invest mvalue kstock {
        egen double mean_`var'_time = mean(`var'), by(time)
        gen double demean_`var' = mean_`var'_time - `var'
    }
    xtreg demean_invest demean_mvalue demean_kstock, fe
    Fixed-effects (within) regression               Number of obs     =        200
    Group variable: company                         Number of groups  =         10
    R-sq:                                           Obs per group:
         within  = 0.7201                                         min =         20
         between = 0.8143                                         avg =       20.0
         overall = 0.7941                                         max =         20
                                                    F(2,188)          =     241.89
    corr(u_i, Xb)  = -0.3359                        Prob > F          =     0.0000
    demean_invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    demean_mvalue |   .1177158   .0130379     9.03   0.000     .0919965    .1434352
    demean_kstock |   .3579163   .0215404    16.62   0.000     .3154243    .4004082
            _cons |  -1.09e-15   3.467735    -0.00   1.000    -6.840672    6.840672
          sigma_u |  91.798268
          sigma_e |  49.041181
              rho |  .77796838   (fraction of variance due to u_i)
    F test that all u_i=0: F(9, 188) = 58.25                     Prob > F = 0.0000
    Both the standard errors and the R-squared values differ across these two regressions. In my own analogous regressions the differences are much more pronounced, I imagine because of the larger number of means estimated.

    Thanks in advance for any time and help.

    Hi Mitchel,
    The bottom line is that when you demean by hand, you need to adjust the degrees of freedom of your final regression.
    Even though you do not explicitly estimate the time fixed effects in the model, the fact that you demean all the variables implies that you have already used up some degrees of freedom.
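    As a rough worked check against the Grunfeld output above (N = 10 companies, T = 20 time periods, K = 2 regressors), and assuming a balanced panel with the default (non-robust) standard errors, the adjustment amounts to rescaling by the ratio of the two residual degrees of freedom:

```latex
\begin{align*}
\mathrm{df}_{\text{demeaned}} &= NT - N - K = 200 - 10 - 2 = 188 \\
\mathrm{df}_{\text{dummies}}  &= NT - N - K - (T - 1) = 188 - 19 = 169 \\
\mathrm{SE}_{\text{corrected}} &= \mathrm{SE}_{\text{demeaned}} \sqrt{\frac{188}{169}}
  \approx 1.0547 \, \mathrm{SE}_{\text{demeaned}}
\end{align*}
```

    For mvalue this gives 0.0130379 x 1.0547, or about 0.01375, which agrees with the 0.0137513 reported when the time dummies are included. The same factor takes sigma_e from 49.04 to about 51.72. These two df values are the denominators in F(2,188) and F(21,169) in the output above.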
    Your best option is to use -reghdfe- by Sergio Correia, which does the demeaning and corrects the standard errors for you.
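    A minimal sketch of what that could look like on the Grunfeld data used above. It assumes -reghdfe- is installed from SSC along with its dependency -ftools-; the variable names are the ones in the example dataset:

```stata
* One-time installation of the user-written commands from SSC
ssc install ftools
ssc install reghdfe

* Absorb both the company and the time fixed effects
webuse grunfeld, clear
reghdfe invest mvalue kstock, absorb(company time)
```

    With the default standard errors, the results for mvalue and kstock should line up with the -xtreg, fe- run that includes the time dummies, since both count the absorbed time effects in the degrees of freedom.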


      Note also that you really don't gain anything (except the mis-specified standard errors) from going to all the trouble to demean everything.
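      For instance, a sketch of the direct approach with factor-variable notation (available from Stata 11), which avoids both the hand-made dummies and the manual demeaning:

```stata
* Time effects enter as indicators; company effects are handled by -xtreg, fe-
webuse grunfeld, clear
xtreg invest mvalue kstock i.time, fe

* Optional joint test of the time effects
testparm i.time
```

      In the original poster's setting this would mean i.month in place of i.time, where month is an assumed name for the monthly time variable.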


        Great, thank you both very much for the clarification. In the end I went with Sergio's -reghdfe- command.

