Question on fixed effects regression

Samyam Shrestha

Join Date: Apr 2018

Posts: 63
#1

31 Mar 2022, 12:03

Hi,

I have seen some people run a fixed effects regression in the following way that includes county and year fixed effects:

Code:

xi i.county i.year
reg yvar xvar _I*, robust

But I've also seen others do it the following way:

Code:

xtset county year
xtreg yvar xvar, fe robust

Apart from the fact that the former creates all the dummy variables and shows their coefficients in the results output, I think they should give the same coefficient for xvar. However, there's a slight difference between the two.

Are the two approaches to fixed effects necessarily the same? Why are the coefficients slightly off?

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

31 Mar 2022, 13:26

No, they are not at all the same. Also, the first is an obsolete way of coding it, and may also be wrong, and the second is just wrong. Here's how it should be coded in modern Stata:

Code:

xtset country year // NOTE: MENTION OF year HERE IS OPTIONAL
xtreg yvar xvar i.year, fe vce(cluster country)

Explanations:

1. The -xi- prefix is mostly obsolete. In modern Stata we use factor-variable notation instead. Read -help fvvarlist- for details. Yes, there are some commands that do not support factor-variable notation, but mostly they are seldom used, old commands whose functions have been superseded by newer ones that do. So, while you probably should keep -xi- around in some dusty corner of your mind because you might, rarely, need to use it, for the most part you should forget it ever existed.

2. -xtset- requires the mention of a panel variable; the mention of a time variable is optional.* It is important to understand that although the use of -xtset- causes Stata to automatically incorporate or condition on the panel variable when you run an -xt- regression command, it does not cause Stata to incorporate the time variable into those models. If you want time fixed effects in the model, you have to explicitly write them into the regression command. So your second version, the one with -xtreg-, is wrong because it does not include any time fixed effects (which you said at the top you wanted).

3. With panel data you should be using cluster robust standard errors, not plain robust standard errors, unless the number of clusters is too small.

*The time variable must be given if you want to use time-series operators like lag, lead, seasonal difference, etc., or if you want to do a model that incorporates autoregressive structure. Otherwise, it is not needed.
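
A minimal sketch of point 2 and the footnote, reusing the nlswork dataset that is loaded later in this thread. The choice of dataset and of ln_wage, age, and tenure as variables is an assumption made for illustration and is not part of the original answer. It fits the same -xtreg, fe- model with and without i.year, so the difference made by the explicit year effects can be seen directly, and then uses a lag operator, which needs the time variable to be declared.

```stata
* Illustrative sketch only: dataset and variable choices are assumptions
use "https://www.stata-press.com/data/r17/nlswork.dta", clear

* Declare the panel variable only; no time variable is needed so far
xtset idcode

* Individual fixed effects only: year is NOT in this model
xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
estimates store no_year

* Individual fixed effects plus explicitly written year fixed effects
xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
estimates store with_year

* Joint test of the year indicators in the second model
testparm i.year

* Side-by-side comparison of the age coefficient in the two models
estimates table no_year with_year, keep(age) b se

* Time-series operators such as L. require the time variable to be declared
xtset idcode year
xtreg ln_wage L.tenure i.year, fe vce(cluster idcode)
```

Running the last -xtreg- after declaring only the panel variable would stop with an error about the time variable not being set, which is the situation the footnote describes.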

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17708

01 Apr 2022, 00:58

Samyam:
as an aside to Clyde's excellent advice, you can go -fe- with -regress-, too (even though this approach implies losing pieces of information that -xtreg,fe- provides).
In the following toy example the sample estimates of the coefficients shared by the two regression codes are identical, whereas all the other statistics differ.
Please also note that I've -xtset- the dataset with -timevar- too because I'm used to doing so. Obviously, Clyde is 100% right in stating that, provided that you do not plan to use time-series operators (such as lags and leads), you can safely -xtset- your dataset with -panelid- only:

Code:

use "https://www.stata-press.com/data/r17/nlswork.dta"

. regress ln_wage c.age##c.age i.year i.idcode if idcode<=3, vce(cluster idcode)

Linear regression                               Number of obs     =         39
                                                F(2, 2)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.8139
                                                Root MSE          =     .21943

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0773019   .0106911     7.23   0.019     .0313017    .1233021
             |
 c.age#c.age |  -.0045583    .002264    -2.01   0.182    -.0142995    .0051828
             |
        year |
         69  |   .3367906   .0914392     3.68   0.066    -.0566405    .7302218
         70  |   .2089384   .2867011     0.73   0.542    -1.024637    1.442514
         71  |   .3144116   .1619035     1.94   0.192     -.382203    1.011026
         72  |   .5888124   .4958888     1.19   0.357    -1.544825     2.72245
         73  |   .8912873   .5219448     1.71   0.230     -1.35446    3.137034
         75  |   1.246958   .6073839     2.05   0.176    -1.366404     3.86032
         77  |   1.560689   .8626802     1.81   0.212    -2.151125    5.272502
         78  |   1.941522   1.278416     1.52   0.268    -3.559059    7.442103
         80  |    2.34498   1.525965     1.54   0.264    -4.220718    8.910678
         82  |   2.698954   1.663018     1.62   0.246    -4.456435    9.854344
         83  |   2.994437    1.81452     1.65   0.241    -4.812813    10.80169
         85  |   3.538578   2.210833     1.60   0.251    -5.973868    13.05102
         87  |   3.965153   2.460506     1.61   0.248    -6.621548    14.55185
         88  |    4.40786   2.688929     1.64   0.243    -7.161667    15.97739
             |
      idcode |
          2  |  -.4183815   .0165036   -25.35   0.002    -.4893909   -.3473721
          3  |   .6579353   .7215294     0.91   0.458    -2.446555    3.762426
             |
       _cons |   1.341224   .1489003     9.01   0.012     .7005575     1.98189
------------------------------------------------------------------------------

. xtset idcode year

Panel variable: idcode (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit

. xtreg ln_wage c.age##c.age i.year if idcode<=3, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =         39
Group variable: idcode                          Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.7404                                         min =         12
     Between = 0.4068                                         avg =       13.0
     Overall = 0.4014                                         max =         15

                                                F(4,2)            =          .
corr(u_i, Xb) = -0.8560                         Prob > F          =          .

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0773019   .0101936     7.58   0.017     .0334424    .1211613
             |
 c.age#c.age |  -.0045583   .0021586    -2.11   0.169    -.0138461    .0047294
             |
        year |
         69  |   .3367906   .0871839     3.86   0.061    -.0383313    .7119126
         70  |   .2089384   .2733588     0.76   0.525    -.9672295    1.385106
         71  |   .3144116   .1543689     2.04   0.179    -.3497843    .9786076
         72  |   .5888124   .4728115     1.25   0.339    -1.445531    2.623156
         73  |   .8912873   .4976548     1.79   0.215    -1.249948    3.032523
         75  |   1.246958   .5791178     2.15   0.164    -1.244785    3.738701
         77  |   1.560689   .8225333     1.90   0.198    -1.978387    5.099764
         78  |   1.941522   1.218922     1.59   0.252    -3.303077    7.186121
         80  |    2.34498   1.454951     1.61   0.248    -3.915167    8.605128
         82  |   2.698954   1.585626     1.70   0.231    -4.123442     9.52135
         83  |   2.994437   1.730077     1.73   0.226    -4.449484    10.43836
         85  |   3.538578   2.107946     1.68   0.235    -5.531183    12.60834
         87  |   3.965153      2.346     1.69   0.233     -6.12887    14.05918
         88  |    4.40786   2.563793     1.72   0.228    -6.623251    15.43897
             |
       _cons |   1.465543   .3990418     3.67   0.067    -.2513952    3.182481
-------------+----------------------------------------------------------------
     sigma_u |  .54258328
     sigma_e |  .21942548
         rho |  .85944136   (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
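
The agreement of the point estimates in the two outputs above can also be checked programmatically. The sketch below is an illustrative addition, and the scalar names (b_lsdv, se_lsdv, b_fe, se_fe) are hypothetical. In the printed output the ratio of the two standard errors for age is about 1.0488, which equals sqrt(22/20). That appears consistent with a different finite-sample degrees-of-freedom correction (-regress- counting the idcode dummies among the estimated parameters, -xtreg, fe- not counting them), but treat that reading as an assumption to check against the manuals.

```stata
* Illustrative sketch: compare the age coefficient from the two approaches
use "https://www.stata-press.com/data/r17/nlswork.dta", clear

* Dummy-variable regression with explicit idcode indicators
regress ln_wage c.age##c.age i.year i.idcode if idcode<=3, vce(cluster idcode)
scalar b_lsdv  = _b[age]
scalar se_lsdv = _se[age]

* Within (fixed-effects) estimator on the same subsample
xtset idcode year
xtreg ln_wage c.age##c.age i.year if idcode<=3, fe vce(cluster idcode)
scalar b_fe  = _b[age]
scalar se_fe = _se[age]

* The point estimates should agree up to numerical precision
assert reldif(b_lsdv, b_fe) < 1e-6

* The standard errors differ; show their ratio next to sqrt(22/20)
display "SE ratio (regress / xtreg): " se_lsdv / se_fe
display "sqrt(22/20):                " sqrt(22/20)
```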

Kind regards,
Carlo
(Stata 19.0)


Samyam Shrestha

Join Date: Apr 2018

Posts: 63
#4

01 Apr 2022, 21:29

Thanks for the very detailed responses!

What if my data is not panel? My individual-level data is a repeated cross-section and I want to add county and year fixed effects. How does this change the xtset?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#5

02 Apr 2022, 00:07

Samyam:
if you have repeated cross-sectional data, you should go -regress-:

Code:

reg y x i.county i.year

If necessary, standard errors can be clustered on some relevant variable (say, country).
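
As a rough sketch of this setup, the block below simulates a small repeated cross-section and fits county and year fixed effects with -regress-. Everything about the data is hypothetical (the variable names y, x, county, year, the sample size, and the coefficients), and clustering on county is only one possible choice of clustering variable. It also fits the same model with -areg-, which absorbs the county indicators instead of displaying them, and checks that the coefficient on x matches.

```stata
* Illustrative sketch with simulated (hypothetical) repeated cross-section data
clear
set seed 12345
set obs 2000

* Each observation is a different individual, observed once
generate county = runiformint(1, 20)
generate year   = runiformint(2010, 2014)
generate x      = rnormal() + county/10
generate y      = 1 + 0.5*x + county/5 + 0.1*(year - 2010) + rnormal()

* County and year fixed effects via indicator variables
regress y x i.county i.year, vce(cluster county)
scalar b_reg = _b[x]

* Same model with the county indicators absorbed rather than reported
areg y x i.year, absorb(county) vce(cluster county)
scalar b_areg = _b[x]

* The coefficient on x should agree across the two commands
assert reldif(b_reg, b_areg) < 1e-6
```

No -xtset- is needed for either command, since neither is an -xt- command.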

Kind regards,
Carlo
(Stata 19.0)