  • Question on fixed effects regression

    Hi,

    I have seen some people run a fixed effects regression in the following way that includes county and year fixed effects:

    Code:
    xi i.county i.year
    reg yvar xvar _I*, robust
    But I've also seen others do it the following way:
    Code:
    xtset county year
    xtreg yvar xvar, fe robust
    Apart from the fact that the former creates all the dummy variables and shows their coefficients in the results output, I think they should give the same coefficient for xvar. However, there's a slight difference between the two.

    Are the two approaches to fixed effects necessarily the same? Why are the coefficients slightly off?

    Thanks!

  • #2
    No, they are not at all the same. Also, the first is an obsolete way of coding it and may also be wrong, while the second is just wrong. Here's how it should be coded in modern Stata:

    Code:
    xtset county year // NOTE: MENTION OF year HERE IS OPTIONAL
    xtreg yvar xvar i.year, fe vce(cluster county)
    Explanations:

    1. The -xi- prefix is mostly obsolete. In modern Stata we use factor-variable notation instead. Read -help fvvarlist- for details. Yes, there are some commands that do not support factor-variable notation, but mostly they are seldom used, old commands whose functions have been superseded by newer ones that do. So, while you probably should keep -xi- around in some dusty corner of your mind because you might, rarely, need to use it, for the most part you should forget it ever existed.

    2. -xtset- requires the mention of a panel variable; the mention of a time variable is optional.* It is important to understand that although the use of -xtset- causes Stata to automatically incorporate or condition on the panel variable when you run an -xt- regression command, it does not cause Stata to incorporate the time variable into those models. If you want time fixed effects in the model, you have to explicitly write them into the regression command. So your second version, the one with -xtreg-, is wrong because it does not include any time fixed effects (which you said at the top you wanted). That omission is why its coefficient on xvar differs from the one in your first version.

    3. With panel data you should be using cluster robust standard errors, not plain robust standard errors, unless the number of clusters is too small.

    *The time variable must be given if you want to use time-series operators like lag, lead, seasonal difference, etc., or if you want to fit a model that incorporates an autoregressive structure. Otherwise, it is not needed.
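A minimal sketch of points 1 to 3 and the footnote, on simulated data. `county`, `year`, `yvar` and `xvar` are hypothetical stand-ins for the variables in #1, and the data-generating numbers are arbitrary: `xvar` trends over time and `yvar` contains a common year shock, so leaving out the year effects should move the coefficient, and the error is AR(1) within county, so observations in the same county are not independent. Under these assumptions, the -xi- and factor-variable fits should give the same coefficient on `xvar`, -xtreg, fe- with i.year should match the -regress- dummy version while -xtreg, fe- without i.year should not, and the lag operator should fail until the time variable is declared. The -xtreg, fe vce(robust)- line reflects a reading of -help xtreg-, where that option is computed as clustering on the panel variable; it is worth checking against your own Stata version.

```stata
* Minimal sketch on simulated data; county, year, yvar and xvar are
* hypothetical stand-ins for the variables in #1.
clear
set seed 2024
set obs 40                                   // 40 counties (clusters)
generate int county = _n
generate double a_county = rnormal()         // county effect
expand 12                                    // 12 years per county
bysort county: generate int year = 2000 + _n
* AR(1) error within county: observations in a county are not independent
generate double e = rnormal()
bysort county (year): replace e = 0.8*e[_n-1] + rnormal() if _n > 1
* xvar trends over time and yvar has a common year shock, so omitting
* the year effects biases the coefficient on xvar
generate double xvar = rnormal() + a_county + (year - 2000)/3
generate double yvar = 1 + 0.5*xvar + a_county + (year - 2000)/4 + e

* Point 1: -xi- and factor-variable notation fit the same model
xi i.county i.year
regress yvar xvar _I*, vce(cluster county)
scalar b_xi = _b[xvar]
drop _I*
regress yvar xvar i.county i.year, vce(cluster county)
scalar b_fv = _b[xvar]
scalar se_clu = _se[xvar]
* Point 3: plain robust SEs ignore the within-county correlation
regress yvar xvar i.county i.year, vce(robust)
scalar se_het = _se[xvar]

* Point 2: -xtset- brings in the county effects, year effects must be explicit
xtset county
xtreg yvar xvar, fe vce(cluster county)      // county FE only
scalar b_noyear = _b[xvar]
xtreg yvar xvar i.year, fe vce(cluster county)
scalar b_fe2 = _b[xvar]
scalar se_xtclu = _se[xvar]
* Assumption from -help xtreg-: with -fe-, vce(robust) clusters on the panel variable
xtreg yvar xvar i.year, fe vce(robust)
scalar se_xtrob = _se[xvar]

* Footnote: lag/lead operators need the time variable in -xtset-
capture noisily generate double xlag = L.xvar
scalar rc_notime = _rc
xtset county year
generate double xlag = L.xvar

display "xi: " b_xi "  i.: " b_fv "  xtreg no year: " b_noyear "  xtreg two-way: " b_fe2
display "regress SE, robust: " se_het "  clustered: " se_clu
display "xtreg SE, robust: " se_xtrob "  clustered: " se_xtclu
```

The two -display- lines at the end put the coefficients and standard errors side by side, which makes the differences between the specifications easy to see.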

    • #3
      Samyam:
      as an aside to Clyde's excellent advice, you can go -fe- with -regress-, too (although this approach loses some of the information that -xtreg, fe- provides).
      In the following toy example, the coefficients shared by the two regression commands are identical, whereas all the other statistics differ.
      Please also note that I've -xtset- the dataset with -timevar- too because I'm used to doing so. Obviously, Clyde is 100% right in stating that, provided that you do not plan to use time-series operators (such as lags and leads), you can safely -xtset- your dataset with -panelid- only:
      Code:
      use "https://www.stata-press.com/data/r17/nlswork.dta"
      
      . regress ln_wage c.age##c.age i.year i.idcode if idcode<=3, vce(cluster idcode)
      
      Linear regression                               Number of obs     =         39
                                                      F(2, 2)           =          .
                                                      Prob > F          =          .
                                                      R-squared         =     0.8139
                                                      Root MSE          =     .21943
      
                                       (Std. err. adjusted for 3 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               age |   .0773019   .0106911     7.23   0.019     .0313017    .1233021
                   |
       c.age#c.age |  -.0045583    .002264    -2.01   0.182    -.0142995    .0051828
                   |
              year |
               69  |   .3367906   .0914392     3.68   0.066    -.0566405    .7302218
               70  |   .2089384   .2867011     0.73   0.542    -1.024637    1.442514
               71  |   .3144116   .1619035     1.94   0.192     -.382203    1.011026
               72  |   .5888124   .4958888     1.19   0.357    -1.544825     2.72245
               73  |   .8912873   .5219448     1.71   0.230     -1.35446    3.137034
               75  |   1.246958   .6073839     2.05   0.176    -1.366404     3.86032
               77  |   1.560689   .8626802     1.81   0.212    -2.151125    5.272502
               78  |   1.941522   1.278416     1.52   0.268    -3.559059    7.442103
               80  |    2.34498   1.525965     1.54   0.264    -4.220718    8.910678
               82  |   2.698954   1.663018     1.62   0.246    -4.456435    9.854344
               83  |   2.994437    1.81452     1.65   0.241    -4.812813    10.80169
               85  |   3.538578   2.210833     1.60   0.251    -5.973868    13.05102
               87  |   3.965153   2.460506     1.61   0.248    -6.621548    14.55185
               88  |    4.40786   2.688929     1.64   0.243    -7.161667    15.97739
                   |
            idcode |
                2  |  -.4183815   .0165036   -25.35   0.002    -.4893909   -.3473721
                3  |   .6579353   .7215294     0.91   0.458    -2.446555    3.762426
                   |
             _cons |   1.341224   .1489003     9.01   0.012     .7005575     1.98189
      ------------------------------------------------------------------------------
      
      . xtset idcode year
      
      Panel variable: idcode (unbalanced)
       Time variable: year, 68 to 88, but with gaps
               Delta: 1 unit
      
      . xtreg ln_wage c.age##c.age i.year if idcode<=3, fe vce(cluster idcode)
      
      Fixed-effects (within) regression               Number of obs     =         39
      Group variable: idcode                          Number of groups  =          3
      
      R-squared:                                      Obs per group:
           Within  = 0.7404                                         min =         12
           Between = 0.4068                                         avg =       13.0
           Overall = 0.4014                                         max =         15
      
                                                      F(4,2)            =          .
      corr(u_i, Xb) = -0.8560                         Prob > F          =          .
      
                                       (Std. err. adjusted for 3 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               age |   .0773019   .0101936     7.58   0.017     .0334424    .1211613
                   |
       c.age#c.age |  -.0045583   .0021586    -2.11   0.169    -.0138461    .0047294
                   |
              year |
               69  |   .3367906   .0871839     3.86   0.061    -.0383313    .7119126
               70  |   .2089384   .2733588     0.76   0.525    -.9672295    1.385106
               71  |   .3144116   .1543689     2.04   0.179    -.3497843    .9786076
               72  |   .5888124   .4728115     1.25   0.339    -1.445531    2.623156
               73  |   .8912873   .4976548     1.79   0.215    -1.249948    3.032523
               75  |   1.246958   .5791178     2.15   0.164    -1.244785    3.738701
               77  |   1.560689   .8225333     1.90   0.198    -1.978387    5.099764
               78  |   1.941522   1.218922     1.59   0.252    -3.303077    7.186121
               80  |    2.34498   1.454951     1.61   0.248    -3.915167    8.605128
               82  |   2.698954   1.585626     1.70   0.231    -4.123442     9.52135
               83  |   2.994437   1.730077     1.73   0.226    -4.449484    10.43836
               85  |   3.538578   2.107946     1.68   0.235    -5.531183    12.60834
               87  |   3.965153      2.346     1.69   0.233     -6.12887    14.05918
               88  |    4.40786   2.563793     1.72   0.228    -6.623251    15.43897
                   |
             _cons |   1.465543   .3990418     3.67   0.067    -.2513952    3.182481
      -------------+----------------------------------------------------------------
           sigma_u |  .54258328
           sigma_e |  .21942548
               rho |  .85944136   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      .
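A note on why the standard errors differ while the coefficients match: every standard error in the -regress- table above is the corresponding -xtreg- one multiplied by about 1.0488. A plausible account, assuming both commands apply the usual small-sample factor (N-1)/(N-K) to the clustered variance, that -regress- counts the two idcode dummies in K, and that -xtreg, fe- does not count the absorbed panel effects (the clusters are the panels here). With N = 39, K = 19 for -regress- (age, its square, 14 year dummies, 2 idcode dummies and the constant) and K = 17 for -xtreg-:

$$\frac{\mathrm{se}_{\texttt{regress}}}{\mathrm{se}_{\texttt{xtreg}}} = \sqrt{\frac{(N-1)/(N-K_{\texttt{regress}})}{(N-1)/(N-K_{\texttt{xtreg}})}} = \sqrt{\frac{N-K_{\texttt{xtreg}}}{N-K_{\texttt{regress}}}} = \sqrt{\frac{39-17}{39-19}} = \sqrt{1.1} \approx 1.0488$$

For age, for example, .0101936 × 1.0488 ≈ .0106911, which is the -regress- value. The G/(G-1) cluster factor is the same in both commands (G = 3), so it cancels.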
      Kind regards,
      Carlo
      (StataNow 18.5)

      • #4
        Thanks for the very detailed responses!

        What if my data is not panel? My individual-level data is a repeated cross-section and I want to add county and year fixed effects. How does this change the xtset?

        • #5
          Samyan:
          if you have repeated cross-sectional data, you should go -regress-:
          Code:
          reg y x i.county i.year
          If necessary, standard errors can be clustered on some relevant variable (say, county).
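For the repeated cross-section with county and year fixed effects asked about in #4, a sketch of an equivalent alternative, on simulated data with hypothetical names (`county`, `year`, `yvar`, `xvar`) and arbitrary data-generating numbers. -xtset- can be given the county variable alone; because no time variable is declared, having many observations per county in the same year is not a problem. -xtreg, fe- then absorbs the county effects, the year effects are added explicitly, and the coefficient on `xvar` is expected to match the -regress- version with county and year dummies.

```stata
* Minimal sketch: simulated repeated cross-section in which each person is
* observed once; county, year, yvar and xvar are hypothetical names.
clear
set seed 99
set obs 2000
generate int county  = runiformint(1, 25)
generate int year    = runiformint(2001, 2008)
generate double xvar = rnormal() + county/25 + (year - 2000)/8
generate double yvar = 1 + 0.5*xvar + county/10 + (year - 2000)/5 + rnormal()

* (a) county and year dummies with -regress-
regress yvar xvar i.county i.year, vce(cluster county)
scalar b_reg = _b[xvar]

* (b) county as the only -xtset- variable: no time variable is declared,
*     so many observations per county and year are allowed
xtset county
xtreg yvar xvar i.year, fe vce(cluster county)
scalar b_xt = _b[xvar]

display "regress: " b_reg "   xtreg, fe: " b_xt
```

The standard errors from (a) and (b) may differ slightly, likely because -regress- counts the county dummies in its small-sample adjustment while -xtreg, fe- does not.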
          Kind regards,
          Carlo
          (StataNow 18.5)
