Using fixed effects in cross sectional data

Klaus Klausen

Join Date: Mar 2021

Posts: 72
#1

14 Mar 2022, 12:18

Hello,

There are a few threads on applying fixed effects (fe) to cross-sectional data, but it seems I am not able to reproduce fe results with cross-sectional data:

I have a panel dataset for which I set only the panel ID; the entity is a firm:

Code:

xtset id

To run an fe model with firm and year fixed effects and standard errors clustered at the firm level, I would use the following code:

Code:

xtreg DV IV i.Year, fe vce(cluster id)

Shouldn't this be equivalent to the following code?

Code:

reg DV IV i.Year i.id, vce(cluster id)

I get different results from the two commands and cannot work out why. Is there something wrong with the code, or does this look like a data issue?

Thank you

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

14 Mar 2022, 12:32

Klaus:
no issue indeed (and no example/Stata output tables from your side, either).
That said, with the two approaches you get the very same point estimates for the shared coefficients (both the linear and the squared term for -age- in the following toy example):

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age if idcode<=3, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =         39
Group variable: idcode                          Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.6382                                         min =         12
     Between = 0.8744                                         avg =       13.0
     Overall = 0.2765                                         max =         15

                                                F(2,2)            =       3.83
corr(u_i, Xb) = -0.2473                         Prob > F          =     0.2070

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2512762   .1007559     2.49   0.130    -.1822416     .684794
             |
 c.age#c.age |  -.0037603   .0015163    -2.48   0.131    -.0102844    .0027638
             |
       _cons |  -2.189815   1.575348    -1.39   0.299    -8.967992    4.588361
-------------+----------------------------------------------------------------
     sigma_u |  .31366066
     sigma_e |  .19867104
         rho |  .71367959   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. reg ln_wage c.age##c.age i.idcode if idcode<=3, vce(cluster idcode)

Linear regression                               Number of obs     =         39
                                                F(1, 2)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.7407
                                                Root MSE          =     .19867

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2512762    .103677     2.42   0.136    -.1948099    .6973623
             |
 c.age#c.age |  -.0037603   .0015603    -2.41   0.138    -.0104736     .002953
             |
      idcode |
          2  |  -.4231615   .0288023   -14.69   0.005    -.5470877   -.2992353
          3  |  -.6126416   .0625166    -9.80   0.010    -.8816288   -.3436544
             |
       _cons |   -1.82398   1.588179    -1.15   0.370    -8.657361      5.0094
------------------------------------------------------------------------------

.

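Standard errors and related statistics differ because -xtreg,fe- works on within-panel variation only and does not count the absorbed panel effects in the degrees-of-freedom correction of the clustered standard errors, whereas -regress- counts the -i.idcode- dummies as estimated parameters.
The constant in -xtreg,fe- is the average of the fixed effects, whereas the one -regress- gives back is the intercept of the base panel (idcode 1 here).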
Lastly, in your second code you're still in the panel data econometrics realm (and leave the cross-sectional one behind you).
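
Both differences can be checked by hand from the two output tables. This is only a sketch, and it rests on two assumptions: that -regress- uses all 5 estimated parameters in its cluster-robust degrees-of-freedom correction while -xtreg,fe- uses 3 (it leaves out the two panel dummies), and that idcode 2 and idcode 3 contribute 12 and 15 of the 39 observations (consistent with the min/avg/max reported above, but not shown directly in the output):

Code:

* ratio of the two sets of standard errors: sqrt((N - 3)/(N - 5)) with N = 39
display sqrt((39 - 3)/(39 - 5))
* -xtreg,fe- standard error of age, rescaled by that ratio
display .1007559*sqrt((39 - 3)/(39 - 5))
* -regress- constant plus the observation-weighted mean of the idcode coefficients
display -1.82398 + (12*(-.4231615) + 15*(-.6126416))/39

The first line gives about 1.029, the second about .103677 (the -regress- standard error of age), and the third about -2.189815 (the -xtreg,fe- constant).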

Last edited by Carlo Lazzaro; 14 Mar 2022, 12:35.

Kind regards,
Carlo
(Stata 19.0)

Klaus Klausen

#3

16 Mar 2022, 05:31

Thank you very much, please excuse my delayed reply and the missing data example.
Even though the coefficients from both approaches are the same, the t-statistics differ and may lead to different significance results, right? Is there any general recommendation on which approach should be used?

Lastly, in your second code you're still in the panel data econometrics realm (and leave the cross-sectional one behind you).

Is this because I apply the year and firm fixed effects manually here?

As long as I do not use time-series commands, it doesn't matter whether I set a time variable using -xtset-, right?
Carlo Lazzaro

#4

16 Mar 2022, 05:44

Klaus:
1) standard errors and related statistics differ because, unlike -regress-, -xtreg,fe- focuses on within-panel variation only and does not count the panel dummies in the degrees-of-freedom correction of the clustered standard errors;
2) in your -regress- code you are also running a panel data regression with fixed effects;
3) -xtset-ting your dataset with your -panelid- only makes sense if you receive the -repeated time values- error message from Stata.

Kind regards,
Carlo
(Stata 19.0)
Klaus Klausen

#5

16 Mar 2022, 11:01

Hi Carlo,
I have another, somewhat related question:

How can I cluster my se at the firm level when I use -xtreg, fe- and want to apply time and industry fixed effects?

As I set the Firm ID as the identifier, xtreg, fe will use firm rather than industry fixed effects.

Code:

xtreg DV IV i.Year, fe vce(cluster Industry)

Will return

Code:

panels are not nested within clusters r(498);

I could avoid this by setting the industry ID as panel ID but intuitively, that doesn't make much sense to me (could be wrong though).

Is the only way to do this to use -reg-?

Code:

reg DV IV i.Industry i.Time, vce(cluster ID)
Carlo Lazzaro

#6

16 Mar 2022, 11:10

Klaus:
as you're using an -fe- specification, if -Industry- does not change within panel (as is usually the case), there's no gain in including it as a predictor.
Hence you can safely go:

Code:

xtreg DV IV i.Year, fe vce(cluster ID)
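
If industry (rather than firm) fixed effects are really what is wanted, a possible sketch follows. DV, IV, Year, Industry and ID are the placeholder names used in this thread, not real variables. The first step checks whether -Industry- is in fact constant within firm, since the -panels are not nested within clusters- message in #5 suggests it may not be; the second absorbs industry effects and clusters at the firm level:

Code:

* sketch only: placeholder variable names from this thread
* 1) varies = 1 flags firms that are observed in more than one industry
bysort ID (Industry): generate byte varies = Industry[1] != Industry[_N]
tabulate varies
* 2) industry and year fixed effects, standard errors clustered by firm
areg DV IV i.Year, absorb(Industry) vce(cluster ID)

The -areg- line should give the same coefficient on IV as the -reg- command with i.Industry in #5; it simply does not report the industry dummies.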

Kind regards,
Carlo
(Stata 19.0)
Klaus Klausen

#7

16 Mar 2022, 13:24

Thank you. Do you have an idea why nearly all papers in my research area explicitly mention that they control for industry and not for firm fixed effects? It makes perfect sense to me that firm fixed effects are used to control for industry plus other factors. I'm just very confused by the literature, since it mostly talks about industry and not firm fixed effects.
For example:

We also include industry and year fixed effects to control for within-industry variations [...]
Carlo Lazzaro

#8

17 Mar 2022, 01:36

Klaus:
if -industry- does not change within panels and the authors use the -fe- specification, there's no way to obtain the -industry- coefficient, as subtracting the panel mean of a time-invariant predictor from the predictor itself gives zero (this is the dark side of demeaning).
The only way to estimate the coefficient of a time-invariant variable is to switch to -re- or to the Mundlak correction (please see the community-contributed module -mundlak-).
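
The Mundlak idea can also be sketched by hand, without the community-contributed module. The variable names are the placeholders from this thread, mean_IV is a hypothetical helper variable, and the data are assumed to be -xtset- on ID already:

Code:

* sketch only: placeholder variable names from this thread
* panel mean of each time-varying regressor (only IV here)
bysort ID: egen double mean_IV = mean(IV)
* random-effects model augmented with the panel mean, so -Industry- keeps its coefficients
xtreg DV IV mean_IV i.Industry i.Year, re vce(cluster ID)

The panel means should be computed over the estimation sample, and with an unbalanced panel the panel means of the year dummies may be needed as well.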

Kind regards,
Carlo
(Stata 19.0)