Fixed Effects and Clustering the Standard Error

Jenny Seinsch

Join Date: Sep 2017

Posts: 3
#1

05 Sep 2017, 08:45

Hello,

I need your help!

I am trying to understand the meaning of fixed effects and clustered standard errors.

I have a regression with a dependent variable and several independent variables: reg x y y y y y y i.date i.country. Why do I need these fixed effects?
My professor also said I have to cluster by country and year, but I do not understand the purpose of clustering the standard errors by country and year.

I hope somebody can help me!
Thanks in Advance!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#2

05 Sep 2017, 09:00

Jenny:
welcome to the list.
If you're dealing with panel data and you use -regress- (which is usually not your best bet with such a data structure; see -xtreg- instead), you should -cluster- your standard errors on your -panelid- (probably -country- in your case), as your observations are not independent.

Kind regards,
Carlo
(Stata 19.0)
River Huang

Join Date: Mar 2016

Posts: 1906
#3

05 Sep 2017, 18:26

You can get some ideas about clustering by looking at Figures 1 (at firm level) and 6 (at both firm and year level) of the following paper: https://academic.oup.com/rfs/article...nce-Panel-Data.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#4

05 Sep 2017, 19:00

It's mostly standard to cluster just by your panelid, in this case country. However, I have heard that people are now clustering by both panelid and timevar, in this case country and year. I personally have never done it and have never seen an explanation of how to do it.

To run a two-way fixed effects model with clustered errors, in your case you would use:

Code:

xtset country year
xtreg y x i.year, fe vce(cluster country)
River Huang

Join Date: Mar 2016

Posts: 1906
#5

05 Sep 2017, 19:43

So far as I know,

Code:

xtreg y x i.year, fe vce(robust)
xtreg y x i.year, fe vce(cluster country)

provide identical (cluster-robust) standard errors, provided that country is the panel variable declared with -xtset-.
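
For anyone who wants to check this on their own machine, here is a minimal sketch. It assumes the nlswork example dataset that ships via -webuse- (not the original poster's data), so idcode plays the role of country and the regressors are purely illustrative.

```stata
* Illustrative data only: nlswork is a Stata example panel, not the poster's dataset
webuse nlswork, clear
xtset idcode year

* Fixed effects with vce(robust)
xtreg ln_wage age tenure i.year, fe vce(robust)
scalar se_rob = _se[tenure]
estimates store rob

* Fixed effects with standard errors clustered on the panel variable
xtreg ln_wage age tenure i.year, fe vce(cluster idcode)
scalar se_clu = _se[tenure]
estimates store clu

* Side-by-side coefficients and standard errors
estimates table rob clu, se
```

Comparing the two columns of the table (or the scalars se_rob and se_clu) shows whether the two options agree for your version of Stata.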

Last edited by River Huang; 05 Sep 2017, 19:45.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
River Huang

Join Date: Mar 2016

Posts: 1906
#6

05 Sep 2017, 19:47

To cluster at both firm (id) and year (year) level, see below for an example: (please first ssc install reghdfe):

Code:

reghdfe y x1 x2, absorb(id year) vce(cluster id year)
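
A self-contained version of the same idea, as a sketch only: it assumes internet access for -ssc install- and uses the nlswork example dataset from -webuse-, with idcode and year standing in for the firm and year identifiers.

```stata
* One-time installation of the community-contributed commands
* (reghdfe relies on ftools)
ssc install ftools, replace
ssc install reghdfe, replace

* Illustrative data only, not the poster's dataset
webuse nlswork, clear

* Absorb person and year fixed effects; cluster on both dimensions
reghdfe ln_wage age tenure, absorb(idcode year) vce(cluster idcode year)
```

Note that this example panel has only a small number of years, so the year dimension contributes few clusters; with few clusters in either dimension, two-way clustered standard errors should be read with caution.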

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Jenny Seinsch

Join Date: Sep 2017

Posts: 3
#7

05 Sep 2017, 23:54

Thanks for help and the quick responses!

Can I also use xtivreg2 or ivreg2, for example ivreg2 y x1 x2 x3 ... i.year i.country, cluster(country date)?

Last edited by Jenny Seinsch; 06 Sep 2017, 00:05.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#8

06 Sep 2017, 00:24

Jenny:
I do not think that -xtivreg2- can accommodate two-way clustering, as you can see from the following toy example:

Code:

. use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta
(Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)

. tsset id year
       panel variable:  id (unbalanced)
        time variable:  year, 1976 to 1984
                delta:  1 unit

. xtivreg2 ys k (n=l2.n l3.n), fe small cluster( ind yr1980)
cluster(): too many variables specified
r(103);

However, the main point is that you seem to be treating -ivreg2- as equivalent to -xtivreg2-.
Let's set instrumental variables aside for a while.
Your original post focused on two-way clustered standard errors using -regress- with panel data (which should not be your first choice, since -xtreg- usually outperforms -regress- when it comes to panel data).
If you intend to use -regress- with panel data, clustering your standard errors on -panelid- is mandatory; otherwise, Stata would not be informed that you're analysing non-independent observations.
Things are different with -xtreg-, as you are requested to -xtset- your data first: hence, Stata knows from the start that you're dealing with panel data.
Hence, clustering/robustifying standard errors with -xtreg- is not mandatory: it makes sense if you suspect heteroskedasticity/autocorrelation in your data (by the way, the latter is quite immaterial as long as you're dealing with a large N, small T panel dataset, as frequently appears to be the case on this forum); otherwise, default standard errors are enough.

Kind regards,
Carlo
(Stata 19.0)
Jenny Seinsch

Join Date: Sep 2017

Posts: 3
#9

06 Sep 2017, 00:36

Hi Carlo,
My problem is that if I run any of the following:

cluster2 y x x x x, fcluster(country) tcluster(date)

reghdfe y x x x, absorb(country date) vce(cluster country date)

xtreg y x x x x i.date, fe vce(cluster country)

I do not get many statistically significant explanatory variables, yet I discuss them in my work.
I am completely new to Stata and do not yet understand all of this stuff. For example, if I use only fixed effects without clustering, I get good models with a lot of statistically significant variables. Hence my question: what is the point of clustering? Do I really need it?

Thanks a lot!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#10

06 Sep 2017, 00:50

Jenny:
some remarks about your last query:
- statistical significance is usually oversold (this opinion is widely shared on this list). You would do better to look at the 95% confidence intervals of the coefficients instead.
Besides, a lack of statistical significance can depend on different issues, the most trivial being a limited sample size;
- clustering is mandatory when you have panel data and you (often wrongly) decide to analyze them via a regression model conceived for one-wave datasets (say, -regress-; -logit-; -poisson-). Otherwise, Stata would not be informed that you're analysing non-independent observations;
- if you're dealing with panel data and (as it is often correct) you use one of the -xt- suite models (say, -xtreg-; -xtlogit-; -xtpoisson-), clustering (if feasible) should be considered when you suspect heteroskedasticity and/or autocorrelation in your data.
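
To see what clustering does in practice when -regress- is used on panel data, here is a minimal sketch. It assumes the nlswork example dataset from -webuse- (not the original poster's data), with idcode as the panel identifier; the variable choice is illustrative only.

```stata
* Illustrative data only: nlswork is a Stata example panel
webuse nlswork, clear

* Pooled OLS with default standard errors (all rows treated as independent)
regress ln_wage age tenure i.year
scalar se_default = _se[tenure]

* Same model, standard errors clustered on the panel identifier
regress ln_wage age tenure i.year, vce(cluster idcode)
scalar se_cluster = _se[tenure]

* The point estimates are unchanged; only the standard errors differ
display "default SE = " se_default "   clustered SE = " se_cluster
```

Clustering leaves the coefficients untouched and changes only the standard errors, which is why significance stars can appear or disappear between the two runs.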

I would recommend that you discuss these topics with your supervisor before starting your statistical analysis, just to avoid wasting time and panicking as the deadline gets nearer.

Kind regards,
Carlo
(Stata 19.0)
Daniel van Loenen

Join Date: Oct 2019

Posts: 5
#11

14 Oct 2019, 09:19

Originally posted by River Huang View Post

To cluster at both firm (id) and year (year) level, see below for an example: (please first ssc install reghdfe):

Code:

reghdfe y x1 x2, absorb(id year) vce(cluster id year)

Dear Carlo and River,

Your posts in this section are really helpful. I still have a question about the implementation in Stata.
I was trying to incorporate two types of fixed effects and to cluster on both dimensions. When I implemented this in Stata, an error message popped up (see attachment).
Do you know how to resolve this?

FYI, the panelid is industries (industrynr is the actual variable) and the timevar is Observation, which runs from 1 to 242 for each industry. I did it this way because there was both a yearly and a monthly component, so I converted them into a single observation counter for each industry, which seems to work.

Thank you in advance.

Regards,

Daniel
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#12

14 Oct 2019, 10:33

Daniel:
you might be interested in the following thread: https://www.statalist.org/forums/for...ects-undefined.

Kind regards,
Carlo
(Stata 19.0)
Daniel van Loenen

Join Date: Oct 2019

Posts: 5
#13

16 Oct 2019, 01:40

Dear Carlo,

Thank you for your help. I didn't see that thread before because I am new to this forum.
The problem is resolved. Appreciate it!

Kind regards,

Daniel
Mohammad Al-Tamimi

Join Date: Jun 2022

Posts: 3
#14

19 Jul 2022, 21:02

Dear Carlo,
I have a question, please.

If I use
xtreg y l.y x1 x2.....x15 i.year i.country i.industry, robust cl(comany_id)

(5 years, unbalanced panel, 379 observations)
does it mean that I am using mixed effects and random effects together?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#15

20 Jul 2022, 00:08

Mohammad:
not quite.
I'd say that you're coding something similar to dynamic panel data analysis (the lagged regressand is plugged in as a predictor).

Kind regards,
Carlo
(Stata 19.0)