Clustered standard errors with group-specific slope parameters

Bernhard Noebauer

Join Date: Feb 2022

Posts: 5
#1


07 Feb 2022, 09:55

Hello,

I am trying to estimate a regression with 746 city fixed-effects and a distinct slope-parameter for each city. I would like to cluster the standard errors by city. I am using Stata 16.

The results seem fine when I use no option or the robust option to compute standard errors. When I try to cluster them by city, the standard errors get extremely small, to an extent that makes me suspect that something must be incorrect.

I was able to recreate the same behavior with sample data.

Code:

sysuse auto, clear
reg price c.weight#i.foreign i.foreign, robust
* This looks reasonable
reg price c.weight#i.foreign i.foreign, vce(cluster foreign)
* This produces incredibly small standard errors
xtset foreign
xtreg price c.weight#i.foreign, fe vce(robust)
* This returns the same slope coefficients as above, but does not display standard errors at all

While the sample data has only two groups, my actual data shows the same behavior with 746 groups. Restricting the sample to cities with at least 1000 observations does not change anything either, so it does not appear to be driven by a small number of observations.

What am I doing wrong?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

07 Feb 2022, 11:32

Bernhard:
Welcome to this forum.
The way you coded the interaction is not correct, as it should be (I'm taking -regress- as an example):

Code:

sysuse auto, clear
reg price c.weight c.weight#i.foreign i.foreign, robust

or, in a more efficient way:

Code:

sysuse auto, clear
reg price c.weight##i.foreign, robust

Double-check whether or not this fix makes your results more reasonable (whatever it may mean).

In addition:
1) under -regress-, the -robust- option accounts for heteroskedasticity only. To account for serial correlation as well, use -vce(cluster clusterid)-;
2) conversely, -robust- and -vce(cluster clusterid)- do the very same job under -xtreg-, as they both invoke the cluster-robust standard error.

Kind regards,
Carlo
(Stata 19.0)

Bernhard Noebauer

Join Date: Feb 2022
Posts: 5

08 Feb 2022, 14:13

Hello Carlo,

Thank you very much for the answer and the welcome!

I am trying to understand why the explicit inclusion of the base-effect of c.weight is necessary. For the examples I tried, I get equivalent results.

reg price c.weight#i.foreign i.foreign   reg price c.weight c.weight#i.foreign i.foreign   Comparison
_b[0b.foreign#c.weight]                  _b[weight]                                        same coefficient, same standard error
_b[1.foreign#c.weight]                   _b[weight] + _b[1.foreign#c.weight]               same coefficient, can compute same standard error using lincom

In the examples I tried, this is true regardless of whether I use default or heteroskedasticity-robust standard errors. It also holds if I include controls and if the categorical variable has more than two categories. All coefficients and standard errors not shown in the table (constant, fixed effects) are also the same. This is something rather fundamental about regressions in Stata, so I would be very happy to understand when this equivalence breaks down and why my version is not correct.
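The equivalence described above can be checked with a short sketch like the following. It uses the auto data and the robust option from this thread; the scalar names (bA, seA, bB, seB) are only illustrative.

```stata
sysuse auto, clear

* Version A: one slope per group, no separate main effect of weight
regress price c.weight#i.foreign i.foreign, vce(robust)
scalar bA  = _b[1.foreign#c.weight]
scalar seA = _se[1.foreign#c.weight]

* Version B: main effect of weight plus the interaction
regress price c.weight c.weight#i.foreign i.foreign, vce(robust)

* Slope for foreign == 1 in version B is the sum of the two coefficients
lincom _b[weight] + _b[1.foreign#c.weight]
scalar bB  = r(estimate)
scalar seB = r(se)

display "Version A: " bA " (" seA ")"
display "Version B: " bB " (" seB ")"
```

Both versions span the same set of regressors, so this only changes how the slopes are reported, not the fitted model.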

Interestingly, the standard errors do indeed somewhat change with the explicit inclusion of the base effect when I cluster them using vce(cluster foreign). (They stay the same if I include an additional control). However, my issue stays qualitatively the same. To give one example, when I run

Code:

sysuse auto, clear

reg price c.weight c.weight#i.foreign i.foreign

with different options for standard errors and look at the t-value of weight, it is 7.19 with no option, 5.52 with the robust option and 1.5e+15 with the vce(cluster foreign) option. This makes me suspect that there must be something wrong with my clustering.


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#4

08 Feb 2022, 22:26

You can’t cluster to obtain standard errors for the fixed effects, whether they’re intercepts or slopes. Clustering only works for parameters assumed constant across cities.
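One way to see why the clustered standard errors collapse is sketched below. It assumes that every regressor is city-specific (a city intercept and a city slope, no common controls); the notation is illustrative and not taken from the posts.

```latex
% Model with city-specific intercept and slope
y_{ig} = \alpha_g + \beta_g x_{ig} + u_{ig}, \qquad i = 1,\dots,n_g,\quad g = 1,\dots,G

% OLS first-order conditions hold separately within each city g
\sum_{i \in g} \hat u_{ig} = 0, \qquad \sum_{i \in g} x_{ig}\,\hat u_{ig} = 0

% Cluster-robust variance (c is a finite-sample correction, z_{ig} the regressor vector,
% Z the stacked regressor matrix)
\hat V_{\mathrm{cluster}} = c\,(Z'Z)^{-1}\Big(\sum_{g=1}^{G} s_g s_g'\Big)(Z'Z)^{-1},
\qquad s_g = \sum_{i \in g} z_{ig}\,\hat u_{ig}

% z_{ig} is nonzero only in the entries (1, x_{ig}) belonging to city g, so
s_g = 0 \ \text{for every } g \quad\Longrightarrow\quad \hat V_{\mathrm{cluster}} = 0
```

In floating-point arithmetic the exact zero appears as standard errors on the order of rounding error, which is consistent with the t-value of 1.5e+15 reported in this thread.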
Bernhard Noebauer

Join Date: Feb 2022

Posts: 5
#5

09 Feb 2022, 14:47

Thank you very much! That is very good to know and explains my confusion.

Can I bootstrap the standard errors using vce(bootstrap)? I tried, and the average bootstrapped standard error is slightly smaller than the average robust standard error in my example, while being of the same magnitude. I am unsure whether this is due to my particular sample, or whether standard bootstrapping is also not a good (conservative) choice in a case with city-specific slopes and intercepts.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#6

09 Feb 2022, 21:54

Unless you specify a cluster structure when you bootstrap, you are just obtaining heteroskedasticity-robust standard errors -- which is why they come out close to vce(robust). You're basically estimating a different equation for each city, right? Then all you can do is use heteroskedasticity-robust standard errors for each city. From your description, it seems you don't have panel data but, perhaps, people living within cities? Clustering by city would then be like clustering with a single cluster in each city's equation. I think reg lets you do it because it doesn't recognize the degeneracy. xtreg does, and that's why your standard errors are missing.
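The degeneracy can be made visible with a small sketch on the auto data from the first post. It assumes the specification with group-specific intercepts and slopes only; the variable names uhat and score_w are illustrative. Within each cluster, the residuals and the weight-times-residual products sum to zero up to rounding, so the cluster-robust formula has nothing left to work with.

```stata
sysuse auto, clear

* Group-specific intercepts and slopes, as in the original post
regress price c.weight#i.foreign i.foreign

* Residuals and the score contribution for the slope
predict double uhat, residuals
generate double score_w = uhat * weight

* Sums within each cluster are zero up to rounding error
tabstat uhat score_w, by(foreign) statistics(sum)
```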
Bernhard Noebauer

Join Date: Feb 2022

Posts: 5
#7

11 Feb 2022, 00:09

Thanks again, this is extremely helpful! Exactly. I have rental objects in different cities and I am interested in estimating a slope parameter (distance to the city center) and an intercept for each individual city. No time variation. Right now I am estimating them all with one regression. I thought of one regression vs. several regressions as a question of whether I expect the control variables to have a common effect across cities, or quite different effects in different cities. Moreover, it allows me to easily compare the intercepts. But you are right, it is very close to estimating one equation per city.
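The closeness between one pooled regression and one regression per group can be illustrated with a sketch on the auto data, with foreign standing in for city (an assumption made only for illustration; the scalar name b_pooled is hypothetical). The slope estimates coincide, while the robust standard errors may differ slightly because the finite-sample degrees-of-freedom adjustment uses the pooled sample in one case and the group sample in the other.

```stata
sysuse auto, clear

* One regression with group-specific intercepts and slopes
regress price c.weight#i.foreign i.foreign, vce(robust)
scalar b_pooled = _b[1.foreign#c.weight]

* One regression per group: same slope estimates, group by group
bysort foreign: regress price weight, vce(robust)

display "Pooled slope for foreign == 1: " b_pooled
```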
