Panels are not nested within clusters

Andreas Baltin

Join Date: Apr 2019

Posts: 36
#1

23 Apr 2019, 12:53

Dear Statalisters,

I have run into a problem similar to this one: https://www.statalist.org/forums/for...try-year-level

However, the solution there does not work for me.

I want to do the following:

xtset ID year
xtreg y x1 x2 x3 x4, fe vce(cluster state)

However, I always get 'panels are not nested within clusters', even when I follow the advice in the other thread.

Any insights on what I could do?

Many thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#2

23 Apr 2019, 13:22

The message is self-explanatory. Your panels (IDs) are not nested within the clusters (states), which makes this an inadmissible command. So somewhere in your data there is at least one ID that appears in more than one state. There might be many like that.

If your IDs represent people or firms, and this is panel data, people and firms do move around. So this isn't really very surprising. But if the nature of the panel data's construction is such that it should only include non-moving entities, then that means there is an error in your data. You can find the offending ID's as follows:

Code:

by ID (state), sort: gen byte moved = (state[1] != state[_N])
browse if moved
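
As a complement to the check above, here is a minimal sketch that counts how many IDs change state and lists their histories. The toy data and the helper variable names (first_in_state, n_states, one_per_id) are made up for illustration; only ID, year and state come from the original post.

```stata
* Toy data (hypothetical): ID 1 stays in one state, ID 2 moves.
clear
input ID year str2 state
1 2001 "CA"
1 2002 "CA"
2 2001 "NY"
2 2002 "TX"
end

* Count the distinct states in which each ID appears.
bysort ID state: gen byte first_in_state = (_n == 1)
bysort ID: egen n_states = total(first_in_state)

* Tag one observation per ID so that each panel is counted once.
egen byte one_per_id = tag(ID)
count if n_states > 1 & one_per_id

* Inspect the full history of the IDs that change state.
sort ID year
list ID year state if n_states > 1, sepby(ID)
```

Counting distinct states per ID, rather than only flagging movers, also shows whether an ID moved once or several times, which may help in deciding whether the moves are real or data errors.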
Andreas Baltin

Join Date: Apr 2019

Posts: 36
#3

23 Apr 2019, 13:26

Hi Clyde, you were right. The thing was indeed that some individuals moved states between survey rounds. I have fixed this, and it works now! Many thanks!
Nirmol Chandra Das

Join Date: Apr 2020

Posts: 3
#4

22 Apr 2020, 00:44

Hi,
I am having the same problem. I want to cluster the standard errors by time, and my data set is an unbalanced panel.
I have x1 (id) and x2 (date), along with other variables.

Each id appears at several different dates, so Stata reports that the panels are not nested within clusters. Is it still possible to obtain time-clustered estimates? If so, could you please guide me on this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#5

22 Apr 2020, 01:07

Nirmol:
welcome to this forum.
Usually, standard errors are clustered on -panelid-, not on -timevar-.
What's the reason you are interested in that unusual way of clustering? Do you mean you have autocorrelation within the same panel and/or correlation across panels?

Kind regards,
Carlo
(Stata 19.0)
Claire Essy

Join Date: May 2019

Posts: 18
#6

15 Aug 2020, 04:33

Hi Carlo,
I would like to ask you a question about your last post.
You say that standard errors should be clustered on panelid and not on timevar. I totally agree with this for a pooled panel regression, since in that case Stata does not "know" it is dealing with the same units observed repeatedly over time, and it treats the n units observed t times as n*t independent observations.
However, if I am running a fixed-effects regression, I should no longer need to worry about that, right? As far as I know, the FE standard errors are computed to take temporal correlation into account. In a fixed-effects regression it makes more sense to cluster by some group identifier, such as state in the first post of this discussion, right?
That said, I also do not see why someone would have a situation in which it makes sense to cluster at the time level, but then I do not know Nirmol's regression model.
Thank you in advance!
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#7

15 Aug 2020, 04:50

If you want to cluster on something that is not nested (and you think you know what you are doing), there is an undocumented option -xtreg, nonest- which will force Stata to calculate the clustered variance post FE and RE even if non-nested.

As to Claire Essy's question: no, including fixed effects does not necessarily take care of the within-panel correlation. The current consensus in the literature is that, even after including fixed effects, it is wise to compute clustered variances.
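
For readers who want to see the -nonest- option in action, here is a rough sketch. It uses Stata's nlswork example data (downloaded by webuse), so the variables ln_wage, age, tenure, idcode and year are stand-ins rather than anyone's actual model, and the option is used exactly as named in this post; since it is undocumented, its behavior should be checked in your own Stata version.

```stata
* Example data shipped via webuse (an assumption: requires internet access).
webuse nlswork, clear
xtset idcode year

* Clustering on year is refused, because each idcode spans several years.
capture noisily xtreg ln_wage age tenure, fe vce(cluster year)
display "return code without nonest: " _rc

* With nonest, Stata is asked to skip the nesting check and
* compute the cluster-robust variance anyway.
xtreg ln_wage age tenure, fe vce(cluster year) nonest
```

Keep in mind that clustering on time usually means few clusters, and cluster-robust standard errors can be unreliable when the number of clusters is small.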
Claire Essy

Join Date: May 2019

Posts: 18
#8

15 Aug 2020, 11:00

Hi Joro, thank you for your reply!
Just to be sure: does what you say also apply to the fixed-effects model? When I said fixed-effects regression, I did not mean just adding a fixed effect (which you can also do in a panel regression), but running xtreg ..., fe.
Thanks!
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#9

15 Aug 2020, 15:19

Yes Claire. In the panel data model

Yit = b*Xit + Ei + Vit

that we typically fit with -xtreg, fe- or -xtreg, re-, including the fixed or random effects resolves the problem of correlation in the composite error (Ei + Vit) only if the idiosyncratic error Vit is uncorrelated across t. If it is correlated, the fact that we have included fixed or random effects does not obviate the need to compute clustered variance as in
-xtreg y x, fe cluster(i)-.
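
To make the comparison concrete, here is a small sketch that fits the same fixed-effects model with conventional and with panel-clustered standard errors. It again uses Stata's nlswork example data, so the variable names are stand-ins, and the scalar names se_conventional and se_clustered are made up for illustration.

```stata
* Example data shipped via webuse (an assumption: requires internet access).
webuse nlswork, clear
xtset idcode year

* Conventional FE standard errors: valid only if Vit is uncorrelated across t.
xtreg ln_wage age tenure, fe
scalar se_conventional = _se[tenure]

* Standard errors clustered on the panel identifier: allow arbitrary
* correlation of Vit within each panel.
xtreg ln_wage age tenure, fe vce(cluster idcode)
scalar se_clustered = _se[tenure]

display "conventional SE: " se_conventional "   clustered SE: " se_clustered
```

The point estimates are identical in the two fits; only the standard errors change. In -xtreg, fe-, vce(robust) gives the same result as clustering on the panel variable.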

Claire Essy

Join Date: May 2019

Posts: 18
#10

19 Aug 2020, 13:36

Thank you very much
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#11

16 Mar 2022, 04:35

Clyde Schechter I'm very sorry to bother you with a probably trivial question, but I don't really understand in which situations the clustered standard error concept can be applied. I have a dataset with two time periods and individuals who were interviewed about their business success. The interviews took place at the location of their stores (which was also documented), i.e. in several different town centers across a large region in Uganda. Some of these individuals switched town centers after the first interview, i.e. they reported a different town center in the second period than in the first.

I have reason to believe that there are local market effects on the individuals, which makes me want to cluster the standard errors at the town center level. Is that only possible for the individuals who did not switch town centers? Are the remaining individuals dropped from the analysis? Or is it possible to cluster only on the location in period one, thereby keeping all observations?

Of course a hint to relevant literature on this would also be greatly appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#12

16 Mar 2022, 09:44

The web of dependencies in the design you describe is more complicated than that of the familiar panel-data analysis. Your observations are subject to two sources of dependency: repeated observations of the same individual, and also different markets, with some individuals appearing in more than one market. One approach, which is deprecated in the finance and economic sectors but used more freely elsewhere, is to acknowledge the reality that this is not panel data and to build an analysis that reflects the actual nesting structure: a mixed-effects model with a multiple-membership structure of random effects at the individual and location levels. This has the drawback that such models may not yield consistent estimates, so it is important in this setting to include as many important covariates in the analysis as possible. There is no perfect solution here.
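
One simple way to sketch such a model in Stata is with crossed random effects for individual and location, which is offered here as an illustration and not necessarily the exact specification intended above. The data are simulated, and every variable name (individual, towncenter, town1, period, u_ind, v_town, x, y) is hypothetical.

```stata
* Simulated two-period data: 200 individuals, 10 town centers,
* roughly 20% of individuals switch town center in period 2.
clear
set seed 2022
set obs 200
gen individual = _n
gen u_ind = rnormal()                    // individual-level random effect
gen town1 = ceil(10 * runiform())        // town center in period 1
expand 2
bysort individual: gen period = _n
gen towncenter = town1
replace towncenter = ceil(10 * runiform()) if period == 2 & runiform() < 0.2

* Draw one random effect per town center and copy it to all its observations.
bysort towncenter: gen v_town = rnormal() if _n == 1
bysort towncenter (v_town): replace v_town = v_town[1]

gen x = rnormal()
gen y = 0.5 * x + u_ind + v_town + rnormal()

* Crossed random effects: town center (via _all: R.) and individual.
mixed y x || _all: R.towncenter || individual:
```

The `_all: R.towncenter` term treats the whole dataset as one group with a random effect for each town center, which lets individuals cross town centers instead of being nested within them.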

A different approach, if you have a large number of both individuals and locations, is to use Sergio Correia's -reghdfe- (available from SSC). You can specify -absorb(individual location)- and -vce(cluster individual location)-. You might encounter a different problem with this: since you only have two interviews per individual, and some individuals contribute to two different locations, the clustering may soak up a large number of degrees of freedom, so that you will lose the ability to test high-dimensional hypotheses, and your power with low-dimensional hypotheses may be noticeably reduced.
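
A minimal sketch of the -reghdfe- call on simulated data follows. All variable names are hypothetical, and the two-way clustering is written as vce(cluster ...), which is the form reghdfe expects. The command is user-written, so it has to be installed first (recent versions also rely on the ftools package).

```stata
* One-time installation (uncomment if needed):
* ssc install ftools
* ssc install reghdfe

* Simulated two-period data: 400 individuals, 20 town centers,
* roughly 20% of individuals switch town center in period 2.
clear
set seed 2022
set obs 400
gen individual = _n
gen town1 = ceil(20 * runiform())        // town center in period 1
expand 2
bysort individual: gen period = _n
gen towncenter = town1
replace towncenter = ceil(20 * runiform()) if period == 2 & runiform() < 0.2

gen x = rnormal()
gen y = 0.5 * x + rnormal()

* Absorb individual and town-center fixed effects;
* cluster standard errors two ways, by individual and by town center.
reghdfe y x, absorb(individual towncenter) vce(cluster individual towncenter)
```

With only two observations per individual, the individual fixed effects alone use up half of the observations' worth of degrees of freedom, which is the loss of power described above.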

I think both of these solutions are better than removing data: data removal is likely to result in biased samples, and, hence, results that are not generalizable. Data removal should generally be confined to situations where the data are known to be incorrect and there is no feasible way to correct them. That is not your situation.

It is what it is.
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#13

17 Mar 2022, 09:31

Thank you very much for your detailed answer. I already stumbled across the reghdfe command and I will try to apply it to my analysis.

I'm still curious why you write this:

One approach, which is deprecated in the finance and economic sectors but used more freely elsewhere, is to acknowledge the reality that this is not panel data

Why can't my data be considered panel data? I have several individuals observed over two time periods. Isn't that by definition a panel data structure?
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#14

17 Mar 2022, 09:51

Well, strictly speaking, it's only panel data if the only dependencies among observations arise from the repeated observations over time of the same individuals. But you have additional dependencies because of the location factor. So it's not just simple panel data.
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#15

17 Mar 2022, 11:00

Ok, I see. Thanks for the explanation.