Clustering Panel Data / Ordered Logit

Andreas Baltin

Join Date: Apr 2019

Posts: 36
#1

29 Apr 2019, 03:13

Dear Statalisters!

I am currently working with a panel dataset from China. I use Fixed Effects when the regressand is continuous and Ordered Logit when it is ordinal.
However, I am unable to wrap my head around clustering.

I am trying to examine the effect of various regressors on individual spending. I have an abundance of variables for each individual for the years 2008 and 2011. The individuals were interviewed in their cities, i.e. the researchers selected 100 cities in China and surveyed people there.

I now basically do the following:

Code:

xtset ID year
xtreg y x1 x2 x3 x4 x5, fe
xtlogit y x1 x2 x3 x4 x5

Now I am thinking about clustering. As far as I understand clustering, I should cluster at the city level, so this would become:

Code:

xtreg y x1 x2 x3 x4 x5, fe vce(cluster city)
xtlogit y x1 x2 x3 x4 x5, vce(cluster city)

However, if I do this I get an error that panels are not nested within clusters. This is because some people moved cities between the survey rounds, which means I am unable to cluster at the city level. What I could do is remove everyone who moved between the two survey rounds and then cluster at the city level. I do not think this is statistically right, though, even though it only removes 1% of the observations.

Someone else told me that I should rather cluster at the individual level and include the different cities as dummies in the equation, so it would become:

Code:

xtreg y x1 x2 x3 x4 x5 i.city, fe vce(cluster ID)
xtlogit y x1 x2 x3 x4 x5 i.city, vce(cluster ID)

However, I do not understand the rationale behind clustering at the individual level and including dummies for cities.

Is anyone able to help me a bit on this?

Many thanks in advance!
Andreas

Last edited by Andreas Baltin; 29 Apr 2019, 03:16.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17671
#2

29 Apr 2019, 03:25

Andreas:
setting aside the apparent issue that -xtreg- and -xtlogit- are not two faces of the same coin and, as such, are not interchangeable: when you cluster on a given variable (I guess for autocorrelation, in your case), your belief is that each added piece of information about the same panel (the measure at time 2) is not independent of the previous one (the measure at time 1). Your -panelid- are individuals, not cities; hence, cluster on -panelid- and add a categorical predictor -i.city- to capture the potential contribution of this independent variable to variation in the regressand (when adjusted for the remaining regressors).
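
As a rough sketch of what clustering on the panel identifier does in the linear case, the cluster-robust variance estimator can be written as below. The notation is an assumption introduced here for illustration: g indexes clusters (here, individuals), X_g and \hat{u}_g stack the regressors and residuals of cluster g, and the finite-sample corrections that Stata applies are omitted.

```latex
\widehat{V}(\hat{\beta})
  = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \, \hat{u}_g \hat{u}_g' \, X_g \right)
    (X'X)^{-1}
```

The middle term allows the residuals of the same individual to be correlated across the two waves, while residuals of different individuals are treated as independent.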

Kind regards,
Carlo
(StataNow 18.5)
Andreas Baltin

Join Date: Apr 2019

Posts: 36
#3

29 Apr 2019, 03:37

Originally posted by Carlo Lazzaro

Andreas:
setting aside the apparent issue that -xtreg- and -xtlogit- are not two faces of the same coin and, as such, are not interchangeable: when you cluster on a given variable (I guess for autocorrelation, in your case), your belief is that each added piece of information about the same panel (the measure at time 2) is not independent of the previous one (the measure at time 1). Your -panelid- are individuals, not cities; hence, cluster on -panelid- and add a categorical predictor -i.city- to capture the potential contribution of this independent variable to variation in the regressand (when adjusted for the remaining regressors).

Hi Carlo,

I am walking to the library in a minute to get an article about clustering, but wanted to reply first:

So first of all, essentially what you are saying is that I should use this, right?

Code:

xtreg y x1 x2 x3 x4 x5 i.city, fe vce(cluster ID)
xtlogit y x1 x2 x3 x4 x5 i.city, vce(cluster ID)

I just do not understand the rationale behind this. I know you tried to explain it above, but I do not FULLY understand it, and I really want to. Would you be able to elaborate a bit on why I am clustering at the individual level and not the city level? Or could you point me towards a paper (not too technical) that explains it? I also think it is important to mention that I am using some city-specific variables in my regression, whose values will be the same for every individual in the same city. Surely I should cluster at the city level then?

I am really sorry for stealing so much time from you, but I really want to understand this.

Many thanks!
Andreas

Last edited by Andreas Baltin; 29 Apr 2019, 03:41.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17671
#4

29 Apr 2019, 04:03

Andreas:
you should consider studying any decent econometrics textbook: see David Benson's excellent reply at https://www.statalist.org/forums/for...tandard-errors (which includes a valuable reference).
As an aside, please note that the -fe- machinery wipes out time-invariant predictors.
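
A minimal sketch of why this happens, assuming a linear model with an individual effect \alpha_i, time-varying regressors x_{it} and time-invariant regressors z_i (the symbols are introduced here for illustration only): the -fe- estimator subtracts individual means, and anything that is constant within an individual cancels out.

```latex
y_{it} = \alpha_i + x_{it}'\beta + z_i'\gamma + \varepsilon_{it}

y_{it} - \bar{y}_i
  = (x_{it} - \bar{x}_i)'\beta
  + \underbrace{(z_i - z_i)'\gamma}_{=\,0}
  + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

So \gamma cannot be estimated. City-level variables that do not change between 2008 and 2011 would behave like z_i for everyone who did not move.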

Kind regards,
Carlo
(StataNow 18.5)