cluster at what level

Marry Lee

Join Date: Nov 2020

Posts: 189
#1

cluster at what level

26 Feb 2024, 05:34

Dear all,

I read some posts here about clustering and what I understood is that we need to consider correlations between errors when deciding on that.

I read a paper studying education outcomes of students in different grades over time. So, their data is constructed such as students are observed over the years when they pass from one grade to another and these students are enrolled in different schools.

Their model is as follows:

Code:

y_i,g,s,t = beta1 X_g,s,t + beta2 Z_i,t + lamda_s,t + mu_g + epsilon_i,g,s,t

where i : student id, g: grade, s: school, and t: year.
lamda_s,t : schoolXyear fixed effects
mu_g : grade fixed effects

They cluster standard errors at the schoolXgradeXyear level. However, I am not convinced with this clustering levels. One may argue that errors are correlated for a given student over years and over grades, no? so the only clustering possible is at the school level, right? Am I missing something.

Please let me know what you think.
All the best
Tags: None
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

27 Feb 2024, 01:26

How many schools are there?

By school, is the number of observations similar across schools?

Have you tried using the user-written boottest command, with a wild cluster restricted bootstrap?
Comment
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#3

27 Feb 2024, 01:27

You could also try the user written summclust command
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#4

27 Feb 2024, 06:31

One may argue that errors are correlated for a given student over years and over grades, no?

If you believe that there are meaningful differences in students that follow them across time, then I agree. Are there multiple observations per grade? If not, you could treat grade and year as interchangeable. But if you do have multiple observations of a student within a grade, then you need to model that appropriately.

Do you observe the same student(s) in different schools? If that is the case, then in my fields of education and psychology, people employ cross-classified random effects models for data such as this. Assuming one measurement per grade, the resulting command would look something like the following:

Code:

mixed score c.grade || _all:R.school || student: grade, cov(unstructured)

This is sometimes called a growth model in the literature. Each student has their own intercept and rate of change in the outcome over time. You could add higher-order terms for grade to allow for more flexible representations of change. The model also includes random intercepts for schools with the _all:R.school telling mixed to treat schools and students as crossed. You could additionally add robust standard errors to this model if you wanted to.
Comment
Marry Lee

Join Date: Nov 2020

Posts: 189
#5

28 Feb 2024, 05:37

Maxence Morlet Thanks for your comment but that do not answer my question. I just want to understand the reasoning behind clustering at the schoolXgradeXyear level.
Comment
Marry Lee

Join Date: Nov 2020

Posts: 189
#6

28 Feb 2024, 05:40

Erik Ruzek Thanks for you answer.

Students are believed to stay within the same school over the grades.

Yes there are multiple students per grade, observed over the period of 1990 and 2020 (they should stay in school for 3 years, we have 3 grades).
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#7

28 Feb 2024, 07:42

Marry Lee Thanks for the clarification. My question was whether individual students contribute multiple score outcomes per grade or is it that each student only has a single score observation per grade level?
Comment
Marry Lee

Join Date: Nov 2020

Posts: 189
#8

28 Feb 2024, 09:02

Erik Ruzek, only one observation per grade is observed (few students who repeated the grade may exist though, but I think this is ok since the same outcome is observed in 2 different years).
Comment

Announcement

cluster at what level

Comment

Comment

Comment

Comment

Comment

Comment

Comment