  • cluster at what level

    Dear all,

    I read some posts here about clustering, and my understanding is that the choice of clustering level should reflect how the errors are correlated.


    I read a paper studying the education outcomes of students in different grades over time. Their data are constructed so that students are followed over the years as they move from one grade to the next, and the students are enrolled in different schools.


    Their model is as follows:

    Code:
    y_i,g,s,t = beta1 X_g,s,t + beta2 Z_i,t + lambda_s,t + mu_g + epsilon_i,g,s,t
    where i: student id, g: grade, s: school, and t: year.
    lambda_s,t: schoolXyear fixed effects
    mu_g: grade fixed effects

    They cluster standard errors at the schoolXgradeXyear level. However, I am not convinced by this choice of clustering level. One could argue that the errors are correlated for a given student across years and grades, no? So the only appropriate clustering would be at the school level, right? Am I missing something?
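
    To make the comparison concrete, here is a minimal Stata sketch on simulated data (not the paper's data; the variable names school, student, grade, year, x, z, and y are all hypothetical). It estimates the model above once with standard errors clustered at the schoolXgradeXyear level and once at the school level. Clustering at the cell level treats a student's errors in different years as independent, because those observations fall in different cells, whereas clustering at the school level allows any correlation within a school, including within a student over time.

```stata
* Minimal sketch on simulated data; all variable names are hypothetical
clear
set seed 12345
set obs 30                                    // 30 schools
generate school = _n
generate u_school = rnormal()                 // shock shared within a school
expand 30                                     // 30 students per school
bysort school: generate student = (school - 1) * 30 + _n
generate u_student = rnormal()                // persistent student shock
expand 3                                      // grades 1 to 3, one per year
bysort student: generate grade = _n
generate year = 1990 + mod(student, 5) + grade   // staggered entry cohorts

egen sy  = group(school year)                 // school-by-year cells
egen gsy = group(school grade year)           // school-by-grade-by-year cells

* X varies only at the school-grade-year level, Z at the student-year level
bysort gsy: generate x = rnormal() if _n == 1
by gsy: replace x = x[1]
generate z = rnormal()
generate y = 0.5*x + 0.3*z + u_school + u_student + rnormal()

* Same model, two clustering levels
quietly regress y x z i.grade i.sy, vce(cluster gsy)
estimates store cl_cell
quietly regress y x z i.grade i.sy, vce(cluster school)
estimates store cl_school

* Coefficients with standard errors underneath
estimates table cl_cell cl_school, keep(x z) b(%9.4f) se(%9.4f)
```

    The point estimates are identical in the two columns; only the standard errors differ.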

    Please let me know what you think.
    All the best




  • #2
    How many schools are there?

    By school, is the number of observations similar across schools?

    Have you tried using the user-written boottest command, with a wild cluster restricted bootstrap?
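
    In case it helps, here is a minimal sketch of what that could look like. It uses simulated data with hypothetical variable names (school, x, y) and assumes boottest has been installed from SSC. The wild cluster bootstrap is mainly of interest when there are few clusters or when cluster sizes are very unequal, which is why the two questions above matter. By default boottest imposes the null hypothesis, which is the "restricted" version; the Webb weights are one common choice with few clusters, not the only one.

```stata
* Sketch only: simulated data, hypothetical variable names
* ssc install boottest                        // one-time install from SSC
clear
set seed 2024
set obs 12                                    // deliberately few schools
generate school = _n
generate u_school = rnormal()                 // school-level shock
expand 50 + 10 * school                       // unequal school sizes
generate x = rnormal() + 0.5 * u_school
generate y = 0.2 * x + u_school + rnormal()

* Number of schools and observations per school
tabulate school

regress y x, vce(cluster school)

* Wild cluster restricted bootstrap test of H0: coefficient on x equals 0
boottest x, reps(9999) weighttype(webb) nograph
```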



    • #3
      You could also try the user-written summclust command.



      • #4
        One may argue that errors are correlated for a given student over years and over grades, no?
        If you believe that there are meaningful differences in students that follow them across time, then I agree. Are there multiple observations per grade? If not, you could treat grade and year as interchangeable. But if you do have multiple observations of a student within a grade, then you need to model that appropriately.

        Do you observe the same student(s) in different schools? If that is the case, then in my fields of education and psychology, people employ cross-classified random effects models for data such as this. Assuming one measurement per grade, the resulting command would look something like the following:
        Code:
        mixed score c.grade || _all:R.school || student: grade, cov(unstructured)
        This is sometimes called a growth model in the literature. Each student has their own intercept and rate of change in the outcome over time. You could add higher-order terms for grade to allow for more flexible representations of change. The model also includes random intercepts for schools with the _all:R.school telling mixed to treat schools and students as crossed. You could additionally add robust standard errors to this model if you wanted to.
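
        As a hedged illustration of those two extensions, the sketch below adds a quadratic term for grade and robust standard errors. It runs on simulated data with hypothetical variable names, and it assumes that each student stays in a single school, in which case students are nested in schools and the crossed specification can be written as a nested one. With vce(robust), mixed clusters the standard errors at the highest level of the model, which here is school.

```stata
* Sketch only: simulated data, hypothetical variable names
clear
set seed 4321
set obs 30                                    // 30 schools
generate school = _n
generate u_school = rnormal(0, 0.5)           // school random intercept
expand 20                                     // 20 students per school
bysort school: generate student = (school - 1) * 20 + _n
generate u0 = rnormal(0, 1)                   // student random intercept
generate u1 = rnormal(0, 0.3)                 // student random slope on grade
expand 3                                      // one score per grade
bysort student: generate grade = _n
generate score = 1 + 0.5*grade + u_school + u0 + u1*grade + rnormal(0, 0.5)

* Quadratic growth in the fixed part; students nested in schools;
* vce(robust) clusters at the highest level, here school
mixed score c.grade##c.grade || school: || student: grade, ///
    covariance(unstructured) vce(robust)
```

        If students do change schools, the nested form no longer applies and the crossed form shown above is the one to use.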



        • #5
          Maxence Morlet, thanks for your comment, but that does not answer my question. I just want to understand the reasoning behind clustering at the schoolXgradeXyear level.



          • #6
            Erik Ruzek, thanks for your answer.

            Students are assumed to stay in the same school across grades.

            Yes, there are multiple students per grade, observed over the period from 1990 to 2020 (students should stay in school for 3 years; we have 3 grades).



            • #7
              Marry Lee, thanks for the clarification. My question was whether individual students contribute multiple score outcomes per grade, or whether each student has only a single score observation per grade level.



              • #8
                Erik Ruzek, only one observation per grade is observed (a few students who repeated a grade may exist, but I think this is fine, since the same outcome is then observed in two different years).
