  • Cluster level of the sampling process is different from the cluster level of the treatment assignment.

    Hello everybody,

    I am currently working with a cross-sectional dataset of students in American high schools, collected with a two-stage sampling process: in the first stage, a fixed number of schools is randomly sampled from a population of schools (the first cluster unit), and in the second stage, a fixed number of individuals is randomly sampled within each school. I want to estimate the peer effect on an individual outcome with an Ordinary Least Squares regression. The outcome is the number of days per week each individual does physical exercise, and the peer effect is the leave-one-out mean of the number of days per week that his or her grade mates exercise. That is, the peer effect variable for individual i, who belongs to school j and, for example, grade 9, is the average number of days per week of physical exercise among his or her grade mates in that school, excluding individual i. This is where my doubt arises. Since my treatment (the peer effect) is assigned at the school-grade level, a different cluster unit from the sampling unit (school), I am afraid that if I cluster my errors at the school level, as the standard literature on sampling would suggest, the standard errors could be too conservative for a treatment that varies at the school-grade level.
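
    A minimal sketch of the leave-one-out mean described above, written in Python rather than Stata and using made-up data. The function name `leave_one_out_means` and the sample records are hypothetical, and pupils with no grade mates in the sample are given a missing value, which is an assumption rather than something stated in the post.

```python
from collections import defaultdict


def leave_one_out_means(records):
    """Return, for each (school, grade, days) record, the mean of `days`
    among the other pupils in the same school-grade group.

    Pupils who are alone in their school-grade group get None.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for school, grade, days in records:
        totals[(school, grade)] += days
        counts[(school, grade)] += 1

    peer_means = []
    for school, grade, days in records:
        n = counts[(school, grade)]
        if n < 2:
            peer_means.append(None)  # no peers observed in this group
        else:
            # Subtract the pupil's own value, then average over the n - 1 peers.
            peer_means.append((totals[(school, grade)] - days) / (n - 1))
    return peer_means


# Hypothetical data: (school, grade, days of exercise per week)
sample = [
    ("A", 9, 2), ("A", 9, 4), ("A", 9, 6),
    ("A", 10, 1), ("A", 10, 3),
    ("B", 9, 5),
]
peer = leave_one_out_means(sample)
```

    Because each pupil's own value is removed, the regressor differs across pupils in the same school-grade group, although it is still driven by the composition of that group.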

    Having read the recent paper "When should you adjust standard errors for clustering?", published in The Quarterly Journal of Economics in February 2023, I came round to the idea of clustering at my treatment assignment level (school-grade), but I am still a bit unsure about the right thing to do. I have 120 schools and roughly 360 school-grade groups, and obviously the average number of observations per school-grade cluster is smaller than the average number of observations per school, which may affect the precision of the cluster-robust variance estimate, if I am not mistaken. I included school and grade fixed effects in my regression to mitigate the self-selection problem as much as possible.

    Does anyone have any advice? Am I misunderstanding something about it?

    Any feedback will be highly appreciated. Thanks a lot in advance.
    Best regards,
    Daniel

  • #2
    After reading and thinking thoroughly about it, I think I have reached a conclusion on this point, and I would like to share it in case anyone is interested. If I cluster at the school level, following the sample design, I would also be assuming that every pupil within a school has the same probability of being exposed to a given average trait of their peers. This seems unlikely, since each grade within a school should have a different level of exposure to those traits: the identification strategy exploits the quasi-random cohort variation within schools, conditional on having controlled for the self-selection problem.

    So in this case, the treatment assignment should determine the level of clustering. The assignment probability varies across grades within a school, so the level of clustering should be school-grade rather than school, if I am not mistaken. Thank you very much, and my apologies if this post was confusing or if it should not have been posted here on Statalist.
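
    To make the two clustering choices concrete, here is a rough Python sketch (not Stata) of a cluster-robust standard error for the slope of a one-regressor OLS with an intercept, evaluated once with school clusters and once with school-grade clusters. The function name `cluster_se` and the data are hypothetical, and the sketch omits fixed effects and the finite-sample correction that statistical packages usually apply, so it only illustrates the mechanics.

```python
from collections import defaultdict
from math import sqrt


def cluster_se(x, y, cluster_ids):
    """OLS slope of y on x (with intercept) and its cluster-robust SE.

    Uses the plain sandwich formula with no small-sample adjustment:
    sum over clusters of (sum of x_dev * residual)^2, divided by Sxx^2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]
    yd = [yi - y_bar for yi in y]
    sxx = sum(v * v for v in xd)
    slope = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - slope * a for a, b in zip(xd, yd)]

    # Sum the score contributions x_dev * residual within each cluster.
    scores = defaultdict(float)
    for a, e, g in zip(xd, resid, cluster_ids):
        scores[g] += a * e
    meat = sum(s * s for s in scores.values())
    return slope, sqrt(meat) / sxx


# Hypothetical data: peer mean (x), own exercise days (y), school, grade
x = [5.0, 4.0, 3.0, 3.0, 1.0, 2.0, 6.0, 4.0]
y = [4.0, 5.0, 2.0, 3.0, 1.0, 2.0, 6.0, 3.0]
school = ["A", "A", "A", "A", "B", "B", "B", "B"]
grade = [9, 9, 10, 10, 9, 9, 10, 10]
school_grade = list(zip(school, grade))

slope_school, se_school = cluster_se(x, y, school)
slope_sg, se_sg = cluster_se(x, y, school_grade)
```

    The point estimate is identical under both choices; only the grouping of the residual products changes. Which standard error is larger depends on the data, so this toy example says nothing about which level is appropriate for the real sample.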

    Thanks for your consideration.
    Best regards,
    Daniel

    • #3
      Daniel:
      posting your question on the General forum of Statalist is totally legal.
      I would just add two comments, hoping they are not off-topic:
      a) setting aside the clustering for a while, you do not mention what method you are going to follow. Is it a -mixed- design, with pupils nested within classes that, in turn, are nested within schools?;
      b) given that I do not know enough about your research, I'm under the impression that you might be interested in something like a "natural experiment" (popular-economicsciencesprize2021-3.pdf (nobelprize.org)) with random sampling as a first stage. If that were the case, I would point you to the following paper, which in all likelihood is already included in one of your folders: Who Benefits from KIPP? - Angrist - 2012 - Journal of Policy Analysis and Management - Wiley Online Library.
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        Dear Mr. Lazzaro,

        Thank you very much for your response. I really appreciate it and I think that your comments are very interesting and not at all off-topic. I provide below my thoughts on them.

        a) I would say that the design is based on sampling schools randomly from a population of schools, and then, sampling pupils randomly within each school. Each pupil is assigned a grade, but they are nested within school initially. By combining the school code and the grade provided, I construct the cohorts within schools.
        b) Yes, it is closely related, but I think that it would not be a natural experiment in the sense that some kind of natural random shock affects some schools but not others, or some pupils within a school but not others. The empirical strategy is based on removing the unobserved heterogeneity that might cause selection of pupils into each school; for example, school x may be more likely to be attended by pupils whose parents are college educated. A vast array of fixed effects is used to remove as much of this source of selection bias as possible. Then, I exploit peers' idiosyncratic variation across cohorts within schools as a source of identification. The idea behind it is the following: once the selection bias is addressed, this idiosyncratic variation should be quasi-random. But yes, it is very confusing to me where to draw the line between natural experiments and what I am referring to.
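
        As a rough illustration of the cohort variation within schools mentioned in b), the Python sketch below (not Stata) subtracts each school's mean from the peer variable, which is what a school fixed effect does to that regressor. The function name `demean_within` and the values are hypothetical, and the grade fixed effects and any further controls are left out to keep the sketch short.

```python
from collections import defaultdict


def demean_within(values, groups):
    """Subtract the group mean from each value (the 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical peer means for six pupils in two schools
peer_mean = [5.0, 4.0, 3.0, 3.0, 1.0, 2.0]
school = ["A", "A", "A", "A", "B", "B"]

# What remains is the deviation of each pupil's peer mean from the school average.
within = demean_within(peer_mean, school)
```

        The deviations sum to zero within each school, so only differences across cohorts and pupils in the same school are left to identify the peer effect.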

        Again, thank you very much for your comments. I really would be more than happy to discuss them further with you.

        Best regards,
        Daniel

        • #5
          Daniel:
          the most substantive, theoretical condition for discussing further is calling me Carlo!
          That said, the usual advice is to check whether previous research on the topic you're interested in is already present in the literature.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            hahaha fair enough, Carlo!

            Thanks for your response. Yes, there is a wide body of literature on this kind of analysis, but there are still some issues that are not entirely clear to me, and I wanted to ask for different advice here.

            Again, thank you very much for your valuable comments. Have a lovely day!
