  • summclust: new command for assessing and improving cluster-robust inference

    Hello Stata Forum,

    James MacKinnon, Morten Nielsen, and I are excited to announce our Stata package summclust. It is designed to assess the reliability of conventional cluster-robust inference, and it also calculates improved standard errors. For a much more detailed explanation, see the working paper https://arxiv.org/abs/2205.03288

    summclust calculates many statistics to assess whether standard cluster-robust inference is reliable. It also calculates cluster-robust jackknife standard errors, which offer more reliable inference.

    Conventional cluster-robust inference works best when there are many clusters and these clusters are homogeneous. This means that they contain both similar numbers of observations and similar amounts of information. summclust reports summary statistics about both.
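    As an illustration only, and not summclust's code, the observation-count side of cluster homogeneity can be summarized in a few lines of Python. The function name `cluster_size_summary` and the input format (one cluster identifier per observation) are assumptions made for this sketch. summclust's own table may report different or additional statistics.

```python
from collections import Counter
from statistics import mean, median


def cluster_size_summary(clusters):
    """Summarize how many observations each cluster contains.

    `clusters` holds one cluster identifier per observation
    (a hypothetical input format chosen for this sketch).
    """
    sizes = list(Counter(clusters).values())  # N_g for each cluster g
    return {
        "G": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
    }


# Three clusters containing 3, 2, and 4 observations.
summary = cluster_size_summary([1, 1, 1, 2, 2, 3, 3, 3, 3])
print(summary)
```

    A large gap between the smallest and largest clusters is one warning sign. Equal cluster sizes alone do not guarantee that clusters carry similar amounts of information, which is why the leverage-based measures matter as well.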

    To assess the information contained within each cluster, summclust calculates cluster-level influence, leverage, and partial leverage. Summary measures of these statistics are reported by default, and additional measures or the full set of cluster-level values are available as options.
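    For intuition, here is a minimal Python sketch of cluster leverage and partial leverage in the simplest case: a regression of y on an intercept and a single regressor x. It assumes the working paper's definitions are as follows. Cluster leverage is the trace of H_gg = X_g(X'X)^{-1}X_g'. Partial leverage for a coefficient uses that regressor after the other regressors have been partialled out. The function `cluster_leverage` is hypothetical and limited to this two-parameter case.

```python
from statistics import mean


def cluster_leverage(x, clusters):
    """Cluster leverage L_g and partial leverage for the slope in y = a + b*x.

    With an intercept and one regressor, observation i has leverage
    h_i = 1/n + (x_i - xbar)^2 / Sxx. Summing h_i over cluster g gives
    trace(H_gg), so the L_g add up to k = 2. Partial leverage for the slope
    uses x with the intercept partialled out, so those values add up to 1.
    """
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    lev, plev = {}, {}
    for xi, g in zip(x, clusters):
        share = (xi - xbar) ** 2 / sxx
        lev[g] = lev.get(g, 0.0) + 1 / n + share
        plev[g] = plev.get(g, 0.0) + share
    return lev, plev


# Cluster 3 contains one extreme value of x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
clusters = [1, 1, 1, 2, 2, 2, 3, 3, 3]
lev, plev = cluster_leverage(x, clusters)
for g in lev:
    print(g, round(lev[g], 3), round(plev[g], 3))
```

    All three clusters are the same size, yet cluster 3 has much higher partial leverage than the others because of its extreme x value. This is the kind of imbalance that counting observations alone would miss.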

    Ideally, all clusters will have similar measures of leverage, partial leverage, and influence. When a few clusters are found to have large values for these statistics, conventional cluster-robust inference can be unreliable.

    The "effective number of clusters" G*, proposed by Carter, Schnepel, and Steigerwald, is also available. Conventional cluster-robust inference is often unreliable when G* differs substantially from the actual number of clusters.
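    A minimal sketch of the G* calculation follows, again for the slope in a regression of y on an intercept and one regressor. It assumes the Carter, Schnepel, and Steigerwald definition G* = G / (1 + Gamma). Here Gamma is the average over clusters of (gamma_g - gbar)^2 / gbar^2, and gamma_g is cluster g's contribution to the variance of the slope estimate under a working covariance matrix Omega_g. The sketch takes Omega_g to be equicorrelated with intra-cluster correlation `rho`. summclust's own assumptions and defaults may differ, and both function names are hypothetical.

```python
from statistics import mean


def cluster_gammas(x, clusters, rho=0.0):
    """gamma_g for the slope in y = a + b*x under equicorrelated errors.

    Write b_hat = sum_i w_i * y_i with w_i = (x_i - xbar) / Sxx. Assuming
    Omega_g = (1 - rho) * I + rho * J (J is a matrix of ones), the quadratic
    form w_g' Omega_g w_g equals (1 - rho) * sum(w_i^2) + rho * (sum(w_i))^2
    over the observations in cluster g.
    """
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sq, tot = {}, {}
    for xi, g in zip(x, clusters):
        w = (xi - xbar) / sxx
        sq[g] = sq.get(g, 0.0) + w * w
        tot[g] = tot.get(g, 0.0) + w
    return [(1 - rho) * sq[g] + rho * tot[g] ** 2 for g in sq]


def effective_clusters(gammas):
    """G* = G / (1 + Gamma), Gamma = mean of (gamma_g - gbar)^2 / gbar^2."""
    G = len(gammas)
    gbar = mean(gammas)
    big_gamma = sum((gg - gbar) ** 2 for gg in gammas) / (G * gbar ** 2)
    return G / (1 + big_gamma)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
clusters = [1, 1, 1, 2, 2, 2, 3, 3, 3]
gstar = effective_clusters(cluster_gammas(x, clusters))
print(gstar)  # smaller than G = 3 because the gamma_g differ
```

    When all gamma_g are equal, Gamma = 0 and G* = G. The more unequal they are, the further G* falls below G.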

    summclust also offers two cluster-robust jackknife standard errors, which we call CV3 and CV3J. These standard errors were proposed nearly 20 years ago but are seldom used, because the original implementations were computationally slow and often infeasible for large samples.

    Our working paper https://arxiv.org/abs/2205.03288 explains how we can calculate these standard errors quickly as a byproduct of calculating the cluster influence measures. Another working paper https://ideas.repec.org/p/qed/wpaper/1485.html demonstrates their improved properties.
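    One way such a speed-up can work is sketched below in Python. This is an assumption made for illustration; see the working paper for the authors' actual method. Each delete-one-cluster estimate can be obtained from cluster-level cross products, beta^(g) = (X'X - X_g'X_g)^{-1}(X'y - X_g'y_g), without re-running the regression on the raw data. The sketch also assumes the usual jackknife definitions. CV3 = ((G-1)/G) * sum_g (beta^(g) - beta)(beta^(g) - beta)', and CV3J is the same expression centered on the mean of the beta^(g) instead of on beta. The sketch handles only an intercept plus one regressor, so that 2x2 systems can be solved directly, and it returns only standard errors (the diagonal). Function names are hypothetical, and this is not summclust's implementation.

```python
from collections import defaultdict
from math import sqrt


def solve2(a, b):
    """Solve the 2x2 linear system a @ beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def cluster_cross_products(y, x, clusters):
    """Per-cluster X_g'X_g and X_g'y_g, where each row of X_g is (1, x_i)."""
    xx = defaultdict(lambda: [[0.0, 0.0], [0.0, 0.0]])
    xy = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, clusters):
        row = (1.0, xi)
        for r in range(2):
            xy[g][r] += row[r] * yi
            for c in range(2):
                xx[g][r][c] += row[r] * row[c]
    return xx, xy


def jackknife_se(y, x, clusters):
    """Return (beta_hat, CV3 standard errors, CV3J standard errors)."""
    xx_g, xy_g = cluster_cross_products(y, x, clusters)
    # Full-sample X'X and X'y are sums of the cluster-level pieces.
    XX = [[sum(m[r][c] for m in xx_g.values()) for c in range(2)] for r in range(2)]
    Xy = [sum(v[r] for v in xy_g.values()) for r in range(2)]
    beta = solve2(XX, Xy)
    # Delete-one-cluster estimates: subtract cluster g's cross products.
    betas = []
    for g in xx_g:
        A = [[XX[r][c] - xx_g[g][r][c] for c in range(2)] for r in range(2)]
        b = [Xy[r] - xy_g[g][r] for r in range(2)]
        betas.append(solve2(A, b))
    G = len(betas)
    bbar = [sum(bg[j] for bg in betas) / G for j in range(2)]
    scale = (G - 1) / G
    cv3 = [sqrt(scale * sum((bg[j] - beta[j]) ** 2 for bg in betas)) for j in range(2)]
    cv3j = [sqrt(scale * sum((bg[j] - bbar[j]) ** 2 for bg in betas)) for j in range(2)]
    return beta, cv3, cv3j


y = [1.0, 2.0, 2.5, 4.1, 5.0, 5.8, 7.2, 8.1, 9.5]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
clusters = [1, 1, 1, 2, 2, 3, 3, 3, 3]
beta, cv3, cv3j = jackknife_se(y, x, clusters)
print("slope:", beta[1], "CV3 s.e.:", cv3[1], "CV3J s.e.:", cv3j[1])
```

    The sum of squared deviations is smallest around the mean, so each CV3J standard error can never exceed the corresponding CV3 one. The beta^(g) computed here are the same delete-one-cluster estimates that influence measures compare with the full-sample estimate. This is consistent with the jackknife coming out as a byproduct of the influence calculations.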

    When any of the measures of leverage, influence, or G* suggest that conventional cluster-robust inference may be unreliable, we recommend using either CV3(J) standard errors, the wild cluster bootstrap, or both.

    For a detailed discussion of how to make reliable inferences in linear regression models with clustering, see "Cluster-Robust Inference: A Guide to Empirical Practice" https://arxiv.org/abs/2205.03285, which was recently accepted for publication in the Journal of Econometrics.

    summclust is available from SSC and from https://github.com/mattdwebb/summclust.

    We hope you give it a try and welcome any feedback that you may have.

  • #2
    Dear Professor Webb,

    Thanks for this great package!

    I am trying to apply it to panel data with a classic two-way fixed effects model: individual and month FE.

    Could you please explain in detail the difference between fevar() and xvar()? I understand absorb() and why it is distinguished from fevar() and xvar(), but I am unsure about the difference between the latter two options.

    I do have another question, supposing I want to cluster by individual:

    Setting aside the control variables, the regressor, and the regressand, would I have to include

    Code:
    absorb(individual time)
    to account for the two-way fixed effects?

    Many thanks! (A few follow-up questions may arise.)
    Maxence


    • #3
      Very interesting package.

      I'm just trying it now, but I keep getting "file summclust_temp_ln.gph could not be opened" r(603).
      I'm using Stata/MP 17
