Hello Stata Forum,
James MacKinnon, Morten Nielsen, and I are excited to announce our Stata package summclust. It is designed to assess the reliability of conventional cluster-robust inference, and it also calculates improved standard errors. For a much more detailed explanation, see the working paper https://arxiv.org/abs/2205.03288
summclust calculates many statistics that help assess whether standard cluster-robust inference is reliable. It also calculates cluster-robust jackknife standard errors, which often yield more reliable inferences.
Conventional cluster-robust inference works best when there are many clusters and those clusters are homogeneous, meaning that they contain similar numbers of observations and similar amounts of information. summclust reports summary statistics on both.
To assess the information contained in each cluster, summclust calculates cluster-level leverage, partial leverage, and influence. Summary measures of these statistics are reported by default; additional summary measures, or the full set of cluster-level values, are available as options.
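For readers who want to see what these quantities are, here is a minimal NumPy sketch (not summclust's Stata code; the function names are hypothetical). It assumes OLS with an N x k regressor matrix X and a vector of cluster labels. Cluster leverage is taken as h_g = trace(X_g'X_g (X'X)^-1), so the h_g sum to k; partial leverage for coefficient j is each cluster's share of the sum of squares of x_tilde, the residuals from regressing column j on the other columns, so it sums to 1. The simulated data are purely illustrative.

```python
import numpy as np


def cluster_leverage(X, clusters):
    """Leverage of each cluster: h_g = trace(X_g' X_g (X'X)^{-1})."""
    XtX_inv = np.linalg.inv(X.T @ X)
    labels = np.unique(clusters)
    h = np.array([np.trace(X[clusters == g].T @ X[clusters == g] @ XtX_inv)
                  for g in labels])
    return labels, h


def partial_leverage(X, clusters, j):
    """Partial leverage of each cluster for the coefficient on column j.

    x_tilde holds the residuals from regressing column j on the other columns;
    cluster g's partial leverage is its share of x_tilde' x_tilde.
    """
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    x_tilde = X[:, j] - others @ coef
    labels = np.unique(clusters)
    shares = np.array([x_tilde[clusters == g] @ x_tilde[clusters == g]
                       for g in labels])
    return labels, shares / (x_tilde @ x_tilde)


# Hypothetical example: 10 clusters of unequal size, intercept plus one regressor.
rng = np.random.default_rng(42)
sizes = np.arange(5, 55, 5)                  # cluster sizes 5, 10, ..., 50
clusters = np.repeat(np.arange(sizes.size), sizes)
X = np.column_stack([np.ones(clusters.size), rng.normal(size=clusters.size)])
labels, h = cluster_leverage(X, clusters)
_, pl = partial_leverage(X, clusters, j=1)
print("leverage:", np.round(h, 3))           # these values sum to k = 2
print("partial leverage:", np.round(pl, 3))  # these values sum to 1
```

With clusters of such different sizes, the larger clusters tend to have much higher leverage, which is the kind of heterogeneity the summary statistics are meant to flag.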
Ideally, all clusters will have similar measures of leverage, partial leverage, and influence. When a few clusters are found to have large values for these statistics, conventional cluster-robust inference can be unreliable.
The "effective number of clusters," G*, proposed by Carter, Schnepel, and Steigerwald, is also available. Conventional cluster-robust inference is also often unreliable when G* falls well below the actual number of clusters.
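As a rough illustration, the sketch below follows our reading of the Carter, Schnepel, and Steigerwald formula for a single coefficient j: gamma_g = a'(X'X)^-1 X_g' Omega_g X_g (X'X)^-1 a with a the j-th unit vector, Gamma = mean((gamma_g - mean(gamma))^2) / mean(gamma)^2, and G* = G / (1 + Gamma). The working within-cluster covariance Omega_g = (1 - rho) I + rho * (matrix of ones) and the rho parameter are assumptions made for illustration; this is not necessarily how summclust computes G*, and the function name is hypothetical.

```python
import numpy as np


def effective_clusters(X, clusters, j, rho=0.0):
    """Effective number of clusters G* for the coefficient on column j.

    ASSUMPTION: working covariance Omega_g = (1 - rho) I + rho * ones, so
    rho = 0 treats errors as independent and rho = 1 as perfectly correlated.
    """
    a = np.zeros(X.shape[1])
    a[j] = 1.0
    w = X @ np.linalg.solve(X.T @ X, a)   # w = X (X'X)^{-1} a
    labels = np.unique(clusters)
    # gamma_g = w_g' Omega_g w_g = (1 - rho) ||w_g||^2 + rho (sum of w_g)^2
    gamma = np.array([(1 - rho) * np.sum(w[clusters == g] ** 2)
                      + rho * np.sum(w[clusters == g]) ** 2
                      for g in labels])
    G = labels.size
    Gamma = np.mean((gamma - gamma.mean()) ** 2) / gamma.mean() ** 2
    return G / (1 + Gamma)


# Hypothetical example: nine clusters of 20 observations and one of 200.
rng = np.random.default_rng(1)
sizes = np.r_[np.full(9, 20), 200]
clusters = np.repeat(np.arange(sizes.size), sizes)
X = np.column_stack([np.ones(clusters.size), rng.normal(size=clusters.size)])
for rho in (0.0, 1.0):
    print("rho =", rho, "G* =", effective_clusters(X, clusters, j=1, rho=rho))
```

Because Gamma cannot be negative, G* never exceeds G under this definition; it equals G only when every cluster contributes the same gamma_g.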
summclust offers two cluster-robust jackknife standard errors, which we call CV3 and CV3J. Standard errors of this type were proposed nearly 20 years ago but are seldom used, because the original implementations were computationally slow and often infeasible for large samples.
Our working paper https://arxiv.org/abs/2205.03288 explains how we can calculate these standard errors quickly as a byproduct of calculating the cluster influence measures. Another working paper https://ideas.repec.org/p/qed/wpaper/1485.html demonstrates their improved properties.
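To make the idea concrete, here is a minimal NumPy sketch under stated assumptions: OLS with regressors X, outcome y, and cluster labels, and a hypothetical function name cv3_cv3j. It is not summclust's implementation. Each delete-one-cluster estimate beta^(g) is obtained from cluster-level cross products, (X'X - X_g'X_g)^-1 (X'y - X_g'y_g), so no regression is re-run. The differences beta^(g) - beta are the cluster influence measures. CV3 = ((G-1)/G) * sum_g (beta^(g) - beta)(beta^(g) - beta)', and CV3J uses the same formula but centers at the mean of the beta^(g) instead of at beta. The sketch requires X'X - X_g'X_g to be invertible for every g, which fails if some coefficient is identified by a single cluster.

```python
import numpy as np


def cv3_cv3j(X, y, clusters):
    """CV3 and CV3J jackknife covariance matrices from delete-one-cluster estimates."""
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.linalg.solve(XtX, Xty)
    labels = np.unique(clusters)
    G = labels.size
    betas = np.empty((G, X.shape[1]))
    for i, g in enumerate(labels):
        Xg, yg = X[clusters == g], y[clusters == g]
        # beta^(g) from cluster-level cross products; no full re-estimation
        betas[i] = np.linalg.solve(XtX - Xg.T @ Xg, Xty - Xg.T @ yg)
    scale = (G - 1) / G
    dev = betas - beta                    # influence of each cluster on beta-hat
    dev_bar = betas - betas.mean(axis=0)  # deviations from the jackknife mean
    cv3 = scale * dev.T @ dev
    cv3j = scale * dev_bar.T @ dev_bar
    return beta, betas, cv3, cv3j


# Hypothetical simulated data with cluster-level shocks in both x and u.
rng = np.random.default_rng(7)
sizes = rng.integers(10, 60, size=20)
clusters = np.repeat(np.arange(sizes.size), sizes)
N = clusters.size
x = rng.normal(size=N) + rng.normal(size=sizes.size)[clusters]
u = rng.normal(size=N) + rng.normal(size=sizes.size)[clusters]
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.5 * x + u
beta, betas, cv3, cv3j = cv3_cv3j(X, y, clusters)
print("beta:", beta)
print("CV3 s.e.:", np.sqrt(np.diag(cv3)))
print("CV3J s.e.:", np.sqrt(np.diag(cv3j)))
```

Since CV3 equals CV3J plus a positive semidefinite term, G(G-1)/G times the outer product of (mean of beta^(g) - beta), CV3 standard errors are never smaller than CV3J ones.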
When any of the measures of leverage, influence, or G* suggest that conventional cluster-robust inference may be unreliable, we recommend using CV3 or CV3J standard errors, the wild cluster bootstrap, or both.
For a detailed discussion of how to make reliable inferences in linear regression models with clustering, see "Cluster-Robust Inference: A Guide to Empirical Practice" https://arxiv.org/abs/2205.03285, which was recently accepted for publication in the Journal of Econometrics.
summclust is available from SSC and from https://github.com/mattdwebb/summclust.
We hope you will give it a try, and we welcome any feedback you may have.