
  • Clustering standard errors on two levels

    Hi,

    I am working with quarterly panel data on firm-specific variables and running a regression with firm and quarter fixed effects. Based on recommendations in the existing literature, I have clustered standard errors at both the firm and quarter level.

    The general syntax is:

    Code:
    reghdfe DV IV Controls, absorb(Firm_ID Quarter) vce(cluster Firm_ID Quarter)

    The models include quarterly firm-level variables, such as sales, assets, and firm size, as controls. However, I’m noticing that many of my control variables lose significance when clustering at both the firm and quarter levels compared to using only -vce(robust)-.
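
    To illustrate the comparison, here is a minimal sketch of the two runs side by side (using the same placeholder names DV, IV, Controls, Firm_ID, and Quarter as above; the stored-estimates names are just for illustration):

    Code:
    * heteroskedasticity-robust standard errors only
    reghdfe DV IV Controls, absorb(Firm_ID Quarter) vce(robust)
    estimates store m_robust
    * two-way clustered standard errors (firm and quarter)
    reghdfe DV IV Controls, absorb(Firm_ID Quarter) vce(cluster Firm_ID Quarter)
    estimates store m_twoway
    * compare coefficients and standard errors side by side
    estimates table m_robust m_twoway, b(%9.4f) se(%9.4f)

    The point estimates should be identical across the two runs; only the standard errors (and hence the apparent significance of the controls) change.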

    My data consist of 203 firm clusters and 771 quarter clusters. I've based my model specifications on the existing literature, but I do not observe similar significance for the control variables when clustering at both levels.

    Can you think of potential reasons for this loss of significance, or recommend additional steps to verify that the clustering is correctly specified?

  • #2
    I don't see anything wrong with what you've done here.

    However, I’m noticing that many of my control variables lose significance when clustering at both the firm and quarter levels compared to using only -vce(robust)-.
    Why do you care about this? There is nothing less important in statistics than the significance of a "control" variable. If you really care about the significance of a variable, it is, by definition, not a "control" variable. "Control" variables are introduced into a model when we do not care directly about their effects on the outcome, but we need to adjust the analysis for their confounding (aka omitted variable bias) effects or to reduce the extraneous variance they add to the outcome.

    • #3
      Thanks for your insights. I completely agree with your assessment of the relevance of controls in general. It just seems odd to me that the control variables, which are considered relevant in existing studies, don’t add any explanatory power to my model. I mainly wanted to make sure that I’m ruling out any potential model misspecification.

      • #4
        ...don’t add any explanatory power to my model.
        So you have fallen hook, line, and sinker for the bad teaching you undoubtedly got when you were first learning statistics. This is wrong in two important ways.
        1. It is fallacious to say that something that is not statistically significant doesn't add any explanatory power. Statistical significance is defined by imposing an arbitrary threshold (usually .05) on an inherently continuous statistic and then pretending that the imposition of an arbitrary threshold corresponds to a qualitative difference. It doesn't. The p-value itself is something of a trashy statistic because it smashes the effect size and the (effective) sample size into a single number whose meaning is then unclear. But even if you want to believe that p-values are worthwhile, there is no threshold significance level that can actually reliably distinguish "no effect" from "an effect." The best you can say about a non-significant result is that the analysis is inconclusive about the effect: it might be zero, but it might be (and, in fact, more likely is) just small enough that you can't distinguish it from the noise in the data.*
        2. Even disregarding everything I just said in 1 and pretending that statistical significance really does mean "effect" vs "no effect," you can't draw any conclusion about the set of control variables' explanatory power from the individual significance tests of each one: you have to do a joint significance test for that. It is entirely possible that the joint test is highly "significant" even though none of the individual variable tests is. (Of course, it can also go the other way: you can have significant individual variable effects yet the joint test proves not significant. Or the significance of the joint and single variable tests can be concordant.)
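        To be concrete about what a joint test looks like here, a minimal sketch reusing the placeholder names from #1 (with Controls standing in for the full list of control variables):
        Code:
        * refit the model with two-way clustered standard errors
        reghdfe DV IV Controls, absorb(Firm_ID Quarter) vce(cluster Firm_ID Quarter)
        * jointly test that the coefficients on all the control variables are zero
        testparm Controls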
        You should not blame yourself for getting these things wrong. These incorrect understandings of statistical significance are widespread, and they will remain so because they are widely taught to students learning statistics. These myths will continue to support zombie statistical practice until, hopefully someday, introductory statistics becomes competently taught.

        Added: *Not to mention that when you are trying to explain a non-significant result, lack of explanatory effect should be the very last thing you think of, and you should only resort to it if you have ruled out: lack of statistical power, a biased, unrepresentative sample, an improperly specified model, errors in the data, problems associated with missing values, etc.
        Last edited by Clyde Schechter; 27 Oct 2024, 16:08.
