
  • Problems with results becoming insignificant when clustering

    Hi,
    I am using reghdfe to run triple differences and was originally clustering at the individual level, but I have just been told that I actually need to double cluster at the country and ethnicity levels. When I run this new regression, I get the same coefficients, but the p-value for my estimator is no longer significant, and I am unsure why. Here is my code and the results from this new code.

    Code:
    . reghdfe underagemar dchildmar##postreform2 , vce (cluster ethnicityall country) absorb(country ethnicityall currentyear) 
    note: 1bn.dchildmar is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
    (MWFE estimator converged in 13 iterations)
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    note: 1.dchildmar omitted because of collinearity
    
    HDFE Linear regression                            Number of obs   =  7,936,575
    Absorbing 3 HDFE groups                           F(   2,     10) =     189.92
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.1692
                                                      Adj R-squared   =     0.1692
    Number of clusters (ethnicityall) =        136    Within R-sq.    =     0.0430
    Number of clusters (country) =         11         Root MSE        =     0.4557
    
                               (Std. err. adjusted for 11 clusters in ethnicityall country)
    ---------------------------------------------------------------------------------------
                          |               Robust
              underagemar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ----------------------+----------------------------------------------------------------
              1.dchildmar |          0  (omitted)
            1.postreform2 |  -.2264258   .0199573   -11.35   0.000    -.2708934   -.1819582
                          |
    dchildmar#postreform2 |
                     1 1  |  -.0131314    .033657    -0.39   0.705    -.0881239    .0618611
                          |
                    _cons |   .5751284   .0044021   130.65   0.000     .5653199    .5849369
    ---------------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
    --------------+---------------------------------------|
          country |        11          11           0    *|
     ethnicityall |       136         136           0    *|
      currentyear |        72           0          72     |
    ------------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

  • #2
    Anjali:
    according to the output of the community-contributed module -reghdfe-, your interaction does not reach statistical significance (and you can probably leave it out).
    That said, my impression is that the main issue you have to deal with is the very low within R-sq., which suggests that other relevant predictors are missing from the right-hand side of your regression equation.
    Kind regards,
    Carlo
    (StataNow 18.5)
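
To see where the p-value in the table above comes from: with two-way clustering, -reghdfe- appears to base its t and F tests on the smallest number of clusters minus one (note the F(2, 10) in the header, with 11 country clusters). The minimal sketch below redoes that arithmetic from the printed coefficient and standard error. The scalar names are illustrative, and the degrees-of-freedom rule is inferred from the output, not stated by the poster.

```stata
* Run as a do-file. Values are copied from the output table for
* 1.dchildmar#1.postreform2; scalar names are illustrative.
scalar b_hat  = -.0131314      // coefficient from the table
scalar se_hat =  .033657       // clustered standard error from the table
scalar df_r   = 11 - 1         // assumption: smallest cluster count (11 countries) minus 1

scalar t_stat = b_hat / se_hat
scalar p_val  = 2 * ttail(df_r, abs(t_stat))
scalar ci_lo  = b_hat - invttail(df_r, 0.025) * se_hat
scalar ci_hi  = b_hat + invttail(df_r, 0.025) * se_hat

display "t = " %6.2f t_stat "   p = " %5.3f p_val
display "95% CI: [" %10.7f ci_lo ", " %10.7f ci_hi "]"
```

If the displayed values match the table, the p-value is driven by the clustered standard error and the 10 degrees of freedom; the point estimate itself does not depend on the choice of clusters.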



    • #3
      Thank you Carlo. I am currently adding controls to see whether this changes anything. I initially controlled for age and for whether an individual lived in a rural area, and this decreased the p-value a bit (although it is still insignificant). However, I was unsure whether this is actually relevant, given that I am running my regressions at the country level.

      Unfortunately, I cannot leave the interaction term out, as it is my main variable of interest: I am running a triple-difference estimation, and that is what the interaction term is supposed to show. After making this post, I ran my regressions for 3 other equations with different dependent variables, and these all still came back significant. My issue is only with the regression I have posted. Do you know the reason for this?



      • #4
        Anjali:
        1) as the -within- estimator works at its best when there is enough within-group variation in the time-varying variables, it may well be that this is not the case with your regressand;
        2) controls are interesting, but I meant predictors that help give a fair and true view of the data-generating process that you're investigating;
        3) being (I dare say, with due respect) obsessed with p-values is a very harmful habit, as data are what they are: significant coefficients are as informative as their non-significant counterparts. It's what you make of them in terms of discussion/dissemination that can make the difference.
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          OK, thank you. I was told to run reghdfe so that I can control for multiple fixed effects. This is the output I get when I include more variables in my regression; the R-squared increases. I just wanted to confirm that I can still report these results, as long as I mention the insignificance?
          Code:
          . reghdfe underagemar dchildmar##postreform2 rural age edyrtotal , vce(cluster ethnicityall country) absorb(country ethnicityall currentyear) 
          note: 1bn.dchildmar is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
          (MWFE estimator converged in 13 iterations)
          Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
          note: 1.dchildmar omitted because of collinearity
          
          HDFE Linear regression                            Number of obs   =  7,936,575
          Absorbing 3 HDFE groups                           F(   5,     10) =      70.67
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.2311
                                                            Adj R-squared   =     0.2311
          Number of clusters (ethnicityall) =        136    Within R-sq.    =     0.1143
          Number of clusters (country) =         11         Root MSE        =     0.4384
          
                                     (Std. err. adjusted for 11 clusters in ethnicityall country)
          ---------------------------------------------------------------------------------------
                                |               Robust
                    underagemar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          ----------------------+----------------------------------------------------------------
                    1.dchildmar |          0  (omitted)
                  1.postreform2 |  -.3463596   .0278298   -12.45   0.000    -.4083682   -.2843511
                                |
          dchildmar#postreform2 |
                           1 1  |  -.0436666   .0254795    -1.71   0.117    -.1004385    .0131053
                                |
                          rural |   .0471989    .008327     5.67   0.000     .0286452    .0657527
                            age |  -.0147942   .0013231   -11.18   0.000    -.0177422   -.0118462
                      edyrtotal |  -.0176535   .0023846    -7.40   0.000    -.0229667   -.0123403
                          _cons |   .9173998   .0295395    31.06   0.000     .8515816     .983218
          ---------------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          ------------------------------------------------------+
            Absorbed FE | Categories  - Redundant  = Num. Coefs |
          --------------+---------------------------------------|
                country |        11          11           0    *|
           ethnicityall |       136         136           0    *|
            currentyear |        72           0          72     |
          ------------------------------------------------------+
          * = FE nested within cluster; treated as redundant for DoF computation
          
          .
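
One way to report both specifications together is to store each fit and tabulate them side by side. This is only a sketch: it assumes the poster's dataset is in memory and that -reghdfe- is installed, and the stored-estimate names m_base and m_ctrl are made up for illustration.

```stata
* Assumes the poster's data are loaded and reghdfe is installed
* (ssc install reghdfe). m_base and m_ctrl are illustrative names.
reghdfe underagemar dchildmar##postreform2, ///
    absorb(country ethnicityall currentyear) vce(cluster ethnicityall country)
estimates store m_base

reghdfe underagemar dchildmar##postreform2 rural age edyrtotal, ///
    absorb(country ethnicityall currentyear) vce(cluster ethnicityall country)
estimates store m_ctrl

* Coefficients, standard errors, and p-values for both models in one table
estimates table m_base m_ctrl, b(%9.4f) se(%9.4f) p(%9.3f) stats(N r2)
```

The table shows every coefficient from both models, whether significant or not, so a reader can see how the interaction estimate moves when the controls are added.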



          • #6
            Anjali:
            1) -1bn.dchildmar- is probably collinear with the fixed effects (probably because it is a time-invariant predictor and, as such, it is wiped out by the -fe- machinery);
            2) you should report all the results of your regression, no matter their statistical significance.
            Kind regards,
            Carlo
            (StataNow 18.5)
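
The collinearity note in point 1 can be checked directly. The sketch below uses made-up toy data, and it assumes that the indicator is defined at the ethnicity level, which the thread does not confirm. It counts observations in groups where the indicator takes more than one value.

```stata
* Toy data (illustrative only): 4 ethnic groups with 2 observations each.
* The indicator is defined at the group level, so it never varies within a group.
clear
set obs 8
generate ethnicityall = ceil(_n / 2)
generate byte dchildmar = ethnicityall <= 2

* Flag groups in which the indicator takes more than one value
bysort ethnicityall: egen byte dmin = min(dchildmar)
bysort ethnicityall: egen byte dmax = max(dchildmar)
count if dmin != dmax
* A count of 0 means the indicator is constant within every group, so the
* group fixed effects absorb its main effect; the interaction can still be
* estimated as long as postreform2 varies within groups.
```

On the real data, skip the toy-data lines and run the last three commands as they are; repeating them with country in place of ethnicityall would show whether the country fixed effects absorb the indicator as well.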

