  • Cluster-Robust Standard Errors

    Hi all,

    I stumbled upon this webpage on Stata's website: https://www.stata.com/support/faqs/s...luster-option/.

    Please correct me if I'm wrong, but I am under the impression that when Stata users type
    Code:
    reg y x, r
    they get HC1 standard errors as in Hinkley (1977), correct?

    However, if they type vce(hc3) they will obtain HC3 standard errors as in MacKinnon and White (1985), which Long and Ervin (2000) found to outperform HC0, HC1 and HC2 in terms of size properties.
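For reference, a minimal sketch of the two commands side by side. The bundled auto dataset, the variables price and mpg, and the scalar names are chosen here only for illustration.

```stata
* Compare HC1 and HC3 standard errors on Stata's bundled auto data
sysuse auto, clear

* vce(robust), which ", r" abbreviates, requests the HC1 estimator
regress price mpg, vce(robust)
scalar se_hc1 = _se[mpg]

* vce(hc3) requests the leverage-adjusted HC3 estimator
regress price mpg, vce(hc3)
scalar se_hc3 = _se[mpg]

display "HC1 SE = " se_hc1 "   HC3 SE = " se_hc3
```

The point estimates are identical across the two runs; only the variance estimator changes.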

    My question is the following: in panel data, researchers often invoke (according to guidelines set out by Abadie, Athey, Imbens and Wooldridge (2022)) cluster-robust standard errors, which according to Stata's website are "simply that of the robust (unclustered) estimator [HC1] with the individual ei*xi’s replaced by their sums over each cluster."
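The quoted description can be written out as follows. This is a sketch in notation assumed here, not taken from the FAQ: $x_i$ is the $K \times 1$ regressor vector, $\hat e_i$ the OLS residual, $N$ the number of observations, $G$ the number of clusters, and $c$ a finite-sample factor (for `regress`, as far as I know, $c = \frac{G}{G-1}\cdot\frac{N-1}{N-K}$).

```latex
\hat V_{\mathrm{HC1}}
  = \frac{N}{N-K}\,(X'X)^{-1}
    \Bigl(\sum_{i=1}^{N} \hat e_i^{\,2}\, x_i x_i'\Bigr)(X'X)^{-1},
\qquad
\hat V_{\mathrm{cluster}}
  = c\,(X'X)^{-1}
    \Bigl(\sum_{g=1}^{G} u_g u_g'\Bigr)(X'X)^{-1},
\quad
u_g = \sum_{i \in g} \hat e_i\, x_i .
```

The only structural change is that the individual terms $\hat e_i x_i$ are summed within each cluster before the outer products are formed.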

    Is it possible to invoke the cluster option and combine it with the HC3 standard error formula? Would it make sense econometrically?

    Apologies if the question is nonsensical from a statistical point of view.

  • #2
    Presumably, Jeff Wooldridge would give the best comments on this



    • #3
      It would be amazing to get Prof. Wooldridge's response. I reckon this is also an important question for quite a few applied researchers who use Stata, as panel data are becoming more and more available and valid inference is crucial in most fields of study.



      • #4
        This is a very good question, and not a very good explanation on the Stata website.

        HC1, HC2 and HC3 each adjust the residuals in a way that is supposed to improve small-sample properties. HC1 applies a degrees-of-freedom correction, while HC2 and HC3 rescale each residual using its leverage (the diagonal elements of the hat matrix).

        The cluster-robust standard errors do not make any adjustment to the residuals; they just use the residuals as they are.

        Using the leverage-adjusted and other adjusted standard errors/variances is very easy in Stata with the programmer's command -_robust-.
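As an illustration of that point, here is a hedged sketch of reproducing HC3 by hand with -_robust- in the unclustered case. It assumes that HC3 divides each residual by (1 - h) and applies no further degrees-of-freedom factor, hence minus(0); the dataset and the names D, e, h and e3 are chosen here for illustration.

```stata
sysuse auto, clear

* Plain OLS fit; e(V)/e(rmse)^2 recovers (X'X)^-1
regress price mpg
matrix D = e(V) / e(rmse)^2

* Residuals and leverage (diagonal of the hat matrix)
predict double e, residuals
predict double h, hat

* HC3-style adjusted residual (assumption: e/(1-h), no extra scaling)
generate double e3 = e / (1 - h)

* _robust overwrites D with the sandwich estimate; minus(0) sets n/(n-k) to 1
_robust e3, variance(D) minus(0)
matrix list D

* Built-in HC3 for comparison
regress price mpg, vce(hc3)
matrix V3 = e(V)
matrix list V3
```

If the assumption holds, the two matrices should agree up to rounding. Passing cluster() to -_robust- with the same adjusted residual is possible, but an observation-level leverage adjustment is not the same thing as a cluster-level one.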

        Whether using adjustments such as those in HC2 and HC3 in the cluster-robust variance would lead to substantial improvements is an open research question. To my knowledge, there is no paper on the topic.



        • #5
          Hi Maxence,

          The short answer to your question is YES. It makes a lot of sense. Matt Webb, James MacKinnon, and their coauthors have been working on these topics and have Stata code for it. There are also some interesting results from Bruce Hansen.

          For Matt's results, please see:

          https://www.statalist.org/forums/for...bust-inference

          For what Bruce advocates for:

          https://www.ssc.wisc.edu/~bhansen/papers/tcauchy.html

          Also, note that to get what you want you can use

          Code:
          vce(jackknife,mse)
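A short sketch of why the jackknife is the natural route to HC3. The notation is assumed here: $\hat\beta_{(i)}$ is the OLS estimate with observation $i$ deleted, $h_{ii}$ is its leverage, and $\hat\beta_{(g)}$ is the estimate with cluster $g$ deleted.

```latex
\hat\beta_{(i)} - \hat\beta
  = -\,(X'X)^{-1} x_i \,\frac{\hat e_i}{1-h_{ii}}
\;\;\Longrightarrow\;\;
\sum_{i=1}^{N} (\hat\beta_{(i)}-\hat\beta)(\hat\beta_{(i)}-\hat\beta)'
  = (X'X)^{-1}\Bigl(\sum_{i=1}^{N}
      \frac{\hat e_i^{\,2}}{(1-h_{ii})^{2}}\, x_i x_i'\Bigr)(X'X)^{-1}
  = \hat V_{\mathrm{HC3}},
\\[6pt]
\hat V_{\mathrm{jack,\,mse}}
  = \frac{N-1}{N}\,\hat V_{\mathrm{HC3}},
\qquad
\hat V_{\mathrm{cluster\ jack,\,mse}}
  = \frac{G-1}{G}\sum_{g=1}^{G}
    (\hat\beta_{(g)}-\hat\beta)(\hat\beta_{(g)}-\hat\beta)' .
```

So the observation-level jackknife centered at the full-sample estimate differs from HC3 only by the factor $(N-1)/N$, and the cluster version replaces delete-one-observation with delete-one-cluster estimates.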



          • #6
            Thank you very much for your responses. I'll read up on the suggested papers!



            • #7
              Enrique Pinzon (StataCorp) should also have cited his recent blog post: https://blog.stata.com/2022/10/06/he...considerations (although he does not mention panels in the blog, there is still a lot to think about)



              • #8
                Originally posted by Enrique Pinzon (StataCorp)
                Also, note that to get what you want you can use

                Code:
                vce(jackknife,mse)

                Hi Enrique, that is an awesome blog post, the one Rich referred to in #7! Also, thank you for the awesome summary of the recent literature.

                Can you please elaborate on how we can do the cluster jackknife recommended by Bruce Hansen in Stata?

                It seems to me that Stata allows either clustering, or jackknifing at the individual-observation level, but not both.

                E.g.,

                Code:
                . sysuse auto
                (1978 automobile data)
                
                . reg price mpg, vce(cluster rep)
                
                Linear regression                               Number of obs     =         69
                                                                F(1, 4)           =       7.50
                                                                Prob > F          =     0.0519
                                                                R-squared         =     0.2079
                                                                Root MSE          =     2611.4
                
                                                  (Std. err. adjusted for 5 clusters in rep78)
                ------------------------------------------------------------------------------
                             |               Robust
                       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         mpg |  -226.3607   82.63217    -2.74   0.052    -455.7843    3.063024
                       _cons |   10965.23   1591.972     6.89   0.002     6545.205    15385.25
                ------------------------------------------------------------------------------
                
                . reg price mpg, vce(jackknife, mse)
                (running regress on estimation sample)
                
                Jackknife replications (74)
                ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                ..................................................    50
                ........................
                
                Linear regression                                    Number of obs =        74
                                                                     Replications  =        74
                                                                     F(1, 73)      =     14.96
                                                                     Prob > F      =    0.0002
                                                                     R-squared     =    0.2196
                                                                     Adj R-squared =    0.2087
                                                                     Root MSE      = 2623.6529
                
                ------------------------------------------------------------------------------
                             |              Jknife *
                       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         mpg |  -238.8943   61.75999    -3.87   0.000    -361.9818   -115.8069
                       _cons |   11253.06   1451.724     7.75   0.000      8359.78    14146.34
                ------------------------------------------------------------------------------

                But how can we do the jackknifing at the cluster level? It does not seem that Stata allows combining the two.
                Last edited by Joro Kolev; 23 Dec 2022, 02:57.



                • #9
                  Originally posted by Joro Kolev
                  how can we do the jackknifing at the cluster level?
                  .
                  . version 17.0

                  .
                  . clear *

                  .
                  . quietly sysuse auto

                  .
                  . // seedem
                  . set seed 1814593309

                  . summarize rep78, meanonly

                  . quietly replace rep78 = runiformint(r(min), r(max)) if missing(rep78)

                  .
                  . *
                  . * Begin here
                  . *
                  . jackknife _b[mpg] _b[_cons], eclass cluster(rep78) mse: regress price c.mpg
                  (running regress on estimation sample)

                  Jackknife replications (5)
                  ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                  .....

                  Linear regression                                           Number of obs = 74
                                                                              Replications  =  5

                        Command: regress price c.mpg
                          _jk_1: _b[mpg]
                          _jk_2: _b[_cons]
                            n(): e(N)

                                                     (Replications based on 5 clusters in rep78)
                  ------------------------------------------------------------------------------
                               |              Jknife *
                               | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         _jk_1 |  -238.8943   92.69114    -2.58   0.062    -496.2462    18.45752
                         _jk_2 |   11253.06   1726.235     6.52   0.003     6460.265    16045.86
                  ------------------------------------------------------------------------------

                  .
                  . exit

                  end of do-file


                  .



                  • #10
                    You can also look into the following alternatives.
                    Code:
                    xtreg price c.mpg, i(rep78) vce(jackknife, mse) fe
                    xtgee price c.mpg, i(rep78) family(gaussian) link(identity) corr(independent) vce(jackknife, mse)
                    Here, the syntax that Enrique showed works directly, as-is.



                    • #11
                      Originally posted by Enrique Pinzon (StataCorp)

                      Also, note that to get what you want you can use

                      Code:
                      vce(jackknife,mse)

                      Very useful, thank you.
                      I do have one doubt, however. Why should one use the "mse" option? If I understand correctly, this implies centering at the full-sample estimator, instead of at the sample estimator excluding the specific cluster each time. Isn't the HC3 standard error in MacKinnon and White (1985) instead calculated by centering at the delete-one-cluster estimator?
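To see what the option changes in practice, here is a minimal sketch comparing the two centerings for a cluster jackknife. As far as I know, without mse the squared deviations are taken around the average of the delete-one-cluster estimates, and with mse around the full-sample estimate. Dropping the observations with missing rep78 and the scalar names are choices made here for illustration.

```stata
sysuse auto, clear
drop if missing(rep78)          // keep only observations with a cluster id

* Default: deviations around the mean of the delete-one-cluster estimates
jackknife _b, cluster(rep78): regress price mpg
matrix V = e(V)
scalar se_default = sqrt(V[1,1])

* mse: deviations around the full-sample estimate
jackknife _b, cluster(rep78) mse: regress price mpg
matrix V = e(V)
scalar se_mse = sqrt(V[1,1])

display "default SE = " se_default "   mse SE = " se_mse
```

Because a sum of squared deviations is smallest around its own mean, the mse version can never be smaller than the default one, which makes it the more conservative choice.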
