  • Bootstrapped SEs

    Dear all,

    I am interested in a model of the following type:

    reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)

    and

    reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)


    The number of clusters for country (=34) and time (=24) is very small. Looking at the levels plot, I feel like clustering is over-penalizing the SEs, so the coefficient lacks statistical significance despite the sizable economic effect it implies.

    I wanted to bootstrap the SEs and wanted to know what the differences are among the following specifications:


    (1) bootstrap _b, reps(1000): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)

    (2) bootstrap _b, reps(1000) cl(countryid): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)

    (3) bootstrap _b, reps(1000) cl(countryid): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)

    (4) bootstrap _b, reps(1000) cl(countryid time): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)

    Am I correct in assuming that in (1) it takes random draws of observations from the whole sample, whereas in (2)-(4) it draws whole clusters as defined in the cl(.) part of the command, and lastly, that there is no difference between (1) and (4) because the data are a panel at the country-time level?

    Another question I have is: how do (2)-(3) help to overcome the issue of a small number of clusters?

  • #2
    Not being familiar with -reghdfe-, I will offer a few general comments. If you believe that your data are clustered in a hierarchical sense (e.g., place and time), then the cluster-adjusted SEs will properly account for that. Iff the ICC is zero (or very close), the results will converge to the non-clustered, independent situation. I think bootstrapping can only show you results similar to what (I assume) are cluster-robust SE estimators. In other words, the bootstrapped SEs would just be an approximation to the analytic result.
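
    One way to see how much the clustering adjustment matters is to put the non-clustered and cluster-robust SEs side by side. This is only a sketch using the variable names from the original post; the stored-estimate names (se_robust, se_cluster) are made up for illustration.

    ```stata
    * Same point estimates, two different variance estimators.
    * Heteroskedasticity-robust SEs, no clustering:
    reghdfe y i.treated##i.post, absorb(countryid time dim3) vce(robust)
    estimates store se_robust

    * Cluster-robust SEs, clustering on country:
    reghdfe y i.treated##i.post, absorb(countryid time dim3) vce(cluster countryid)
    estimates store se_cluster

    * Coefficients with their SEs underneath, one column per model.
    estimates table se_robust se_cluster, se
    ```

    If the two columns of SEs are close, within-country correlation is contributing little; a large gap suggests the independence assumption is doing a lot of work in the non-clustered results.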

    General advice from the manual:

    Many estimation commands allow the vce(bootstrap) option. For those commands, we recommend using vce(bootstrap) over bootstrap because the estimation command already handles clustering and other model-specific details for you.
    The bootstrap prefix command is intended for use with nonestimation commands, such as summarize, user-written programs, or functions of coefficients.
    If you must use the prefix and want the clustered bootstrap (i.e., simple random sampling of whole clusters with replacement), then you need to use the -cluster()- and -idcluster()- options of bootstrap, and use the variable specified in -idcluster()- in the subsequent command. The -idcluster()- option creates a new variable that indexes the resampled clusters, so that the number of distinct clusters stays the same. Without it, you may sample the same cluster multiple times with the same id, and then it will look like a "supersized" cluster to the subsequent command. Clustering at multiple levels requires careful thought about how to bootstrap.
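
    A minimal sketch of that prefix syntax, applied to the model from the original post. I have not run this with -reghdfe-, so treat it as an illustration of the options rather than a tested recipe; newid is a made-up name for the variable that -idcluster()- creates, and the seed value is arbitrary.

    ```stata
    * Pairs cluster bootstrap: each replication resamples whole countries
    * with replacement. idcluster(newid) gives every drawn country a fresh
    * id, so a country drawn twice enters as two separate clusters.
    * The country fixed effect therefore uses newid, not countryid.
    * No cluster() option is given to reghdfe here: bootstrap supplies the
    * SEs, and the point estimates do not depend on the vce choice.
    bootstrap _b, reps(1000) seed(12345) cluster(countryid) idcluster(newid): ///
        reghdfe y i.treated##i.post, absorb(newid time dim3)
    ```

    This resamples along the country dimension only. It does not address clustering on both country and time, which, as noted, needs separate thought.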
