Dear all,
I am interested in a model of the following type:
reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)
and
reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)
The number of clusters is small: 34 countries and 24 time periods. Looking at the levels plot, I suspect that clustering is inflating the SEs, so the coefficient is not statistically significant even though it implies a sizable economic effect.
I would like to bootstrap the SEs and want to know how the following specifications differ:
(1) bootstrap _b, reps(1000): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)
(2) bootstrap _b, reps(1000) cl(countryid): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid)
(3) bootstrap _b, reps(1000) cl(countryid): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)
(4) bootstrap _b, reps(1000) cl(countryid time): reghdfe y i.treated##i.post, absorb(i.countryid i.time i.dim3) cluster(countryid time)
Am I correct in assuming that in (1) the bootstrap draws observations at random from the whole sample, whereas in (2)-(4) it draws whole clusters as defined in the cl(.) option? And is there no difference between (1) and (4), given that the data are a panel at the country-time level?
My other question: how do (2)-(3) help overcome the problem of a small number of clusters?
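For concreteness, here is a minimal sketch of how a country-level cluster bootstrap like (2) might be written. It is not a recommendation. It assumes the idcluster() option of bootstrap is wanted so that a country drawn more than once gets its own identifier in each replication. The variable name newid and the seed value are placeholders I made up for illustration.

```stata
* Sketch only: resample whole countries (block bootstrap) with replacement.
* newid is a hypothetical variable created by idcluster(); it gives each
* resampled copy of a country a distinct id, so the country fixed effect
* is absorbed on newid rather than on countryid.
* seed() is set only to make the replications reproducible.
bootstrap _b, reps(1000) seed(12345) cluster(countryid) idcluster(newid): ///
    reghdfe y i.treated##i.post, absorb(i.newid i.time i.dim3)
```

The inner cluster() option is left out here on the assumption that bootstrap reports the standard deviation of the replicated coefficients, in which case the analytic VCE of each replication would not enter the reported SEs.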