New paper explaining wild (cluster) bootstrapping with boottest

David Roodman

Join Date: Jul 2014

Posts: 464
#1

20 Jun 2018, 17:41

I'm happy to announce the release of the working paper, "Fast and Wild: Bootstrap Inference in Stata Using boottest," by myself, James MacKinnon, Morten Nielsen, and Matthew Webb. It is meant as a pedagogic introduction to inference using the wild bootstrap, with an emphasis on the wild cluster bootstrap. It explains how boottest works--in particular, how it is able to execute so fast. Comments welcome.

The paper is at: http://qed.econ.queensu.ca/working_p...ed_wp_1406.pdf

The latest boottest is available via "ssc install boottest, replace". Features include support for multi-way clustering, fixed effects, and linear IV estimation.

IPL Png

Join Date: Oct 2016

Posts: 9
#2

26 Oct 2018, 17:39

Thanks for this great software.

Boottest generates r(chi2). How can I get the p-value (other than by manually copying from the output)?
IPL Png

Join Date: Oct 2016

Posts: 9
#3

26 Oct 2018, 17:45

Sorry. I just updated boottest and see that the latest version returns the p-value.
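For anyone landing here with the same question, a minimal sketch of picking up the p-value programmatically. It assumes a recent boottest from SSC, which stores the p-value in r(p). The nlsw88 dataset and the model are stand-ins chosen only so the snippet runs; they are not from this thread.

```stata
* Stand-in data and model (assumption: not the poster's own data)
webuse nlsw88, clear
regress wage tenure ttl_exp collgrad, cluster(industry)

* Wild cluster bootstrap test of tenure = 0; nograph suppresses the plot
boottest tenure, reps(9999) seed(1234) nograph

* Assumes the installed version returns r(p), as recent versions do
return list
display "bootstrap p-value = " r(p)
scalar p_tenure = r(p)
```

Running return list right after boottest shows everything the installed version returns, which is the safest way to check the names.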

Tashrik Ahmed

Join Date: Nov 2018

Posts: 3
#4

27 Nov 2018, 15:04

Thanks, David Roodman, for the package. It's much more straightforward than -cgm- and faster than -clustse-.

I am testing an implementation of boottest in a difference-in-differences design and am running into a few conceptual stumbling blocks. I thought you might have guidance on whether I am using the package as intended.

The study intervention was assigned at the district level, with inferences made at the institution level and lower, with a before and an after time point. There are fixed effects at the institution level, but inferences at the household level do not use panel data, so we use regress with dummies and not xtreg. The simplest example is the following, where boottest yields a larger p-value than clustering at the district level for inference at the institution level.

I cannot determine if this is normal behavior or if there is an error in my specification.

Code:

. regress depvar ppbf i.pbf i.post, cluster(district)

Linear regression                                      Number of obs =     420
                                                       F(  3,    15) =   10.56
                                                       Prob > F      =  0.0005
                                                       R-squared     =  0.1155
                                                       Root MSE      =  .39536

                              (Std. Err. adjusted for 16 clusters in district)
------------------------------------------------------------------------------
             |               Robust
      depvar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ppbf |   .1759797   .1168922     1.51   0.153    -.0731701    .4251294
             |
         pbf |
        YES  |  -.0208636   .0417618    -0.50   0.625    -.1098767    .0681496
             |
        post |
        YES  |   .1730769   .0526923     3.28   0.005     .0607659    .2853879
       _cons |   .1057692   .0254612     4.15   0.001     .0514999    .1600386
------------------------------------------------------------------------------

. boottest ppbf, reps(9999) bootcluster(facility_code post)
..........................

Wild bootstrap, null imposed, 9999 replications, Wald test, bootstrap clustering by facility_code post, Rademacher weights:
  ppbf

                                       t(15) =     1.5055
                                    Prob>|t| =     0.1765

95% confidence set for null hypothesis expression: [-.0915, .4408]

.
end of do-file


David Roodman

Join Date: Jul 2014

Posts: 464
#5

27 Nov 2018, 15:18

I guess I don't understand the question. The two p values are very similar.
Tashrik Ahmed

Join Date: Nov 2018

Posts: 3
#6

30 Nov 2018, 11:05

Thanks David, I went back and re-examined what I was doing. I did not realize that

Code:

regress depvar ivar, cluster(district)
boottest ivar, cluster(district post) bootcluster(district post)

was different than

Code:

cgmreg depvar ivar, cluster(district post)
boottest ivar, bootcluster(district post)

Seems like it works as expected now.
David Roodman

Join Date: Jul 2014

Posts: 464
#7

30 Nov 2018, 11:10

OK, well, I'm not certain which way you are aiming to do it. Make sure you understand the semantic difference between cluster(A B) and bootcluster(A B). See discussion at http://qed.econ.queensu.ca/working_p...06.pdf#page=29 (page 29 of the pdf).
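A short sketch of that distinction, as I read the boottest help file: cluster() sets how the errors are clustered when the test statistic is computed, while bootcluster() sets the level at which the wild weights are drawn. With several variables, cluster(A B) means two-way clustering, whereas bootcluster(A B) means bootstrapping on the intersections of A and B. The nlsw88 dataset and variable choices below are stand-ins, not the data discussed in this thread.

```stata
* Stand-in data and model (assumption: illustrative only)
webuse nlsw88, clear
regress wage tenure ttl_exp collgrad, cluster(industry)

* Default: error clustering and bootstrap clustering both follow the regression (industry)
boottest tenure, reps(9999) seed(1234) nograph

* Two-way error clustering by industry and age; wild weights drawn by industry only
boottest tenure, cluster(industry age) bootcluster(industry) reps(9999) seed(1234) nograph

* Two-way error clustering; with bootcluster() omitted, my understanding is that
* the bootstrap is clustered on the industry-by-age intersections
boottest tenure, cluster(industry age) reps(9999) seed(1234) nograph
```

The header line of each boottest output reports the bootstrap clustering actually used, so it is worth reading that line to confirm the behavior in your installed version.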
Tashrik Ahmed

Join Date: Nov 2018

Posts: 3
#8

30 Nov 2018, 13:18

Thanks again. The section was useful and sent me down a rabbit hole of clustering which leaves me knowing just enough to be dangerous. Thanks for taking a moment to provide guidance.

If I understand correctly, the bootcluster option will default to assigning wild weights to the interaction of the clusters, while the cluster option allows for specification of two-way clustering. I'm still not sure I understand the difference between cluster() and bootcluster() options more generally. As an illustrative example, I get different results when I try:

Code:

cgmreg depvar ivar, cluster(district)
boottest ivar, reps(9999) bootcluster(district)

vs.

Code:

cgmreg depvar ivar, cluster(district)
boottest ivar, reps(9999) cluster(district) bootcluster(district)

On a similar note, there is also a difference in magnitude when I specify

Code:

regress depvar ivar, cluster(district)
boottest ivar, reps(9999) bootcluster(district)

vs

Code:

cgmreg depvar ivar, cluster(district)
boottest ivar, reps(9999) bootcluster(district)

What is CGM doing differently with a single cluster?
David Roodman

Join Date: Jul 2014

Posts: 464
#9

30 Nov 2018, 16:13

Oh, I think you are creating trouble for yourself by running it after cgmreg. It never occurred to me that someone would do that. cgmreg works but is a bit rickety. E.g., it performs the finite-sample correction on the standard errors but doesn't return the e(df_r) scalar to indicate that it has done so, which confuses boottest and also confuses the Stata code that displays regression results, causing it to label the test statistic column with a z instead of a t.

If you are comparing to cgmreg, it should suffice to do "regress..., cluster(...)" and "boottest ...." without any cluster() or bootcluster() option in the latter.

See section 8.2 in the working paper for an example of cgmreg replication.
Cath Porter

Join Date: Jun 2019

Posts: 1
#10

25 Jun 2019, 11:28

Thank you for this really useful package. It is indeed so fast, and so helpful. I have a question regarding standard errors: I used boottest to calculate the p-values after regress.
A journal has asked me to report standard errors rather than p-values.

I was thinking of backing them out from the t-statistic or confidence interval, but in the paper (above) you caution against this. Do you have any advice on whether I could back them out, or should I instead see whether they will accept, e.g., the confidence interval if they do not like p-values? Many thanks, Cath Porter
David Roodman

Join Date: Jul 2014

Posts: 464
#11

25 Jun 2019, 11:47

Yeah, I would definitely argue back and you can cite our paper. The notion of standard error is founded on an assumption that the actual distribution is close to some asymptotic ideal like the normal or t distribution, which has a variance parameter. The premise of using the bootstrap is that it is better not to work from that assumption. So I think a CI makes more sense.
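For reporting the interval, a minimal sketch of reading the bootstrap confidence set from the stored results. It assumes the installed boottest returns the bounds in the matrix r(CI), one row per disjoint interval; the nlsw88 dataset and model are stand-ins, not the poster's data.

```stata
* Stand-in data and model (assumption: illustrative only)
webuse nlsw88, clear
regress wage tenure ttl_exp collgrad, cluster(industry)
boottest tenure, reps(9999) seed(1234) nograph level(95)

* Assumes r(CI) holds lower and upper bounds, one row per disjoint interval
matrix CI = r(CI)
matrix list CI
display "95% CI for tenure: [" CI[1,1] ", " CI[1,2] "]"
```

Because the confidence set comes from inverting the bootstrap test, it need not be symmetric around the point estimate, which is one reason a backed-out standard error can mislead.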

Hanna Lanzinger

Join Date: Jun 2019
Posts: 8

#12

26 Jul 2019, 01:47

Hi everyone,

I'm doing wild cluster bootstrap because I have only 16 clusters in my sample.

At first I implemented it with the -clustse- command, until I read David Roodman's paper on -boottest-.
However, since both commands implement the wild cluster bootstrap, I expected similar results, which is not the case. Can anyone tell me why?

1) -clustse-

Code:

clustse reg incentives pdi_100 ownership_dummy1 ownership_dummy2 ownership_dummy3 ownership_dummy4 ownership_dummy6 ownership_dummy7 ownership_dummy8 ownership_dummy9 ownership_dummy10 firm_size_1000 firm_size_sq_1000000, cluster(country) method(wild) reps(999)

Regress with clustered SEs/Wild bootstrap (999 successful resamples)
Number of clustvars=    1                        Number of obs =     3146
Num combinations   =    1                        R-squared     =   0.0799
                                                 Adj R-squared =   0.0764
                                                 G(country)    =       16
                                                 (Bootstrapped)
-------------------------------------------------------------------------
  incentives|       Coef.        Null     p-value    [95% Conf. Interval]
------------+------------------------------------------------------------
     pdi_100|  -.72198172           .   .14414414  -1.3820094  -.11686992
ownership_~1|   .48165751           .           0   .28949764   .67133272
ownership_~2|   .46217799           .   .00600601   .26297158   .67638034
ownership_~3|   .26538352           .   .02802803     .105272   .41936398
ownership_~4|   .32332918           .   .08608609   .15496914   .49061534
ownership_~6|   .44905933           .     .004004   .24385157   .66920632
ownership_~7|   .39314619           .   .07607608   .12218909   .66703445
ownership_~8|   .52910069           .     .004004   .18086281   .85385829
ownership_~9|   .43624599           .           0   .28595716   .58891535
ownership~10|   .40868285           .           0   .27491471   .53825468
firm_si~1000|   .35362714           .           0   .24710698   .46986157
firm~1000000|  -.06626529           .     .002002  -.08886144  -.04535305
       _cons|   2.7564857           .           0   2.2772956   3.2472365
-------------------------------------------------------------------------

2) -boottest-

Code:

regress incentives pdi_100 ownership_dummy1 ownership_dummy2 ownership_dummy3 ownership_dummy4 ownership_dummy6 ownership_dummy7 ownership_dummy8 ownership_dummy9 ownership_dummy10 firm_size_1000 firm_size_sq_1000000, cluster(country)

Linear regression                               Number of obs     =      3,146
                                                F(12, 15)         =      82.78
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0799
                                                Root MSE          =     .68227

                                       (Std. Err. adjusted for 16 clusters in country)
--------------------------------------------------------------------------------------
                     |               Robust
          incentives |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
             pdi_100 |  -.7219817   .3637497    -1.98   0.066    -1.497296    .0533324
    ownership_dummy1 |   .4816575   .1030858     4.67   0.000     .2619353    .7013797
    ownership_dummy2 |    .462178   .1152298     4.01   0.001     .2165715    .7077845
    ownership_dummy3 |   .2653835   .0871509     3.05   0.008     .0796259    .4511412
    ownership_dummy4 |   .3233292   .0976251     3.31   0.005     .1152461    .5314122
    ownership_dummy6 |   .4490593   .1104859     4.06   0.001     .2135642    .6845545
    ownership_dummy7 |   .3931462   .1509348     2.60   0.020     .0714363    .7148561
    ownership_dummy8 |   .5291007   .1744574     3.03   0.008     .1572536    .9009478
    ownership_dummy9 |    .436246   .0904752     4.82   0.000     .2434026    .6290894
   ownership_dummy10 |   .4086829   .0727767     5.62   0.000     .2535631    .5638026
      firm_size_1000 |   .3536271   .0589945     5.99   0.000     .2278834    .4793708
firm_size_sq_1000000 |  -.0662653   .0120845    -5.48   0.000    -.0920227   -.0405079
               _cons |   2.756486   .2753207    10.01   0.000     2.169653    3.343318
--------------------------------------------------------------------------------------



boottest pdi_100


Wild bootstrap, null imposed, 999 replications, Wald test, bootstrap clustering by country, Rademacher weights:
  pdi_100

                                       t(15) =    -1.9848
                                    Prob>|t| =     0.0571

95% confidence set for null hypothesis expression: [-1.831, .01927]

Thanks for any help in advance!!

Best,
Hanna


David Roodman

Join Date: Jul 2014

Posts: 464
#13

26 Jul 2019, 06:22

Well, the boottest command is imposing the null on the bootstrap data generating process, but it looks like the clustse command is not.
--David
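One way to check this explanation is to rerun boottest with its nonull option, which drops the null from the bootstrap data generating process, and compare the two p-values. This is only a sketch on stand-in data (nlsw88); whether the unrestricted result matches clustse exactly is not guaranteed, since the two commands may differ in other details.

```stata
* Stand-in data and model (assumption: illustrative only)
webuse nlsw88, clear
regress wage tenure ttl_exp collgrad, cluster(industry)

* Default: restricted bootstrap, null imposed on the bootstrap DGP
boottest tenure, reps(999) seed(1234) nograph
scalar p_null = r(p)

* nonull: unrestricted bootstrap, null not imposed
boottest tenure, reps(999) seed(1234) nograph nonull
scalar p_nonull = r(p)

display "null imposed: " p_null "   null not imposed: " p_nonull
```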
Hanyi Wang

Join Date: Nov 2019

Posts: 2
#14

08 Nov 2019, 02:08

Hi David,
I was wondering whether boottest can do multiple hypothesis adjustment for the same independent variables across several different regressions (different dependent variables, same set of independent variables). It seems that I can only run one regression and then boottest right after, so I can only apply the adjustment to different independent variables from a single regression. Is there any way to achieve that? Thanks a lot, and I look forward to your reply!
David Roodman

Join Date: Jul 2014

Posts: 464
#15

08 Nov 2019, 06:19

Not directly, no. However, the formulas for the corrections offered by boottest are straightforward, so you could just apply them yourself.
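A rough sketch of doing this by hand, assuming the corrections in question are the Bonferroni and Sidak adjustments (the two that boottest's madjust() option offers) and that each run's p-value is available in r(p). The dataset, the two dependent variables, and the scalar names are hypothetical stand-ins.

```stata
* Stand-in data; wage and hours play the role of the different dependent variables
webuse nlsw88, clear
local m 0
foreach y in wage hours {
    quietly regress `y' tenure ttl_exp collgrad, cluster(industry)
    boottest tenure, reps(9999) seed(1234) nograph
    local ++m
    scalar p`m' = r(p)          // raw bootstrap p-value for this regression
}

* Apply the adjustments across the m tests
forvalues i = 1/`m' {
    scalar bonf  = min(1, `m' * p`i')      // Bonferroni: min(1, m*p)
    scalar sidak = 1 - (1 - p`i')^`m'      // Sidak: 1 - (1 - p)^m
    display "test `i': raw p = " p`i' "  Bonferroni = " bonf "  Sidak = " sidak
}
```

Both adjustments use only the number of tests and the raw p-values, so they do not need the regressions to be estimated jointly.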