  • New paper explaining wild (cluster) bootstrapping with boottest

    I'm happy to announce the release of the working paper, "Fast and Wild: Bootstrap Inference in Stata Using boottest," by myself, James MacKinnon, Morten Nielsen, and Matthew Webb. It is meant as a pedagogic introduction to inference using the wild bootstrap, with an emphasis on the wild cluster bootstrap. It explains how boottest works--in particular, how it is able to execute so fast. Comments welcome.

    The paper is at: http://qed.econ.queensu.ca/working_p...ed_wp_1406.pdf

    The latest boottest is available via "ssc install boottest, replace". Features include support for multi-way clustering, fixed effects, and linear IV estimation.

  • #2
    Thanks for this great software.

    Boottest generates r(chi2). How can I get the p-value (other than by manually copying from the output)?

    • #3
      Sorry. I just updated boottest and see that the latest version returns the p-value.

      • #4
        Thanks, David Roodman, for the package. It's much more straightforward than -cgm- and faster than -clustse-.

        I am testing an implementation of boottest in a difference-in-differences design and am running into a few conceptual stumbling blocks. I thought you might have guidance on whether I am using the package as intended.

        The study intervention was assigned at the district level, with inferences made at the institution level and lower, with a before and an after time point. There are fixed effects at the institution level, but inferences at the household level do not use panel data, so we use regress with dummies and not xtreg. The simplest example is the following, where boottest results in a larger p-value than clustering at the district level for inferences at the institution level.

        I cannot determine if this is normal behavior or if there is an error in my specification.


        Code:
        . regress depvar ppbf i.pbf i.post, cluster(district)

        Linear regression                               Number of obs =     420
                                                        F(  3,    15) =   10.56
                                                        Prob > F      =  0.0005
                                                        R-squared     =  0.1155
                                                        Root MSE      =  .39536

                                      (Std. Err. adjusted for 16 clusters in district)
        ------------------------------------------------------------------------------
                     |               Robust
              depvar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                ppbf |   .1759797   .1168922     1.51   0.153    -.0731701    .4251294
                     |
                 pbf |
                YES  |  -.0208636   .0417618    -0.50   0.625    -.1098767    .0681496
                     |
                post |
                YES  |   .1730769   .0526923     3.28   0.005     .0607659    .2853879
               _cons |   .1057692   .0254612     4.15   0.001     .0514999    .1600386
        ------------------------------------------------------------------------------

        . boottest ppbf, reps(9999) bootcluster(facility_code post)
        ..........................

        Wild bootstrap, null imposed, 9999 replications, Wald test, bootstrap clustering by facility_code post, Rademacher weights:
          ppbf

                                       t(15) =     1.5055
                                    Prob>|t| =     0.1765

        95% confidence set for null hypothesis expression: [-.0915, .4408]

        . 
        end of do-file



        • #5
          I guess I don't understand the question. The two p-values are very similar.

          • #6
            Thanks David, I went back and re-examined what I was doing. I did not realize that

            Code:
            regress depvar ivar, cluster(district)
            boottest ivar, cluster(district post) bootcluster(district post)
            is different from

            Code:
            cgmreg depvar ivar, cluster(district post)
            boottest ivar, bootcluster(district post)
            Seems like it works as expected now.

            • #7
              OK, well, I'm not certain which way you are aiming to do it. Make sure you understand the semantic difference between cluster(A B) and bootcluster(A B). See discussion at http://qed.econ.queensu.ca/working_p...06.pdf#page=29 (page 29 of the pdf).

              • #8
                Thanks again. The section was useful and sent me down a rabbit hole of clustering which leaves me knowing just enough to be dangerous. Thanks for taking a moment to provide guidance.

                If I understand correctly, the bootcluster option will default to assigning wild weights to the interaction of the clusters, while the cluster option allows for specification of two-way clustering. I'm still not sure I understand the difference between cluster() and bootcluster() options more generally. As an illustrative example, I get different results when I try:

                Code:
                cgmreg depvar ivar, cluster(district)
                boottest ivar, reps(9999) bootcluster(district)
                vs.

                Code:
                cgmreg depvar ivar, cluster(district)
                boottest ivar, reps(9999) cluster(district) bootcluster(district)
                On a similar note, there is also a difference in magnitude when I specify

                Code:
                regress depvar ivar, cluster(district)
                boottest ivar, reps(9999) bootcluster(district)
                vs.

                Code:
                cgmreg depvar ivar, cluster(district)
                boottest ivar, reps(9999) bootcluster(district)

                What is CGM doing differently with a single cluster?

                • #9
                  Oh, I think you are creating trouble for yourself by running it after cgmreg. It never occurred to me that someone would do that. cgmreg works but is a bit rickety. E.g., it performs the finite-sample correction on the standard errors but doesn't return the e(df_r) scalar to indicate that it has done so, which confuses boottest and also confuses the Stata code that displays regression results, causing it to label the test-statistic column z instead of t.

                  If you are comparing to cgmreg, it should suffice to do "regress..., cluster(...)" and "boottest ...." without any cluster() or bootcluster() option in the latter.

                  See section 8.2 in the working paper for an example of cgmreg replication.

                  • #10
                    Thank you for this really useful package. It is indeed fast and very helpful. I have a question regarding standard errors: I used boottest to calculate p-values after regress.
                    A journal has asked me to report standard errors rather than p-values.

                    I was thinking of backing them out from the t-statistic or confidence interval, but in the paper (above) you caution against this. Do you have any advice on whether I could back them out, or should I instead ask whether they will accept, e.g., the confidence interval if they do not like p-values? Many thanks, Cath Porter

                    • #11
                      Yeah, I would definitely argue back and you can cite our paper. The notion of standard error is founded on an assumption that the actual distribution is close to some asymptotic ideal like the normal or t distribution, which has a variance parameter. The premise of using the bootstrap is that it is better not to work from that assumption. So I think a CI makes more sense.
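
To make the point concrete, here is a small arithmetic sketch in Python using the pdi_100 figures reported in post #12 of this thread. It is only an illustration, not anything boottest does: it recovers the t critical value from the conventional regress interval, then tries to "back out" a standard error from each side of the bootstrap confidence set. The variable names are made up for the example.

```python
# Figures copied from post #12 in this thread (coefficient on pdi_100).
coef = -0.7219817
se_regress = 0.3637497
ci_regress = (-1.497296, 0.0533324)   # conventional cluster-robust 95% CI
ci_boot = (-1.831, 0.01927)           # boottest 95% confidence set

# Recover the t critical value that regress used: half-width / standard error.
t_crit = (ci_regress[1] - coef) / se_regress

# "Back out" a standard error from each side of the bootstrap interval.
se_from_lower = (coef - ci_boot[0]) / t_crit
se_from_upper = (ci_boot[1] - coef) / t_crit

print(f"t critical value:            {t_crit:.4f}")
print(f"implied SE from lower bound: {se_from_lower:.4f}")
print(f"implied SE from upper bound: {se_from_upper:.4f}")
```

The two sides imply different values (roughly 0.52 and 0.35), because the bootstrap confidence set is not symmetric around the point estimate. No single standard error reproduces it, which is one reason to report the interval itself.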

                      • #12
                        Hi everyone,

                        I'm using the wild cluster bootstrap because I have only 16 clusters in my sample.

                        At first I implemented it with the -clustse- command, until I read David Roodman's paper on -boottest-.
                        Since both commands implement the wild cluster bootstrap, I expected similar results, but that is not the case. Can anyone tell me why?

                        1) -clustse-

                        Code:
                        clustse reg incentives pdi_100 ownership_dummy1 ownership_dummy2 ownership_dummy3 ownership_dummy4 ownership_dummy6 ownership_dummy7 ownership_dummy8 ownership_dummy9 ownership_dummy10 firm_size_1000 firm_size_sq_1000000, cluster(country) method(wild) reps(999)
                        
                        Regress with clustered SEs/Wild bootstrap (999 successful resamples)
                        Number of clustvars=    1                        Number of obs =     3146
                        Num combinations   =    1                        R-squared     =   0.0799
                                                                         Adj R-squared =   0.0764
                                                                         G(country)    =       16
                                                                         (Bootstrapped)
                        -------------------------------------------------------------------------
                          incentives|       Coef.        Null     p-value    [95% Conf. Interval]
                        ------------+------------------------------------------------------------
                             pdi_100|  -.72198172           .   .14414414  -1.3820094  -.11686992
                        ownership_~1|   .48165751           .           0   .28949764   .67133272
                        ownership_~2|   .46217799           .   .00600601   .26297158   .67638034
                        ownership_~3|   .26538352           .   .02802803     .105272   .41936398
                        ownership_~4|   .32332918           .   .08608609   .15496914   .49061534
                        ownership_~6|   .44905933           .     .004004   .24385157   .66920632
                        ownership_~7|   .39314619           .   .07607608   .12218909   .66703445
                        ownership_~8|   .52910069           .     .004004   .18086281   .85385829
                        ownership_~9|   .43624599           .           0   .28595716   .58891535
                        ownership~10|   .40868285           .           0   .27491471   .53825468
                        firm_si~1000|   .35362714           .           0   .24710698   .46986157
                        firm~1000000|  -.06626529           .     .002002  -.08886144  -.04535305
                                cons|   2.7564857           .           0   2.2772956   3.2472365
                        -------------------------------------------------------------------------

                        2) -boottest-

                        Code:
                        regress incentives pdi_100 ownership_dummy1 ownership_dummy2 ownership_dummy3 ownership_dummy4 ownership_dummy6 ownership_dummy7 ownership_dummy8 ownership_dummy9 ownership_dummy10 firm_size_1000 firm_size_sq_1000000, cluster(country)
                        
                        Linear regression                               Number of obs     =      3,146
                                                                        F(12, 15)         =      82.78
                                                                        Prob > F          =     0.0000
                                                                        R-squared         =     0.0799
                                                                        Root MSE          =     .68227
                        
                                                               (Std. Err. adjusted for 16 clusters in country)
                        --------------------------------------------------------------------------------------
                                             |               Robust
                                  incentives |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ---------------------+----------------------------------------------------------------
                                     pdi_100 |  -.7219817   .3637497    -1.98   0.066    -1.497296    .0533324
                            ownership_dummy1 |   .4816575   .1030858     4.67   0.000     .2619353    .7013797
                            ownership_dummy2 |    .462178   .1152298     4.01   0.001     .2165715    .7077845
                            ownership_dummy3 |   .2653835   .0871509     3.05   0.008     .0796259    .4511412
                            ownership_dummy4 |   .3233292   .0976251     3.31   0.005     .1152461    .5314122
                            ownership_dummy6 |   .4490593   .1104859     4.06   0.001     .2135642    .6845545
                            ownership_dummy7 |   .3931462   .1509348     2.60   0.020     .0714363    .7148561
                            ownership_dummy8 |   .5291007   .1744574     3.03   0.008     .1572536    .9009478
                            ownership_dummy9 |    .436246   .0904752     4.82   0.000     .2434026    .6290894
                           ownership_dummy10 |   .4086829   .0727767     5.62   0.000     .2535631    .5638026
                              firm_size_1000 |   .3536271   .0589945     5.99   0.000     .2278834    .4793708
                        firm_size_sq_1000000 |  -.0662653   .0120845    -5.48   0.000    -.0920227   -.0405079
                                       _cons |   2.756486   .2753207    10.01   0.000     2.169653    3.343318
                        --------------------------------------------------------------------------------------
                        
                        
                        
                        boottest pdi_100
                        
                        
                        Wild bootstrap, null imposed, 999 replications, Wald test, bootstrap clustering by country, Rademacher weights:
                          pdi_100
                        
                                                               t(15) =    -1.9848
                                                            Prob>|t| =     0.0571
                        
                        95% confidence set for null hypothesis expression: [-1.831, .01927]

                        Thanks for any help in advance!!

                        Best,
                        Hanna

                        • #13
                          Well, the boottest command is imposing the null on the bootstrap data generating process, but it looks like the clustse command is not.
                          --David
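
For readers unsure what "imposing the null" changes, the following is a minimal Python sketch of the two variants. It is not the code of boottest or clustse. It assumes a deliberately simplified setting: one regressor, no intercept, no small-sample correction, and every Rademacher sign pattern enumerated rather than drawn at random. The function names and the data are hypothetical.

```python
from itertools import product

def slope_and_t(y, x, cluster, b0):
    """OLS slope of y on x (no intercept) and cluster-robust t for H0: slope = b0."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    score = {}  # per-cluster sum of x * residual
    for xi, yi, g in zip(x, y, cluster):
        score[g] = score.get(g, 0.0) + xi * (yi - b * xi)
    se = sum(s * s for s in score.values()) ** 0.5 / sxx
    return b, (b - b0) / se

def wild_cluster_p(y, x, cluster, impose_null, b0=0.0):
    """Two-sided wild cluster bootstrap p-value for H0: slope = b0,
    enumerating every Rademacher sign pattern across clusters."""
    groups = sorted(set(cluster))
    b_hat, t_hat = slope_and_t(y, x, cluster, b0)
    # Slope used to build the bootstrap samples; each bootstrap
    # t-statistic is centered on the same value.
    b_dgp = b0 if impose_null else b_hat
    resid = [yi - b_dgp * xi for xi, yi in zip(x, y)]
    hits = total = 0
    for signs in product((-1.0, 1.0), repeat=len(groups)):
        w = dict(zip(groups, signs))  # one sign per cluster
        y_star = [b_dgp * xi + w[g] * ui
                  for xi, ui, g in zip(x, resid, cluster)]
        _, t_star = slope_and_t(y_star, x, cluster, b_dgp)
        hits += abs(t_star) >= abs(t_hat)
        total += 1
    return hits / total

# Hypothetical data: 6 clusters of 3 observations each.
cluster = [g for g in range(6) for _ in range(3)]
x = [1.0, 2.0, 3.0, 1.5, 2.5, 0.5, 2.0, 1.0, 3.5,
     0.5, 1.5, 2.5, 3.0, 1.0, 2.0, 2.5, 0.5, 1.5]
y = [0.9, 1.1, 2.4, 0.2, 1.9, -0.3, 1.6, 0.1, 2.2,
     -0.4, 0.3, 1.5, 2.9, 0.8, 1.0, 0.7, 0.6, 0.2]

p_null_imposed = wild_cluster_p(y, x, cluster, impose_null=True)
p_null_not_imposed = wild_cluster_p(y, x, cluster, impose_null=False)
print(p_null_imposed, p_null_not_imposed)
```

The only difference between the two calls is whether the bootstrap samples are built from the restricted fit (slope fixed at the hypothesized value) or from the unrestricted OLS fit. With few clusters the two p-values can differ noticeably, which is consistent with the discrepancy described above.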

                          • #14
                            Hi David,
                            I was wondering whether boottest can do a multiple-hypothesis adjustment for the same independent variables across several different regressions (different dependent variables, same set of independent variables). It seems that I can only run one regression and call boottest right after it, so I can only adjust for multiple hypotheses about different independent variables from a single regression. Is there any way to achieve that? Thanks a lot, and I look forward to your reply!

                            • #15
                              Not directly, no. However, the formulas for the corrections offered by boottest are straightforward, so you could just apply them yourself.
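
As a sketch of applying such corrections by hand, the Python snippet below assumes the corrections in question are the standard Bonferroni and Šidák adjustments (an assumption; check the boottest help file for what your version offers). The p-values are hypothetical placeholders standing in for one boottest result per regression, and the function names are made up for the example.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m hypotheses."""
    return min(1.0, m * p)

def sidak(p, m):
    """Sidak-adjusted p-value for one of m hypotheses."""
    return 1.0 - (1.0 - p) ** m

# Hypothetical bootstrap p-values for the same regressor in three
# regressions with different dependent variables.
p_values = [0.012, 0.030, 0.200]
m = len(p_values)

for p in p_values:
    print(f"raw={p:.3f}  bonferroni={bonferroni(p, m):.4f}  sidak={sidak(p, m):.4f}")
```

Each regression would be run and tested separately, and the resulting p-values collected and adjusted together, with m equal to the total number of hypotheses in the family.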
