Ho, ho, ho! Wild bootstrap tests for everyone!

David Roodman

Join Date: Jul 2014

Posts: 465
#1

Ho, ho, ho! Wild bootstrap tests for everyone!

27 Dec 2021, 09:53

I've translated the guts of my boottest program from Mata to Julia, producing the Julia package WildBootTests.jl. And at the page just linked, I've posted examples of calling the package from Julia, R, Python, and Stata. The Stata example is convoluted: using Stata 16 or 17, you go into Python. From there you use the Python package PyJulia to link to Julia. For it to work, you need to have installed both Python and Julia, along with PyJulia in Python and WildBootTests in Julia.

While I doubt this will be of much use to Stata users (boottest is damn fast and easier to use), I think the project is interesting in a number of ways:
It offers a model of cross-platform development of statistical software. One need invest in building a feature set and optimizing code just once. Tailored front-ends can be written for each platform. In fact Alexander Fischer is writing one for R.

Julia promises the plasticity of Python and the speed of C, roughly speaking, by way of just-in-time compilation. In my experience, fully realizing this potential takes a lot of work, at least when climbing learning curves. You have to dig into the innards of type inference and compilation and stare at pages of arcane output (from @code_warntype or SnoopCompile). Partly that is a comment on the immaturity of Julia. It has the well-recognized problem of long "time to first plot," meaning that there's a long lag the first time a package is used in a session. On my machine, the wildboottest() function often takes 12 seconds to run the first time, and it was a struggle to get it that low.

Nevertheless, the promise is real. A programmer can achieve much higher performance than with Mata, yet without having to bother with manually compiling code for multiple operating systems and CPUs, the way you do with C plug-ins for Stata. An example below shows WildBootTests 10x faster than boottest even when calling from Stata.

Julia could be more directly integrated into Stata, making the link easier and more reliable. I've already suggested that Stata corp do this, the way they have for Python. Or maybe a user could lead the way, as James Fiedler did for Python.

Here is the example. This does not demonstrate how to install Python, Julia, PyJulia, and WildBootTests, just how to run them once installed. The data set is used in Fast and Wild.

Code:

infile coll merit male black asian year state chst using regm.raw, clear qui xi: regress coll merit male black asian i.year i.state, cluster(state) generate individual = _n // unique ID for each observation timer clear timer on 1 boottest merit, nogr reps(9999) bootcluster(individual) // subcluster bootstrap timer off 1 timer on 2 mat b = e(b)[1,1..colsof(e(b))-1] // drop constant term global vars: colnames b // get right-side variable names python from julia import WildBootTests as wbt import numpy as np from sfi import Data R = np.concatenate(([1], np.zeros(`=colsof(b)'))).reshape(1,-1) # put null in Rβ = r form r = np.array([0]) resp = np.asarray(Data.get('coll')) # get response variable predexog = np.c_[np.asarray(Data.get('$vars')), np.ones(resp.size)] # get exogenous predictor variables + constant clustid = np.asarray(Data.get('individual state')).astype(int) # get clustering variables test = wbt.wildboottest(R, r, resp=resp, predexog=predexog, clustid=clustid, nbootclustvar=1, nerrclustvar=1, reps=9999) # do test wbt.teststat(test) # show results wbt.p(test) wbt.CI(test) end timer off 2 timer list

On my machine, I get

Code:

. timer list 1: 22.64 / 1 = 22.6360 2: 2.10 / 1 = 2.1040

...meaning the new version is 10x faster.

One source of speed-up is that by default wildboottest() does all computations in single-precision (32-bit floats) rather than double, something that is not possible in Mata, but I think is typically fine for a bootstrap-based test.

Last edited by David Roodman; 27 Dec 2021, 10:18.
Tags: None

6 likes
JanDitzen

Join Date: Jan 2015

Posts: 348
#2

29 Dec 2021, 03:17

Thanks David for the post - this is amazing and a great idea respectively suggestion. It keeps bugging me that once you have a program in one language, converting it into another one takes time and is often not trivial. Using one language as a base is a great starting point, but the interaction between the programs is key as well. I really hope that the next Stata version comes with improved R and Julia interaction. This would be a game changer for users and programmers.

One question/suggestion: I believe you could wrap the python code into a Stata program. For example you would code:

Code:

boottest merit, nogr reps(9999) bootcluster(individual) julia

where the option julia then computes the python code from your example. This would make the integration into Stata even easier.
Comment
David Roodman

Join Date: Jul 2014

Posts: 465
#3

29 Dec 2021, 07:11

Hi Jan. I quite agree. I had thought of making a separate "boottestjl" commad for Stata, but I like your idea of just adding an option to boottest. I'll put it on my list...
1 like
Comment
David Roodman

Join Date: Jul 2014

Posts: 465
#4

22 Mar 2022, 06:46

JanDitzen, done!
Comment

Announcement

Ho, ho, ho! Wild bootstrap tests for everyone!

Comment

Comment

Comment