Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ho, ho, ho! Wild bootstrap tests for everyone!

    I've translated the guts of my boottest program from Mata to Julia, producing the Julia package WildBootTests.jl. And at the page just linked, I've posted examples of calling the package from Julia, R, Python, and Stata. The Stata example is convoluted: using Stata 16 or 17, you go into Python. From there you use the Python package PyJulia to link to Julia. For it to work, you need to have installed both Python and Julia, along with PyJulia in Python and WildBootTests in Julia.

    While I doubt this will be of much use to Stata users (boottest is damn fast and easier to use), I think the project is interesting in a number of ways:
    1. It offers a model of cross-platform development of statistical software. One need invest in building a feature set and optimizing code just once. Tailored front-ends can be written for each platform. In fact Alexander Fischer is writing one for R.
    2. Julia promises the plasticity of Python and the speed of C, roughly speaking, by way of just-in-time compilation. In my experience, fully realizing this potential takes a lot of work, at least when climbing learning curves. You have to dig into the innards of type inference and compilation and stare at pages of arcane output (from @code_warntype or SnoopCompile). Partly that is a comment on the immaturity of Julia. It has the well-recognized problem of long "time to first plot," meaning that there's a long lag the first time a package is used in a session. On my machine, the wildboottest() function often takes 12 seconds to run the first time, and it was a struggle to get it that low.
    3. Nevertheless, the promise is real. A programmer can achieve much higher performance than with Mata, yet without having to bother with manually compiling code for multiple operating systems and CPUs, the way you do with C plug-ins for Stata. An example below shows WildBootTests 10x faster than boottest even when calling from Stata.
    4. Julia could be more directly integrated into Stata, making the link easier and more reliable. I've already suggested that Stata corp do this, the way they have for Python. Or maybe a user could lead the way, as James Fiedler did for Python.
    Here is the example. This does not demonstrate how to install Python, Julia, PyJulia, and WildBootTests, just how to run them once installed. The data set is used in Fast and Wild.

    Code:
    infile coll merit male black asian year state chst using regm.raw, clear
    qui xi: regress coll merit male black asian i.year i.state, cluster(state)
    generate individual = _n  // unique ID for each observation
    
    timer clear
    
    timer on 1
    boottest merit, nogr reps(9999) bootcluster(individual)  // subcluster bootstrap
    timer off 1
    
    timer on 2
    mat b = e(b)[1,1..colsof(e(b))-1]  // drop constant term
    global vars: colnames b            // get right-side variable names
    
    python
    from julia import WildBootTests as wbt
    import numpy as np
    from sfi import Data
    
    R = np.concatenate(([1], np.zeros(`=colsof(b)'))).reshape(1,-1)      # put null in Rβ = r form
    r = np.array([0])                                              
    resp = np.asarray(Data.get('coll'))                                  # get response variable
    predexog = np.c_[np.asarray(Data.get('$vars')), np.ones(resp.size)]  # get exogenous predictor variables + constant
    clustid = np.asarray(Data.get('individual state')).astype(int)       # get clustering variables
    test = wbt.wildboottest(R, r, resp=resp, predexog=predexog,
           clustid=clustid, nbootclustvar=1, nerrclustvar=1, reps=9999)  # do test
    wbt.teststat(test)                                                   # show results
    wbt.p(test)      
    wbt.CI(test)
    end
    timer off 2
    
    timer list
    On my machine, I get
    Code:
    . timer list
       1:     22.64 /        1 =      22.6360
       2:      2.10 /        1 =       2.1040
    ...meaning the new version is 10x faster.

    One source of speed-up is that by default wildboottest() does all computations in single-precision (32-bit floats) rather than double, something that is not possible in Mata, but I think is typically fine for a bootstrap-based test.
    Last edited by David Roodman; 27 Dec 2021, 11:18.

  • #2
    Thanks David for the post - this is amazing and a great idea respectively suggestion. It keeps bugging me that once you have a program in one language, converting it into another one takes time and is often not trivial. Using one language as a base is a great starting point, but the interaction between the programs is key as well. I really hope that the next Stata version comes with improved R and Julia interaction. This would be a game changer for users and programmers.

    One question/suggestion: I believe you could wrap the python code into a Stata program. For example you would code:

    Code:
    boottest merit, nogr reps(9999) bootcluster(individual) julia 
    where the option julia then computes the python code from your example. This would make the integration into Stata even easier.

    Comment


    • #3
      Hi Jan. I quite agree. I had thought of making a separate "boottestjl" commad for Stata, but I like your idea of just adding an option to boottest. I'll put it on my list...

      Comment


      • #4
        JanDitzen, done!

        Comment

        Working...
        X