In December I posted about a new Julia implementation of the wild bootstrap and demonstrated how to call it from Stata... by way of Python. In computationally demanding applications, it can be much faster than the standard boottest.
Jan Ditzen made the excellent suggestion that I add a julia option to boottest to automate this link, which I have now done. It still requires (free) installation of both Python and Julia. (Julia must be installed so that it is accessible through the system path.) From there, boottest tries to automate the set-up, installing Python and Julia packages as needed. Let me know if that part crashes on you. The latest version of boottest can be installed with ssc install boottest, replace. Because it uses Python to connect to Julia, it requires Stata 16 or later.
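Because the julia option depends on Julia being reachable through the system path, it may help to confirm that before running boottest. The following is a minimal, hypothetical Python check. It is not part of boottest, and the function names find_julia and julia_version are made up for illustration. It relies only on the standard-library shutil.which and subprocess.run. Since Stata 16 embeds Python, a check like this could be run from within Stata, assuming Stata sees the same PATH as your shell.

```python
import shutil
import subprocess


def find_julia():
    """Return the full path of the julia executable found on the system PATH, or None."""
    return shutil.which("julia")


def julia_version():
    """Return the text printed by `julia --version`, or None if julia can't be found or run."""
    exe = find_julia()
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=60, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = find_julia()
    if path is None:
        print("julia was not found on the system PATH")
    else:
        print("julia found at", path)
        print(julia_version())
```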
The first time the Julia version is called in a Stata session, or is called to run a new kind of test, such as the WCR after regress or the WRE after ivregress, there will be a significant delay as Julia compiles code "just in time." For the rest of the session, it should run fast. But if the regular version of boottest is already fast in your applications, the Julia version will not save time.
There is also a float(32) option, which tells the Julia implementation to use single-precision math. It often returns the same answer in less time. When generating random numbers in Julia, boottest uses the StableRNGs.jl package to guarantee exact replicability.
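The two ideas in that paragraph, seeded random draws and single-precision arithmetic, can be illustrated with a short NumPy sketch. This is not boottest's code. NumPy's default_rng stands in for StableRNGs.jl, the helper names are hypothetical, and the "residuals" are random stand-ins. The sketch shows two things. First, reusing a seed reproduces the Rademacher draws exactly. Second, computing the same bootstrap sums in float32 instead of float64 changes them only slightly.

```python
import numpy as np


def rademacher_weights(reps, n_clusters, seed):
    """Hypothetical helper: a reps x n_clusters matrix of +1/-1 weights, each with probability 1/2."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=(reps, n_clusters))


def weighted_sums(weights, resid, dtype):
    """Hypothetical helper: one sum of w_g * e_g per replication, computed at the given precision."""
    return weights.astype(dtype) @ resid.astype(dtype)


# The same seed gives identical draws, which is what makes a bootstrap replicable.
w1 = rademacher_weights(reps=999, n_clusters=500, seed=12345)
w2 = rademacher_weights(reps=999, n_clusters=500, seed=12345)
print("identical draws:", np.array_equal(w1, w2))

# Stand-in residuals; the same sums in double and single precision.
resid = np.random.default_rng(1).standard_normal(500)
s64 = weighted_sums(w1, resid, np.float64)
s32 = weighted_sums(w1, resid, np.float32)
print("max abs difference, float32 vs float64:", np.max(np.abs(s64 - s32.astype(np.float64))))
```

Differences this small rarely matter. Still, they can occasionally move a bootstrap statistic across the observed value, so single- and double-precision runs need not agree digit for digit.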
I think this project is interesting as a model of unified cross-platform development. A complex procedure can be implemented and refined once, instead of de novo for R, Stata, Python, etc. And it can still be done in a platform-independent way, unlike C plug-ins for Stata, which the author has to compile separately for Windows, Linux, and Mac. Alexander Fischer is incorporating the same Julia back end, WildBootTests.jl, into fwildclusterboot for R.
If StataCorp integrated Julia into Stata the way it has Python, this model could become even more viable for Stata. I've worked to automate the Julia set-up. But because of the current need to link via Python, and the complexity of supporting multiple operating systems in multiple configurations, I worry that it will sometimes fail. It would benefit from a professional touch.
And if Julia got to the point where the same just-(not)-in-time compilation didn't need to be run anew in every session, that would be even better...
Here is an annotated session performing a subcluster wild bootstrap on a data set from Tim Conley. set rmsg is turned on to show how many seconds each command takes:
Code:
```
. infile coll merit male black asian year state chst using regm.raw, clear
(42,161 observations read)

. generate individual = _n  // unique ID for each observation

. qui xi: regress coll merit male black asian i.year i.state, cluster(state)

. set rmsg on
r; t=0.00 8:13:58

. boottest merit, nogr reps(9999) bootcluster(individual)  // without julia option

Wild bootstrap-t, null imposed, 9999 replications, Wald test, bootstrap clustering by individual, Rademacher weights:
  merit

                             t(50) =     2.6646
                          Prob>|t| =     0.0352

95% confidence set for null hypothesis expression: [.002728, .06488]
r; t=21.94 8:15:06

. boottest merit, nogr reps(9999) bootcluster(individual) julia  // with julia option: slow the first time called in a Stata session

Wild bootstrap-t, null imposed, 9999 replications, Wald test, bootstrap clustering by individual, Rademacher weights:
  merit

                             t(50) =     2.6646
                          Prob>|t| =     0.0319

95% confidence set for null hypothesis expression: [.003473, .06414]
r; t=28.21 8:15:46

. boottest merit, nogr reps(9999) bootcluster(individual) julia  // same command, faster now

Wild bootstrap-t, null imposed, 9999 replications, Wald test, bootstrap clustering by individual, Rademacher weights:
  merit

                             t(50) =     2.6646
                          Prob>|t| =     0.0319

95% confidence set for null hypothesis expression: [.003804, .06383]
r; t=3.55 8:15:53

. boottest merit, nogr reps(9999) bootcluster(individual) julia float(32)  // slow first time called in single precision

Wild bootstrap-t, null imposed, 9999 replications, Wald test, bootstrap clustering by individual, Rademacher weights:
  merit

                             t(50) =     2.6646
                          Prob>|t| =     0.0295

95% confidence set for null hypothesis expression: [.004034, .0636]
r; t=15.78 8:16:15

. boottest merit, nogr reps(9999) bootcluster(individual) julia float(32)  // second time, even faster

Wild bootstrap-t, null imposed, 9999 replications, Wald test, bootstrap clustering by individual, Rademacher weights:
  merit

                             t(50) =     2.6646
                          Prob>|t| =     0.0305

95% confidence set for null hypothesis expression: [.003753, .06379]
r; t=2.63 8:16:21
```
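For readers who want to see the mechanics, here is a brute-force NumPy sketch of the kind of subcluster wild bootstrap-t run above. It imposes the null that coefficient k is zero. It draws Rademacher weights independently for each bootstrap cluster (individuals). It clusters standard errors by a coarser grouping (states). It runs on synthetic data and is only an illustration under these assumptions. It is not the far more efficient algorithm used by boottest or WildBootTests.jl. The function names and the Stata-style small-sample factor are assumptions of the sketch, and it computes only the p-value, not boottest's confidence set.

```python
import numpy as np


def cluster_t(X, y, clusters, k):
    """OLS t statistic for coefficient k using a cluster-robust (CRV1) variance."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    groups = np.unique(clusters)
    G = len(groups)
    meat = np.zeros((p, p))
    for g in groups:
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    c = G / (G - 1) * (n - 1) / (n - p)  # Stata-style small-sample factor (an assumption here)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta[k] / np.sqrt(V[k, k])


def subcluster_wild_bootstrap(X, y, err_clusters, boot_clusters, k, reps=999, seed=0):
    """Wild bootstrap-t test of H0: beta_k = 0, null imposed, Rademacher weights.

    Standard errors are clustered by err_clusters. Bootstrap weights are drawn
    independently for each level of the finer boot_clusters (e.g., individuals).
    """
    t_obs = cluster_t(X, y, err_clusters, k)

    # Impose the null: refit without column k and keep restricted fits and residuals.
    X_r = np.delete(X, k, axis=1)
    beta_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    # Map each observation to the index of its bootstrap cluster.
    _, boot_idx = np.unique(boot_clusters, return_inverse=True)
    n_boot = boot_idx.max() + 1
    rng = np.random.default_rng(seed)

    t_star = np.empty(reps)
    for b in range(reps):
        v = rng.choice(np.array([-1.0, 1.0]), size=n_boot)  # one weight per bootstrap cluster
        y_star = fitted_r + resid_r * v[boot_idx]
        t_star[b] = cluster_t(X, y_star, err_clusters, k)

    # Symmetric p-value: share of bootstrap |t| at least as large as the observed |t|.
    p_value = np.mean(np.abs(t_star) >= np.abs(t_obs))
    return t_obs, p_value


# Synthetic example: 20 "states" of 30 "individuals", with a state-level error component.
rng = np.random.default_rng(42)
n_states, per_state = 20, 30
state = np.repeat(np.arange(n_states), per_state)
individual = np.arange(n_states * per_state)
x = rng.standard_normal(state.size)
y = 0.5 * x + rng.standard_normal(n_states)[state] + rng.standard_normal(state.size)
X = np.column_stack([np.ones_like(x), x])

t_obs, p = subcluster_wild_bootstrap(X, y, state, individual, k=1, reps=199, seed=1)
print(f"t = {t_obs:.4f}, bootstrap p = {p:.4f}")
```

Each replication here refits the full regression, which is exactly the cost that smarter implementations avoid. That is why the number of replications and the number of bootstrap clusters drive run time in the session above.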