
  • Bootstrapping in parallel

    Using Stata, is there any convenient way to bootstrap in parallel? Although Stata/MP does some things in parallel, it seems it still bootstraps serially. In 2018, jerome falken reported some trouble using the -parallel bs- command by George Vega and Brian Quistorff, which I am also having trouble with. Maarten Buis offered a solution, but it pertained to simulation rather than bootstrapping.

    Has there been any progress since 2018? Bootstrapping is an embarrassingly parallel task, and it is becoming a little embarrassing if it can't be conveniently parallelized in Stata.

    (Here's the 2018 discussion of this issue: https://www.statalist.org/forums/for...l-bootstraping)
    Last edited by paulvonhippel; 31 Jan 2022, 19:34.

  • #2
    Interested as well.



    • #3
      For me, -parallel- usually works very well. I think it would be easier to look at the specific error or problem and see why -parallel- fails for you than to rewrite the entire code or invent a new ado. If you encounter specific bugs, please report them to the programmers.
      Best wishes

      (Stata 16.1 MP)



      • #4
        This morning, -parallel bs- seems to be working, but it doesn't save much runtime. For the analysis I'm running, with 32 bootstrap replicates, I'm finding that -parallel bs-, which splits the job across 4 or 8 clusters, runs only 7 to 20 percent faster than -bootstrap-, which runs serially. Should I be disappointed? I was hoping the parallel version would run almost 4 to 8 times faster than the serial version. My computer has 4 physical cores and 8 logical processors.
        Last edited by paulvonhippel; 01 Feb 2022, 10:35.



        • #5
          Two possibilities come to mind.

          1) The overhead in setting up each cluster outweighs much of the speed gains.

          2) To the extent that the command you are bootstrapping is itself well parallelized, then each cluster set up by -parallel bs- will be competing for the same cores to execute the parallelized command. Or else -parallel bs- takes this into consideration and reduces the number of cores available to each cluster, which would reduce the performance of the command being bootstrapped.

          At a higher level, this may well be why bootstrap is not parallelized - there's a tradeoff between running serially with the bootstrapped command having full access to all the cores, and running parallel replications of the bootstrapped command, with each replication fighting the others for time in the cores.
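
          One rough way to probe possibility 2), assuming Stata/MP and the SSC -parallel- package: cap the cores each process may use with -set processors- before launching the clusters, so the children aren't all fighting for the same cores. This is only a sketch, not a tested recipe; whether child processes inherit the -set processors- setting is something to verify on your machine, and the model below is a placeholder (it assumes data that are already -xtset-).

          Code:

          set processors 1        // cap cores per process (Stata/MP only)
          parallel initialize 4   // four single-core clusters
          * y, x1, x2 are placeholder variable names
          parallel bs, reps(64): xtpoisson y x1 x2, fe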



          • #6
            William Lisowski : Thanks for your thoughtful reply. I looked into the possibilities you raised:

            1) I'm not sure what kind of overhead is involved in setting up a cluster, but if the issue is overhead, then the advantage of -parallel bs- should increase with the number of bootstrap replications. Accordingly, I tried going from 32 replications to 64, and the advantage was no greater. I'm waiting about 20 minutes for output, which I would think is a long time compared to any overhead.

            2) The command I'm running is -xtpoisson-, which according to the Stata/MP documentation runs 5 to 7 times faster on 4 to 8 cores than on a single core. So that seems like the more likely possibility. Both -bootstrap: xtpoisson- and -parallel bs: xtpoisson- split the job up across cores, hence not much difference in runtime.

            Good insights, thank you!

            I still feel the job is running too slow, and I've been thinking about why. I have new questions:
            1. I wonder if -bootstrap- is filling up memory by generating all the bootstrap samples at once, instead of one at a time.
            2. As it's common to run several variations of the same regression model, with different sets of covariates, it would be more efficient to generate each bootstrap sample once and run all the models on that sample before generating a new one. I don't see a convenient way to implement that in -bootstrap-, but it might save some time. Although I doubt the generation of bootstrap samples is the slow part.
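
            For what it's worth, point 2 can be approximated by hand with -bsample- and -postfile-: draw each bootstrap sample once and fit every model variant on it before resampling. A minimal sketch on a toy dataset; the models, filename, and replication count are placeholders, not my actual analysis.

            Code:

            sysuse nlsw88, clear
            set seed 123
            tempname sims
            postfile `sims' b_hours_m1 b_hours_m2 using bootresults, replace
            forvalues r = 1/200 {
                preserve
                bsample                              // draw one bootstrap sample
                quietly regress wage hours           // model variant 1
                local b1 = _b[hours]
                quietly regress wage hours ttl_exp   // model variant 2, same sample
                post `sims' (`b1') (_b[hours])
                restore
            }
            postclose `sims'
            use bootresults, clear
            summarize                                // bootstrap distributions of the coefficients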



            • #7
              In the literature you can find recommendations that you should bootstrap your analysis hundreds or thousands of times. That's totally impractical with runtimes like I'm seeing. Much of the literature is written as though runtime doesn't matter.



              • #8
                Well, even with parallel and all the tricks, Stata is quite slow compared to languages like R. I've heard that Stata 17 is faster, but I think it's still not close. We will have to live with that or use another language. As to why parallel is not THAT much faster: William Lisowski gives good information. For simple commands like -regress-, however, the scaling is not that bad. In the example code below the speedup factor is about 3.4 (Stata 16.1 MP, 2 cores). I suggest looking at the paper by the programmers for more benchmarks.

                Code:
                
                 clear all
                 sysuse nlsw88
                 set seed 123
                 global command reg wage hours smsa south ttl_exp, vce(cluster race)
                 
                 * serial bootstrap
                 timer on 1
                 bootstrap, reps(50000) nodots: $command
                 timer off 1
                 timer list 1
                 
                 * parallel bootstrap across 4 clusters
                 timer on 2
                 parallel initialize 4
                 parallel bootstrap, reps(50000) nodots: $command
                 timer off 2
                 timer list 2
                 
                 * speedup factor (serial time / parallel time)
                 qui timer list
                 di r(t1) / r(t2)
                 (The -parallel- paper: https://journals.sagepub.com/doi/abs...36867X19874242)
                Last edited by Felix Bittmann; 01 Feb 2022, 11:30.
                Best wishes

                (Stata 16.1 MP)



                • #9
                  Originally posted by paulvonhippel View Post
                  I still feel the job is running too slow, and I've been thinking about why. I have new questions:
                  1. I wonder if -bootstrap- is filling up memory by generating all the bootstrap samples at once, instead of one at a time.
                  2. As it's common to run several variations of the same regression model, with different sets of covariates, it would be more efficient to generate each bootstrap sample once and run all the models on that sample before generating a new one. I don't see a convenient way to implement that in -bootstrap-, but it might save some time. Although I doubt the generation of bootstrap samples is the slow part.
                   I think William's point cuts to the heart of the issue. Ultimately, there are limited resources that must be shared. Whether you run K jobs in parallel across K cores, or one job serially on K cores, the limit comes down to how many cores are licensed and available, and how much memory is available.

                   To point #1, I don't think that's how bootstrap works. I think it draws one sample, re-executes the command (fits the model), saves the relevant output, and repeats. Since we are often re-estimating models, running K jobs in parallel would need much more memory overhead than one serial job, which can be optimized for multiple cores so that each single execution is faster. Poisson models especially can hog a lot of memory (which you can see if you watch memory allocation during a large regression model). I suspect this is why bootstrap works the way it does.

                   To point #2, drawing a sample should be much faster than fitting a model, so model fitting should be the bottleneck in typical scenarios.
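
                   The sampling-versus-fitting claim is easy to check with -timer-. A sketch on a toy dataset (not the original -xtpoisson- analysis): time 100 resamples alone, then 100 resamples each followed by a model fit.

                   Code:

                   sysuse nlsw88, clear
                   timer clear
                   timer on 1
                   forvalues r = 1/100 {
                       preserve
                       bsample                               // resampling only
                       restore
                   }
                   timer off 1
                   timer on 2
                   forvalues r = 1/100 {
                       preserve
                       bsample
                       quietly regress wage hours ttl_exp    // resampling plus fitting
                       restore
                   }
                   timer off 2
                   timer list      // the gap between t2 and t1 approximates the pure fitting cost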



                  • #10
                    Thanks, all! Felix Bittmann : when you say that Stata runs slower than R, what are you basing that on? Doesn't it depend on the Stata command (R package) and how it was written?



                    • #11
                      Leonardo Guizzetti : I'm not sure what you mean by "one job serially which can be optimized for multiple cores." How can it be serial if it runs on multiple cores?



                      • #12
                        Originally posted by paulvonhippel View Post
                        Leonardo Guizzetti : I'm not sure what you mean by "one job serially which can be optimized for multiple cores." How can it be serial if it runs on multiple cores?
                        Sorry if that wasn't clear. Suppose an operation can be performed either as one monolithic calculation, or factored into several independent pieces and then reassembled. I mean the latter type of operation. For example, many vectorized data-manipulation tasks lend themselves well to a divide-and-conquer approach, which can be spread across multiple cores. Some matrix multiplication operations lend themselves to parallel algorithms. Random sampling is another example.



                        • #13
                          Thanks! Well, I ran everything and concluded that, although the results vary from run to run, -parallel bs- is not reliably faster than -bootstrap-. In some runs -parallel bs- was actually slower, and it doesn't play as well with other Stata commands. That's quite understandable for a user-written command. To get faster runtimes from parallel bootstrapping, I would suggest that Stata implement something in-house...



                          • #14
                            Is -parallel bs- faster than -bootstrap- now?

