MP and speed as a function of processors

Mike Lacy

Join Date: Apr 2014

Posts: 2403
#1

11 Apr 2025, 18:15

After some years of not upgrading from Stata 15 (MP 2), I was planning to move to version 19 while continuing with a 2-processor version, but in an idle moment I decided to compare performance with 1 processor vs. 2. My main need for speed is when using -bootstrap-, so I tried this:

Code:

sysuse auto, clear
expand 10
quiet replace price = price + rnormal(0,100)
set processor 2 // or 1
timer clear 1
timer on 1
bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
timer off 1
timer list 1

To my surprise, there was almost no difference with 1 processor vs. 2 (about 10 sec. for the above on my machine). With a larger data set (-expand 100-), -set processor 2- was only about 20% faster. The relative difference was similar when I ran -logit- rather than -regress-.

This makes me wonder if I get much benefit by continuing with a 2 processor version of Stata 19.

Some thoughts:

1) Perhaps -bootstrap- is just not the sort of command that benefits from multiple processors?

2) Perhaps comparing -set processor 1- vs. 2 doesn't reveal the advantages of a two-processor version the way actually running Stata SE vs., say, MP 2 would?

3) Maybe something here is peculiar to my machine (Windows with 12 cores)?

I'd like to hear others' experience and perspective on the advantages of a multiple- vs. single-core Stata version.
Felix Bittmann

Join Date: Aug 2018

Posts: 660
#2

11 Apr 2025, 23:43

Dear Mike,
1. Actually, bootstrapping should be very easy to run in parallel. I am not sure why your results are not better with 2 cores in use. I remember this interesting benchmark: https://www.stata.com/statamp/perfor...ort/report.pdf Unfortunately, it only tests the new cluster bootstrap, if I read it correctly.
2+3. I don't know. In any case, the ado parallel will do the job on any(!) Stata version for free, at least for bootstrapping.

Best wishes

(Stata 16.1 MP)

Erik Ruzek

Join Date: Oct 2017
Posts: 408
#3

12 Apr 2025, 08:43

Mike,

I agree with Felix. parallel is quite useful for cases such as yours and works for Stata versions 13 and up. See below on Stata 16 (MP 4):

Code:

clear*
sysuse auto
expand 10
quiet replace price = price + rnormal(0,100)

set processor 1
timer clear 1
timer on 1
bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
timer off 1

set processor 2
timer clear 2
timer on 2
bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
timer off 2

parallel initialize 2
timer clear 3
timer on 3
parallel bs, reps(5000) nodots: regress price headroom weight length turn
timer off 3 

parallel initialize 4
timer clear 4
timer on 4
parallel bs, reps(5000) nodots: regress price headroom weight length turn
timer off 4 

timer list

Leading to:

Code:

1:     13.49 /        1 =      13.4870
2:     13.01 /        1 =      13.0060
3:      9.32 /        1 =       9.3170
4:      6.12 /        1 =       6.1200

So setting the number of processors to 2 in Stata gives only a very slight boost in speed here; the bootstrap replications are not truly run in parallel (i.e., on multiple processors). For that, you need to use parallel.
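To put numbers on that, the four timings above can be converted into speedups relative to the 1-processor run. This is only a minimal sketch: the times are copied by hand from the timer output, and the local macro names (t1, t2, p2, p4) are made up for illustration.

```stata
* Times in seconds, copied from the timer list output above
* t1, t2: bootstrap under set processor 1 and 2
local t1 = 13.487
local t2 = 13.006
* p2, p4: parallel bs after parallel initialize 2 and 4
local p2 = 9.317
local p4 = 6.120

* Speedup = time of the 1-processor run / time of the other run
display "set processor 2:       " %4.2f `t1'/`t2'
display "parallel initialize 2: " %4.2f `t1'/`p2'
display "parallel initialize 4: " %4.2f `t1'/`p4'
```

On these figures the speedups come out at roughly 1.04, 1.45, and 2.20, so parallel bs helps far more than -set processor 2- does, though it too falls short of the ideal 2x and 4x.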


Richard Williams

Join Date: Apr 2014

Posts: 4932
#4

12 Apr 2025, 09:02

Interesting. So, if I had Stata mp/4, how would the times compare? If nothing else, I assume the syntax would be a little simpler?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Mike Lacy

Join Date: Apr 2014

Posts: 2403
#5

12 Apr 2025, 20:34

Thanks for the responses and the link to the benchmark material. Yes, I have used -parallel- or run multiple instances of Stata with large bootstrap problems or similar tasks, and that works great. It sounds like the parallelization built into Stata occurs essentially within commands and not "across" commands, as in -bootstrap-. But I'm still puzzled as to why using more processors doesn't produce benefits by speeding up each execution of, e.g., -regress-. That benchmark document indicates that -regress- is 100% parallelized. I'm thinking now that MP has not done much for me with the kinds of tasks I typically do. This is not a complaint, just an observation.
Daniel Feenberg

Join Date: Oct 2014

Posts: 323
#6

Yesterday, 04:51

Considering the information here:

https://www.stata.com/statamp/perfor...ort/report.pdf

your experience is very surprising. In those tests, the speed of the regress command scales linearly with the number of processors. Note that you are going from 12 processors to 24, not 1 to 2, but that doesn't remove the surprise.

Running multiple jobs is a good suggestion. The bstat command will combine the results:

https://www.stata.com/manuals/rbstat.pdf
https://www.nber.org/stata/efficient/bootstrap.html

Have you looked at Roodman et al., "Fast and Wild: Bootstrap Inference in Stata using boottest"?

https://www.stata.com/meeting/canada...ada18_Webb.pdf

It promises order-of-magnitude improvements in speed for appropriate problems.
Mike Lacy

Join Date: Apr 2014

Posts: 2403
#7

Yesterday, 10:28

Here's an even simpler example, with results, comparing the time consumed by -regress- using 1 vs. 2 processors. It looks as though using multiple processors makes no appreciable difference, in contrast to the apparently careful analyses on p. 37 of https://www.stata.com/statamp/performance-report/report.pdf, which report that using 2 processors under MP was more than twice as fast as using 1 processor. Again, I'm using v. 15.1 of Stata, MP 2, and perhaps this would be different with v. 19. Nevertheless, I wonder why my simple results show so little effect, and of course whether they are relevant to choosing MP 2 vs. SE.

Code:

forval j = 1/2 {
    quiet set processors `j'
    timer clear 1
    timer on 1
    forval i = 1/10000 {
        quiet replace price = price + rnormal(0,1000) // avoid any cached memory oddities
        quiet regress price headroom weight length turn
    }
    timer off 1
    di "Using `j' processors"
    timer list 1
}

Using 1 processors
   1:     52.42 /        1 =      52.4170
Using 2 processors
   1:     49.50 /        1 =      49.5030
Julian Reif

Join Date: Dec 2018

Posts: 47
#8

Yesterday, 12:23

There is a prior thread on this topic. I think it would be helpful if people ran and reported the results of the benchmark program written by Sergiy Radyakin, who wrote the original post in that thread.

Associate Professor of Finance and Economics
University of Illinois
www.julianreif.com