  • MP and speed as a function of processors

    After some years of not upgrading from Stata 15 (MP 2), I was planning to move to version 19 while staying with a 2-processor version, but in an idle moment I decided to compare performance with 1 processor vs. 2. One place where I most want speed is -bootstrap-, so I tried this:

    Code:
    sysuse auto, clear
    expand 10
    quiet replace price = price + rnormal(0,100)
    set processor 2 // or 1
    timer clear 1
    timer on 1
    bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
    timer off 1
    timer list 1
    To my surprise, there was almost no difference between 1 processor and 2 (about 10 seconds for the above on my machine). With a larger data set (-expand 100-), -set processor 2- was only about 20% faster. The relative difference was similar when I used -logit- rather than -regress-.

    This makes me wonder whether I would get much benefit from continuing with a 2-processor version of Stata 19.

    Some thoughts:

    1) Perhaps -bootstrap- is just not the sort of command that benefits from multiple processors?

    2) Perhaps comparing -set processor 1- with -set processor 2- doesn't reveal the advantages of a two-processor version the way that actually running Stata SE vs., say, MP 2 would?

    3) Maybe something here is peculiar to my machine (Windows with 12 cores)?

    I'd like to hear others' experience and perspective on the advantages of a multiple-core vs. single-core Stata version.
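    Related to thoughts 2 and 3, one quick check is what Stata itself reports about processors in the session where the timings are run. A minimal sketch using Stata's c-class values; it assumes Stata/MP, since outside MP some of these values are reported as 1 or as missing:

    Code:
    ```stata
    * What does this Stata session report about processors?
    display "Stata/MP:                " c(MP)
    display "Processors in use:       " c(processors)
    display "Processors licensed:     " c(processors_lic)
    display "Processors on machine:   " c(processors_mach)
    display "Maximum usable:          " c(processors_max)
    ```

    If c(processors) is lower than expected, -set processors #- changes it for the session, up to c(processors_max).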

  • #2
    Dear Mike,
    1. Actually, bootstrapping should be very easy to run in parallel. I am not sure why your results are not better with 2 cores. I remember this interesting benchmark: https://www.stata.com/statamp/perfor...ort/report.pdf Unfortunately, if I read it correctly, it only tests the new cluster bootstrap.
    2 and 3: I don't know. In any case, the -parallel- ado will do the job on any(!) Stata version for free, at least for bootstrapping.
    Best wishes

    (Stata 16.1 MP)



    • #3
      Mike,

      I agree with Felix. -parallel- is quite useful for cases such as yours and works for Stata versions 13 and up. See below for results on Stata 16 (MP 4):
      Code:
      clear*
      sysuse auto
      expand 10
      quiet replace price = price + rnormal(0,100)
      
      set processor 1
      timer clear 1
      timer on 1
      bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
      timer off 1
      
      set processor 2
      timer clear 2
      timer on 2
      bootstrap _b, reps(5000) nodots: regress price headroom weight length turn
      timer off 2
      
      parallel initialize 2
      timer clear 3
      timer on 3
      parallel bs, reps(5000) nodots: regress price headroom weight length turn
      timer off 3 
      
      parallel initialize 4
      timer clear 4
      timer on 4
      parallel bs, reps(5000) nodots: regress price headroom weight length turn
      timer off 4 
      
      timer list
      Leading to:
      Code:
      1:     13.49 /        1 =      13.4870
      2:     13.01 /        1 =      13.0060
      3:      9.32 /        1 =       9.3170
      4:      6.12 /        1 =       6.1200
      So even when you set the number of processors to 2 in Stata, you get only a very slight boost in speed; the bootstrap is not truly running in parallel (i.e., on multiple processors). For that, you need -parallel-.
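      For reference, the speedups implied by these timings, each relative to the 1-processor run (timer 1), work out as below. This is plain arithmetic on the times reported above, written as -display- statements:

      Code:
      ```stata
      * Speedup = 1-processor time / alternative time (seconds, from the timer list above)
      display "set processor 2:         " %4.2f 13.487/13.006
      display "parallel initialize 2:   " %4.2f 13.487/9.317
      display "parallel initialize 4:   " %4.2f 13.487/6.120
      ```

      That is roughly 1.04, 1.45, and 2.20. In the same session, -timer list- leaves the times in r(t1) through r(t4), so, for example, -display r(t1)/r(t4)- gives the last ratio without retyping.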



      • #4
        Interesting. So, if I had Stata MP 4, how would the times compare? If nothing else, I assume the syntax would be a little simpler?
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Thanks for the responses and the link to the benchmark material. Yes, I have used -parallel- or run multiple instances of Stata for large bootstrap problems and similar tasks, and that works great. It sounds like the parallelization built into Stata occurs essentially within commands and not "across" commands, as in -bootstrap-. But I'm still puzzled as to why using more processors doesn't speed up each execution of, e.g., -regress-. That benchmark document indicates that -regress- is 100% parallelized. I'm now thinking that MP has not done much for me with the kinds of tasks I typically do. This is not a complaint, just an observation.



          • #6
            Considering the information here:

            https://www.stata.com/statamp/perfor...ort/report.pdf

            your experience is very surprising. In those tests, -regress- speeds up linearly with the number of processors. Note that you are going from 12 processors to 24, not 1 to 2, but that doesn't remove the surprise.

            Running multiple jobs is a good suggestion. The bstat command will combine the results:

            https://www.stata.com/manuals/rbstat.pdf
            https://www.nber.org/stata/efficient/bootstrap.html
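            A minimal sketch of that workflow, under a few assumptions: the plain auto data are used so that every job sees identical data, the file names (bs1, bs2) and seeds are arbitrary, and the two "jobs" run one after another here for illustration, whereas in practice each would run in its own Stata session. Each job needs its own seed so that the replications differ:

            Code:
            ```stata
            * Job 1 (in practice, a separate Stata session)
            sysuse auto, clear
            set seed 12345
            bootstrap _b, reps(2500) nodots saving(bs1, replace): regress price headroom weight length turn

            * Job 2 (another session, different seed)
            sysuse auto, clear
            set seed 67890
            bootstrap _b, reps(2500) nodots saving(bs2, replace): regress price headroom weight length turn

            * Combine: observed estimates come from one fit on the full data
            sysuse auto, clear
            regress price headroom weight length turn
            matrix b = e(b)
            local N = e(N)
            use bs1, clear
            append using bs2                  // 5,000 replications in total
            bstat, stat(b) n(`N')
            ```

            If the data are randomly perturbed as in the first post, every job must reproduce exactly the same data, e.g. by setting a common seed before the perturbation and a job-specific seed before -bootstrap-.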

            Have you looked at Roodman et al.,
            "Fast and Wild: Bootstrap Inference in Stata using boottest"
            https://www.stata.com/meeting/canada...ada18_Webb.pdf

            which promises order-of-magnitude improvements in speed for suitable problems?
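            As an illustration only: -boottest- is a user-written package (installed with -ssc install boottest-), and the call below is a sketch based on its documented options rather than a tested benchmark. Note that it performs a wild bootstrap test of a hypothesis about one coefficient after OLS, which is a different procedure from the pairs bootstrap of all coefficients that -bootstrap _b- runs:

            Code:
            ```stata
            * If needed: ssc install boottest
            sysuse auto, clear
            regress price headroom weight length turn, vce(robust)
            * Wild bootstrap test of H0: _b[weight] = 0; nograph skips the confidence-set plot
            boottest weight, reps(9999) seed(12345) nograph
            ```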



            • #7
              Here's an even simpler example, with results, comparing the time consumed by -regress- using 1 vs. 2 processors. It looks as though using multiple processors makes no appreciable difference, in contrast to the apparently careful analyses on p. 37 of https://www.stata.com/statamp/performance-report/report.pdf, which report that using 2 processors under MP was more than twice as fast as using 1 processor. Again, I'm using Stata 15.1 MP 2, and perhaps this would be different with version 19. Nevertheless, I wonder why my simple results show so little effect, and of course whether they are relevant to choosing MP 2 vs. SE.

              Code:
              * Assumes the auto data from the earlier example are already in memory
              forval j = 1/2 {
                  quiet set processors `j'
                  timer clear 1
                  timer on 1
                  forval i = 1/10000 {
                      quiet replace price = price + rnormal(0,1000) // avoid any cached memory oddities
                      quiet regress price headroom weight length turn
                  }
                  timer off 1
                  di "Using `j' processors"
                  timer list 1
              }

              Using 1 processors
              1: 52.42 / 1 = 52.4170
              Using 2 processors
              1: 49.50 / 1 = 49.5030
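              The gain in these timings is about 6% (52.42/49.50 is roughly 1.06). Since the first post found a larger gain with -expand 100-, one way to check whether the effect depends on data size on a given machine is to repeat the timing on a much larger data set. A minimal sketch, assuming Stata/MP; the size (-expand 1000-), the 200 repetitions, and the seed are arbitrary choices:

              Code:
              ```stata
              * Time repeated -regress- calls on 74,000 observations for each allowed processor count
              sysuse auto, clear
              expand 1000
              set seed 20240101
              quietly replace price = price + rnormal(0, 1000)
              local maxp = c(processors_max)
              forvalues j = 1/`maxp' {
                  quietly set processors `j'
                  timer clear 1
                  timer on 1
                  forvalues i = 1/200 {
                      quietly regress price headroom weight length turn
                  }
                  timer off 1
                  quietly timer list 1
                  display "Processors: `j'   seconds: " %7.2f r(t1)
              }
              ```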



              • #8
                There is a prior thread on this topic. I think it would be helpful if people ran and reported the results of the benchmark program written by Sergiy Radyakin, who wrote the original post in that thread.
                Associate Professor of Finance and Economics
                University of Illinois
                www.julianreif.com

