  • Choosing a computer for Stata: Cores or CPU speed?

    Compared to most on this forum, I have only a rudimentary understanding of What Makes Computers Go. But I've started doing a lot of mixed-effects models with meqrlogit and golly they take forever to run -- I've had runs take a week, during which I can't do any other Stata commands. I have funding to buy a new computer (a desktop). So, how do I choose?

    I'm currently running Stata/MP 13.1 on a 2.9GHz dual-core i7 MacBook Pro with 8GB RAM, and I know (...think I know) it isn't a RAM issue, since all my datasets are only about 1GB.
    • Will going up to 4 cores effectively double my processing speed? 6 cores, triple?
    • Would going up to a 6GHz dual-core (I don't think that exists, but for the sake of argument) double my processing speed?
    • Is it worth paying $3000 for a 4GHz quad-core, or is that likely to provide only a "clinically insignificant" improvement in speed (i.e., let's say 20% faster would be clinically insignificant, for my purposes) compared with a 2.6GHz quad-core for half the price (16GB RAM for both)?
    (All of this assumes I will upgrade to Stata/MP for the appropriate number of cores.)

    Thanks in advance for any advice that can shed light on this dilemma. I'm trying to be respectful of my funding source -- acknowledging that my laptop is only 15 months old and I'm in an institution where frugality is important! -- but do need to be able to make progress in my work.

  • #2
    Hello Liz,

    I'm stuck on my own research and saw your topic, which raises questions I asked myself a long time ago.

    I think the following is quite reliable. The source is http://www.timberlake.co.uk/Stata/?id=335
    In a perfect world, software would run twice as fast on two cores, four times as fast on four cores, eight times as fast on eight cores, and so on. Across all commands, Stata/MP runs 1.6 times faster on two cores, 2.1 times faster on four cores, and 2.7 times faster on eight cores. These values are median speed improvements. Half the commands run even faster.
    On the other side of the distribution, a few commands do not run faster, often because they are inherently sequential, such as time-series commands.
    Stata worked hard to make sure that the performance gains for commands that take longer to run would be greater. Across all estimation commands, Stata/MP runs 1.8 times faster on dual-core computers, 2.8 times faster on quad-core computers, and 4.1 times faster on computers with eight cores.
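A quick way to see what these medians imply, assuming Amdahl's law (a standard model of parallel speedup that the quoted source does not mention): if a fraction p of the work parallelizes perfectly, the speedup on n cores is S(n) = 1 / ((1 - p) + p/n). The minimal Python sketch below backs out p from the two-core figure for estimation commands and compares the model's predictions with the quoted four- and eight-core figures.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the parallel fraction p implied by a speedup on n cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Median speedups for estimation commands, as quoted above.
quoted = {2: 1.8, 4: 2.8, 8: 4.1}

p = parallel_fraction(quoted[2], 2)
print(f"implied parallel fraction: {p:.3f}")
for n, s in quoted.items():
    print(f"{n} cores: quoted {s:.1f}x, Amdahl predicts {amdahl_speedup(p, n):.2f}x")
```

The two-core figure implies p of about 0.89, which predicts about 3.0x on four cores and 4.5x on eight, somewhat above the quoted 2.8x and 4.1x. The gap is expected: Amdahl's law ignores overheads such as synchronization and memory bandwidth, so treat it as an optimistic ceiling.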

    At the moment, I'm searching for more relevant info for you.
    Last edited by Victoria Rogers; 17 Oct 2014, 17:49.

    • #3
      In general, it's useful to increase the number of cores when you plan to run several programs at the same time.
      Increasing the processor speed would be best for faster calculations.
      The highest clock speed on record is 8.794 GHz, and that was reached only by extreme overclocking.

      It's also possible to push your 2.9GHz processor higher by overclocking; however, I don't recommend that, because it carries certain risks if you don't increase the speed gradually.
      Last edited by Victoria Rogers; 17 Oct 2014, 18:07.

      • #4
        Also see http://blog.stata.com/2011/04/07/mul...rallelization/ .

        My limited experience, from disabling one core, is similar: the second core doesn't double the speed, but it makes Stata run roughly 1.8 times faster.
        David Radwin
        Senior Researcher, California Competes
        californiacompetes.org
        Pronouns: He/Him

        • #5
          Thanks all -- sounds like increasing the number of cores yields substantial improvement. But how does that compare to the improvements an increase in processor speed would produce?

          I don't think overclocking is something I will be doing -- ahem, Mac user -- but the i7 processors do do "hyper-threading". Sounds fancy, maybe it's helpful...?

          • #6
            See http://www.stata.com/statamp/statamp.pdf for a pretty good explanation of the impact of multiple cores. Based on the tables they present, and making vague assumptions about how meqrlogit works, I would assume about 50% parallelization (which could be a totally wrong assumption).
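To turn that 50% guess into hours, here is a minimal Python sketch, again assuming Amdahl's law (speedup on n cores = 1 / ((1 - p) + p/n), a model not taken from the Stata PDF) and taking a one-week run on the current dual-core machine as the baseline. Both the 50% figure and the one-week baseline are assumptions for illustration.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when a fraction p of the work runs in parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def projected_hours(hours_now, cores_now, cores_new, p):
    """Rescale a measured runtime from cores_now to cores_new under Amdahl's law."""
    return hours_now * amdahl_speedup(p, cores_now) / amdahl_speedup(p, cores_new)


week = 7 * 24  # assumed baseline: a one-week run on the current dual-core machine
for cores in (2, 4, 6, 8):
    print(f"{cores} cores: ~{projected_hours(week, 2, cores, p=0.5):.0f} hours")
```

At p = 0.5, going from 2 to 4 cores only shortens a 168-hour run to about 140 hours (roughly 17% less), and 8 cores only to about 126 hours. If meqrlogit parallelizes much better than that, the gains are correspondingly larger, so the assumed p drives the whole answer.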

            Hyper-threading has been around for about a decade, and an i5 does it just as well as your i7. To my limited knowledge, it doesn't have an impact on Stata, or if it does, all machines since about 2005 get the same boost.

            One thing I wonder about is RAM. With mixed/fixed-effects models, even if the original data is 1 GB, Stata needs to manipulate huge matrices in the background that can be much, *much* larger than the original data. I'm a PC guy, so I don't know off the top of my head how to monitor RAM usage on a Mac. But check RAM usage. If Stata is maxing out RAM, then, of course, one thing to do is get a machine with more RAM. In addition, consider getting a solid-state drive (SSD): when the machine runs out of memory and falls back on the hard drive, an SSD is dramatically faster than a spinning disk.
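For a sense of scale, here is plain arithmetic in Python, not a description of what meqrlogit actually allocates: a dense matrix of double-precision numbers takes 8 bytes per element, so its memory grows with the square of its dimension, regardless of how small the dataset on disk is. The matrix sizes are arbitrary examples.

```python
def dense_matrix_gib(rows, cols, bytes_per_element=8):
    """Memory, in GiB, for a dense rows x cols matrix of 8-byte doubles."""
    return rows * cols * bytes_per_element / 2**30


for k in (1_000, 10_000, 30_000):
    print(f"{k:>6} x {k:<6} doubles: {dense_matrix_gib(k, k):8.2f} GiB")
```

A single 30,000 x 30,000 matrix of doubles already needs about 6.7 GiB, most of an 8GB machine, and an estimation routine may hold several large work arrays at once.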
            Last edited by ben earnhart; 18 Oct 2014, 00:21.

            • #7
              Looking from a different angle: I had a similar problem and asked Statalist. Clyde Schechter and Richard Williams pointed to the -iterate()- option (see -help maximize-). It did not solve the problem, but it made it manageable.

              • #8
                Ben, I think you're so right about the RAM. I hadn't considered the matrix issues, and it looks like (I'm currently running an meqrlogit command) I have 600MB of 8GB RAM free, so I suspect this is more RAM-heavy than I'd realized. Of course, there's no way to know whether 16GB would let the whole matrix fit in RAM, so maybe an SSD for the active working directory, plus a traditional external hard drive (since, yeah, Macs probably don't offer both an internal SSD and an internal disk-based drive)?

                Anybody with thoughts about processor speed & the utility of increasing processor speed somewhat for a considerable increase in cost?

                • #9
                  I wouldn't spend extra $ on an upgraded processor *unless* the processor had a much larger cache. Speed scales pretty much exactly with GHz: a 10% increase in GHz means a 10% increase in speed (assuming RAM or the hard drive is not the bottleneck). Increased cache, on the other hand, can have an unpredictable and sometimes dramatic impact on performance. Looking at the specs on Mac Pros, going from a 12 MB cache to a 25 MB cache *might* be worth it if you can afford it. But the impact of the cache is unpredictable (unless you had a true expert running simulations with your particular procedure), so it might be money well spent, or a waste. Pure GHz improvements are probably wasted $, but if the faster processor also has a noticeably larger cache, it might be worthwhile. Sorry for being wishy-washy on this, but it depends on unknown factors.
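Ben's rule of thumb can be applied to Liz's 2.6GHz versus 4GHz quad-core question. A minimal Python sketch, assuming only the CPU-bound share of the runtime shrinks in proportion to clock speed while time spent waiting on RAM or disk stays fixed; the cache and per-clock efficiency effects mentioned in this thread are deliberately left out, and the CPU-bound shares are made-up scenarios.

```python
def runtime_after_clock_change(hours, ghz_old, ghz_new, cpu_bound_share=1.0):
    """Scale the CPU-bound part of a runtime by ghz_old / ghz_new.

    The remaining part (time spent waiting on memory or disk) is assumed
    not to change with clock speed.
    """
    cpu_hours = hours * cpu_bound_share
    waiting_hours = hours * (1.0 - cpu_bound_share)
    return cpu_hours * ghz_old / ghz_new + waiting_hours


week = 7 * 24
for share in (1.0, 0.7, 0.4):  # hypothetical CPU-bound shares
    new = runtime_after_clock_change(week, 2.6, 4.0, share)
    print(f"CPU-bound share {share:.0%}: {week} h -> {new:.0f} h "
          f"({1 - new / week:.0%} less time)")
```

A fully CPU-bound 168-hour run drops to about 109 hours (35% less time) on the 4GHz chip; if only 40% of the time is CPU-bound, the saving shrinks to roughly 14%, under Liz's 20% threshold. How busy the CPU actually is during a run is what pins down the share.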
                  Last edited by ben earnhart; 18 Oct 2014, 13:33.

                  • #10
                    I think the shortest answer may be both. The processor speed limits the amount of work the CPU can accomplish in a given amount of time, so if you're doing a lot of computationally intense analyses, the speed bump would be worth it. If you have more high-speed processors along with an appropriately parallelized version of Stata, you would probably notice some substantial gains in performance.

                    For example, I ran a program I wrote to produce a series of seven multilayered scatterplots for approximately 890 distinct units of interest. When I ran it using Stata 12 MP2 on my desktop (PC) or my laptop (an older dual-core MacBook Pro), it took 6-8 hours to complete (keep in mind that it creates several thousand PDFs). When I ran the same thing in production with Stata 13 MP8 on an underpowered Windows VM, it took about 3 hours. Some of the difference also has to do with other processes running in the background.

                    From keeping tabs on the various processor and RAM monitors on my newer MacBook (2.8GHz quad-core), I can say that hyperthreading definitely helps (Activity Monitor displays 8 distinct processor graphs even though the computer has only four physical cores).

                    One other thing that could help is to estimate your model with a small number of iterations (say 5) and then use those parameter estimates as the starting values for subsequent models. And if you're able to, purge the RAM cache before running anything that intense.
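The capped-iteration and starting-values ideas (the -iterate()- option from #7, and the warm start suggested here) can be illustrated outside Stata. This is a minimal Python sketch, not Stata's own maximizer: a hand-rolled Newton-Raphson fit of a one-predictor logistic regression on made-up toy data, first stopped after two iterations (standing in for a low iteration cap) and then restarted from those rough estimates. The function name `newton_logit` and the data are hypothetical.

```python
import math


def newton_logit(x, y, start=(0.0, 0.0), max_iter=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns ((b0, b1), iterations_used, converged). `max_iter` plays the role
    of an iteration cap and `start` the role of user-supplied starting values.
    """
    b0, b1 = start
    for it in range(1, max_iter + 1):
        # Score (gradient of the log-likelihood) and information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            return (b0, b1), it, True
    return (b0, b1), max_iter, False


# Toy data (made up, not separable, so the maximum likelihood estimates exist).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]

rough, n_rough, ok_rough = newton_logit(x, y, max_iter=2)  # capped run
warm, n_warm, ok_warm = newton_logit(x, y, start=rough)    # restart from rough estimates
cold, n_cold, ok_cold = newton_logit(x, y)                 # ordinary run from zero
print(f"capped: {n_rough} iterations, converged={ok_rough}")
print(f"warm start: {n_warm} iterations; cold start: {n_cold} iterations")
```

Because Newton-Raphson is deterministic, the warm-started run simply finishes the remaining steps of the ordinary run, reaching the same estimates in two fewer iterations. The payoff comes when the preliminary fit is much cheaper than the final one, which is the idea behind the suggestion above.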

                    • #11
                      Any thoughts on what the optimal number of MP cores would be for Stata when running a dual-core i7 (4600U) with hyperthreading? There are two physical cores, which would suggest MP2, but hyperthreading adds two virtual (logical) cores. Would it be beneficial to get MP4 in that case? Would Stata/MP4 take advantage of those virtual cores in a way that improves performance over MP2?
                      Last edited by Craig.Hayward; 09 Jan 2015, 11:35.

                      • #12
                        Responding to my own question, I found this page: http://www.stata.com/products/compat...ng-systems-mp/

                        It says: "Be aware of the term “hyperthreaded”, however. Stata/MP runs faster on hyperthreaded processors, but not as fast as it would if you had full cores instead of hyperthreads. Computers with multiple hyperthreaded processors are suitable for Stata/MP. The number of real processors is the critical factor."
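To check how many physical and logical cores a machine has before choosing an MP license size, here is a small Python sketch. `os.cpu_count()` from the standard library reports logical processors (hyperthreads included), while the physical count comes from the third-party psutil package, which the sketch treats as optional. On macOS, the same numbers are also available from `sysctl -n hw.physicalcpu` and `sysctl -n hw.logicalcpu`. The helper name `core_counts` is hypothetical.

```python
import os


def core_counts():
    """Return (logical, physical) core counts; either may be None if unknown.

    os.cpu_count() counts logical processors, so hyperthreads are included.
    The physical count needs the optional third-party psutil package.
    """
    logical = os.cpu_count()
    physical = None
    try:
        import psutil  # optional dependency

        physical = psutil.cpu_count(logical=False)
    except ImportError:
        pass
    return logical, physical


logical, physical = core_counts()
print(f"logical cores: {logical}, physical cores: {physical}")
```

On a dual-core i7-4600U with hyperthreading, this would be expected to report 4 logical and 2 physical cores; per the Stata page quoted above, the physical count is the one that matters.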

                        • #13
                          See also: http://repec.org/bos2014/boston14_radyakin.pdf

                          Go for more MHz/GHz first. All commands will run faster. And you can keep the same Stata license.
                          Plugins are not thread-safe, so they usually run in a single thread and don't enjoy the MP benefits. It is possible to write a mixed plugin, but who would bother?

                          Twice the GHz doesn't mean twice the speed. A fast CPU may sit idle, waiting for memory to store results or for the HDD to save/load data.
                          Desktops generally have more powerful CPUs than laptops and more upgrade options.

                          Note that for non-MP Stata, a CPU's overall performance rating is usually misleading. A specialized single-thread performance rating is more relevant:
                          https://www.cpubenchmark.net/singleThread.html

                          As of 10 Jan 2015, the only 4GHz CPU in the rating topped its nearest competitor by some 8%.
                          As the table illustrates, more GHz is not necessarily faster performance - internal efficiency, cache and other factors matter.

                          Little is known about Stata's internals. I suspect that even a non-MP version may still be multithreaded, and thus may benefit from more cores.

                          You are rarely running Stata alone. In practice this means: never. Check your process manager. It is likely to report some 50+ processes and services running right now. Adding more cores lets those parallel processes run on other cores and reduces competition for the core Stata is using.

                          Best, Sergiy Radyakin

                          • #14
                            Here's some advice in a different direction: Take advantage of the capacity of your machine to put your Big Job in the background, or give it reduced priority so that you can do your other work.

                            Long version:
                            Even a fast machine with lots of cores (at least one a mere mortal can afford) won't make a one-week job run in a few hours. If you experienced a speed-up of 10X, I would be very surprised. So you will still need to run Stata, and other programs, while your Big Job is running.

                            Even with your current machine and OS, your comment that you can't do other work in Stata is likely wrong. I don't have Mac experience, but in the Windows or UNIX world, you could open a new instance of Stata and work in it while your original job is running. To reduce the competition for CPU resources between your new instance and your old one, you would reduce the priority given to the Big Job, do your work in the new instance of Stata, and then change the priority of the original Big Job back to something high and let it run. (Sorry, I can't tell you how to do this on the Mac, but I'm sure someone else can; I can't imagine it's hard.)

                            This same advice applies more generally to competition between your Big Job and *all* your other computer work for the day or week.
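On macOS and other Unix-like systems, "reduced priority" corresponds to a process's nice value (a higher nice value means lower priority), and the command-line tool for changing it on a running job is `renice`. Below is a minimal Python sketch of the same idea using the standard library's `os.getpriority`/`os.setpriority` (Unix only). The function name `lower_priority` and the default nice value of 10 are illustrative choices, not anything Stata provides.

```python
import os


def lower_priority(pid, niceness=10):
    """Lower the scheduling priority of process `pid` by raising its nice value.

    Unix/macOS only. Higher nice values mean lower priority (range -20..19).
    An unprivileged user may raise a process's nice value but not lower it
    again, so this never tries to move a process to a higher priority.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = max(current, min(niceness, 19))
    if target != current:
        os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

For example, `lower_priority(12345)` would demote the process with the (hypothetical) PID 12345, such as a long-running Stata job whose PID you looked up in Activity Monitor. Restoring it to normal priority afterwards, as Mike suggests, generally needs administrator rights on macOS and Linux (for example, `sudo renice 0 -p <pid>`), because ordinary users can only lower a process's priority.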

                            Regards, Mike
