  • Which Stata/MP version is right for me?

    Dear all,

    It seems that neither Stata/BE nor Stata/SE can run the regressions I would like to do, at least not in a reasonable amount of time. I have a panel dataset (2000-2015, at 5-year intervals) with about 2 million observations, which is not the problem in itself. The issue seems to be that I am running the rather new -acreg- regressions with rather complex distance cutoffs. I tried these regressions with BE (which took over a week) and with SE, which I aborted after a week. Now I am considering buying the Stata/MP version for students. However, I don't really know whether the 2-core version is enough. I am running a 2015 MacBook Pro with a 2.2 GHz quad-core Intel Core i7 and 16 GB of 1600 MHz DDR3 memory.

    Any help would be appreciated.
    Last edited by Michael Schuster; 04 Jan 2022, 09:15.

  • #2
    I don't know how old you are or how far along in your education you are, but I'll tell you what my mentor told me in July of last year:
    You're a Ph.D student now. You need the Big Boys Stata.
    So I sucked it up and bought 4 MP, because I plan to be a researcher and econometrician all of my adult life.


    So, if you really need the 4 MP to do what you want, get that. It'll be an investment in the long run.


    • #3
      I love the quote :D

      Yet, I might end up using it only for my master's thesis. That's why I am struggling with my decision. Is there a big difference in running time between SE and 2-core MP, and between 2-core and 4-core MP?


      • #4
        Uhhhhhhhhhhhhhhhhhhh.... In my experience, yes. It's been a while since I've used IC or SE, but yes there's quite the difference when it counts.

        For what it's worth though, my code runs quickly not just because I have MP, but because I program efficiently (in my opinion, anyways) and augment user-written commands with extensions. So, if there's a command that does collapsing or merging, I always augment it with ftools or gtools, user-written packages from SSC that let these operations run at ridiculous speeds. For example, reshaping a dataset of even a few million observations takes like 5 minutes with Stata's reshape command. With greshape it literally takes like 5 seconds, and I'm barely exaggerating.

        So yes, your software will play a big part in run time, but so will other things, like having your variables in the correct format and all that stuff. I know I encountered the same issue with my master's thesis, so getting MP 4 may help you in the long run, assuming you're going to be a long-term researcher.


        • #5
          Keep in mind that, along with an insufficiently powerful version of Stata for your purposes, you are also running Stata on a 6-year-old MacBook Pro built on Intel hardware, and significant performance gains could be expected from the newest models built on Apple silicon. And it is my understanding that the newest models restore many of the features of your MacBook Pro that went missing from the intervening models.


          • #6
            Jared Greathouse Re #4. While I share your enthusiasm for greshape (or -tolong-, another user-written reshape command I use frequently), there is the problem of sometimes also using official or user-written Stata programs that themselves call the official Stata -reshape- command. Unless you are comfortable with, and have the time for, hacking through those programs to make them also call a faster version of reshape, you can't reap the full benefit. I honestly do not understand why StataCorp has not updated the official -reshape- command to make it faster. It's not as if they don't know that -reshape- is slow. A request for speeding it up has been in the Wish List threads previously, and I imagine their tech support must get complaints, or problems where a really long -reshape- is misinterpreted as Stata just hanging. Oh, well!


            • #7
              Clyde Schechter Yeah I totally agree. Me, I'm kind of a speed demon. I'll just use the user written adoedit to go into a complicated command and, if it's feasible, make slight alterations to specific parts of it assuming it'll save me the time. Of course as you mention, not everyone has the time or frankly the desire to fiddle with Stata or user written commands.

              I also agree on StataCorp's interesting priorities. You've likely seen my enthusiasm for causal analysis and the like; everyone has their own pet projects they wish StataCorp would take care of for them, but honestly, the first time I had to reshape COVID-19 data from JHU, it took 10 minutes maybe. You'd think, given that they know this, that a fix of some sort would be readily provided, even in an update.

              A simple Google search even tells us that Stata's native reshape is pretty slow; I don't mind that StataCorp doesn't write their commands around me, but you'd think for basic procedures like reshape or egen or others, someone in meetings would say "Hey, excuse me, maybe it's time we make this 15 year old command a little faster."


              • #8
                When you have big jobs, it is useful to do a few preliminary runs with subsets and extrapolate to the time required for the full dataset. It may be that the time required makes the initially desired approach infeasible. Note that times may be quadratic or cubic in the number of variables, even if they are proportional to the number of observations.
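
                The extrapolation described above can be sketched as follows. This is only a rough illustration in Python, assuming run time follows a power law t = a * n^b in the problem size n (observations or variables). The timings are made-up placeholders, not measurements of -acreg- or any other command, and the function names are hypothetical.

```python
import math

def fit_power_law(sizes, seconds):
    """Least-squares fit of log(t) = log(a) + b*log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # estimated scaling exponent
    a = math.exp(y_mean - b * x_mean)   # estimated constant factor
    return a, b

def predict_seconds(a, b, n):
    """Predicted run time for problem size n under the fitted power law."""
    return a * n ** b

# Placeholder timings from three hypothetical subset runs (NOT real data).
# They are chosen to be exactly quadratic: doubling n quadruples the time.
sizes = [10_000, 20_000, 40_000]
seconds = [2.0, 8.0, 32.0]

a, b = fit_power_law(sizes, seconds)
full = predict_seconds(a, b, 2_000_000)
print(f"exponent ~ {b:.2f}, predicted {full / 3600:.1f} hours for the full data")
```

                With these placeholder numbers the fitted exponent is 2, so a subset that runs in seconds grows to roughly 22 hours at 2 million. An extrapolation like this is only as good as the power-law assumption, and it breaks down entirely once the job starts paging.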

                Are you sure you are not running out of memory? Some Stata commands use lots of additional memory. Let the OS tell you about memory usage (Activity Monitor in OSX, Task Manager in Windows) while Stata is running. Paging can make any command so slow it may not finish in finite time. More cores won't help that.

                I have had some success modifying Stata supplied .ado files. Just be sure to change the name of the command so you don't get a surprise down the road. The "g" commands are great. Worth a try, certainly.

                Multiple cores are good, but Amdahl's law limits the benefit, and -reshape- is limited by memory bandwidth. Nevertheless, I just did a test and 8 cores ran in 150 seconds where 1 core took 180 seconds. Regression commands do much better with multiple cores.
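
                As a rough worked example of Amdahl's law, the sketch below backs out the parallel fraction implied by the two -reshape- timings quoted above (180 seconds on 1 core, 150 seconds on 8 cores). It assumes the textbook form speedup = 1 / ((1 - p) + p / n) and ignores the memory-bandwidth limit, so treat the result as illustrative only. The function names are hypothetical.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

def implied_parallel_fraction(t1, tn, n):
    """Solve Amdahl's law for p given the 1-core time t1 and the n-core time tn."""
    speedup = t1 / tn
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

# Timings quoted in the post: 180 s on 1 core, 150 s on 8 cores.
p = implied_parallel_fraction(180.0, 150.0, 8)
print(f"implied parallel fraction: {p:.3f}")

# Under the same assumption, what 2 and 4 cores would give for this task.
for cores in (2, 4, 8):
    print(cores, "cores ->", round(amdahl_speedup(p, cores), 3), "x speedup")
```

                Under this assumption only about 19% of that -reshape- run is parallelizable, so even unlimited cores would cap the speedup near 1.24x. A command with a larger parallel fraction, as regression commands are said to have above, would scale much better.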

                See also https://back.nber.org/stata/efficient/


                • #9
                  [email protected] Thank you for your reply.

                  "Normal commands" (non-regression commands) run very fast with BE as well as SE, and I prepared all my data before merging the datasets. Thus, I basically just need to run the regressions on the final dataset. It seems that the dataset itself is not too big, but, as mentioned, the regression command -acreg- is just too complex for the size of the dataset. I forgot to mention that when I tried the SE version, I was running on a Windows machine with an Intel Xeon Gold 6248 CPU at 2.5 GHz (8 cores) and 64 GB of memory. I am asking for my understanding: Stata/BE and SE both run just on one core no matter how many cores there are on the running machine, right?


                  • #10
                    "I am asking for my understanding: Stata/BE and SE both run just on one core no matter how many cores there are on the running machine, right?"

                    Yes, that is correct.

                    Even when you go to MP, Stata will run on the number of cores your license authorizes, or the number of cores available on the machine, whichever is less. For example, my computer has 8 cores, but my Stata license is for 4-cores, so Stata will run only on 4 cores on my machine. If I were to port my Stata to a two core computer, it would run on those two cores.
