  • Question about speed of reg versus reghdfe

    Hi Stata,

    I am using Stata/MP (8-core) on a Mac with 16 GB of RAM and running a regression with fixed effects (FEs). The reghdfe command looks like this:

    reghdfe IHS ib7.group1event ib7.group2event ib7.group3event ib7.group4event ib7.group5event ib7.group6event ib7.group7event ib7.group8event ib7.group9event [aw=population], absorb(i.week i.group i.state) cluster(state)

    The data I am using has 20 million observations. I have managed to compress it to about 300 MB. However, reghdfe runs out of memory and my system asks me to quit Stata (I have tried the options poolsize(1) and compact, and it still runs out of memory). Even when I run reghdfe on 1/10 of the data (about 2 million observations), it takes about 10 minutes to finish. But when I run the same regression with the reg command on the full sample, it returns results in 30 seconds. Any intuition about what is going on here? I thought reghdfe was designed to make regressions with many FEs faster, but that does not seem to be the case here. I appreciate any thoughts.



  • #2
    Quick question: have you tried running -reghdfe- with Stata in batch mode, e.g., from the terminal?



    • #3
      Originally posted by Tiago Pereira
      Quick question: have you tried running -reghdfe- with Stata in batch mode, e.g., from the terminal?
      Hi Tiago, thanks for replying. I am not familiar with batch mode. Could you explain a little more? Thanks!



      • #4
        Hi, Austin.

        1. Save your do-file. Let's call it script.do, and let's assume you save it in the folder /home/Desktop/my_folder
        2. You need to know where the Stata executables are on your computer. Let's assume they are in /usr/local/stata16/
        3. Open the terminal and type:


        Code:
        cd /home/Desktop/my_folder
        # -b requests batch mode; the output is written to script.log
        /usr/local/stata16/stata-mp -b do "script.do"

        Stata will run in batch mode and, at least in my humble experience, that approach handles larger datasets better.
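
        The steps above could be wrapped in a small shell function. This is only a sketch under a few assumptions: the executable path is the hypothetical one from step 2 (on a Mac the binary typically sits inside the Stata application bundle instead), run_batch and STATA_BIN are names made up for illustration, and -b is the flag that requests batch mode, in which Stata writes its output to script.log in the current folder. As far as I know, the exit status alone does not reliably signal a Stata error, so the sketch also searches the log for a return code such as r(198);.

```shell
#!/bin/sh
# Hypothetical path from step 2; override it by exporting STATA_BIN.
STATA_BIN="${STATA_BIN:-/usr/local/stata16/stata-mp}"

# run_batch DOFILE: run a do-file in batch mode, then inspect its log.
run_batch() {
    dofile="$1"
    if [ ! -x "$STATA_BIN" ]; then
        echo "Stata executable not found: $STATA_BIN" >&2
        return 127
    fi
    # -b = batch mode; Stata writes <name>.log to the current directory.
    "$STATA_BIN" -b do "$dofile" || return 1
    log="$(basename "$dofile" .do).log"
    if [ ! -f "$log" ]; then
        echo "Expected log file not found: $log" >&2
        return 1
    fi
    # Stata prints a return code such as r(198); when a command fails.
    if grep -q '^r([0-9][0-9]*);' "$log"; then
        echo "Stata reported an error; see $log" >&2
        return 1
    fi
    echo "Finished; output in $log"
}
```

        After sourcing the function, usage would be: cd /home/Desktop/my_folder && run_batch script.do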
        Last edited by Tiago Pereira; 13 Feb 2022, 17:57.



        • #5
          Any suggestions/thoughts on why reg is faster than reghdfe in this case?

