
  • Stata 17 crashes during large computation

    I am a Research Computing Facilitator at FASRC. Raul Duarte reached out to our support because he was running Stata code with the function iebaltab from the ietoolkit library on our cluster, and the job was dying midway through computation. We troubleshot extensively without much progress. We reached out to the developers of iebaltab (see GitHub issue https://github.com/worldbank/ietoolkit/issues/368) thinking it was an issue with the function itself, but we found that the function works with a smaller dataset. The ietoolkit developers suggested reaching out to Stata instead.

    Unfortunately, Raul’s data cannot be shared because of a signed Data Use Agreement (DUA), but we will try to explain as much as possible. This is a summary of our findings so far (some of it is copied from the GitHub issue so this post has more context):

    Computational environment
    • OS: Rocky Linux 8.9
    • Hardware (for more details, see https://docs.rc.fas.harvard.edu/kb/f...and_Partitions):
      • fasse_bigmem partition: Intel Ice Lake chipset, 499 GB of RAM, /tmp space is 172 GB
      • fasse_ultramem partition: Intel Ice Lake chipset, 2000 GB of RAM, /tmp space is 396 GB
    • Stata: version 17.0 with MP (64 cores)
    Analysis
    Raul wrote a do-file that uses the iebaltab function to analyze a 4.3 GB dataset:

    Code:
    iebaltab median_hs6_unit_price median_hs6_cifdoldecla median_hs6_imponiblegs unit_price_final cifdoldecla imponiblegs, replace grpvar(val_count_patronage_hire) fixedeffect(port_day_ID) ///
        savetex("$DirOutFasse\baltab_val_shipment_item_values_counter_day.tex") ///
        grplabels(0 "Non-patronage" @ 1 "Patronage")  format(%12.0fc) order(1 0) ///
        rowlabels(median_hs6_unit_price "Median HS6 unit price (in USD)" @ median_hs6_cifdoldecla "Median HS6 CIF value (in USD)" ///
            @ median_hs6_imponiblegs "Median HS6 tax base (in PYG)" @ unit_price_final "Unit price (in USD)" ///
            @ cifdoldecla "Declared CIF value (in USD)" @ imponiblegs "Tax base (in PYG)") nonote
    Troubleshooting
    1. When Raul runs the analysis with the entire dataset, the run dies at about 55 minutes. I submitted a job via the scheduler to run the same do-file on the original dataset. While the job was running, I used top to watch CPU and memory usage, and I also kept checking the disk usage of /tmp with the du command. Core usage was almost at 100% on all 64 cores, memory was at about 5-6% (of 499 GB), and /tmp had about 4-5 GB in use. At about the one-hour mark, I could see each process dying, and everything stalled. The maximum RAM used was 35.67 GB.
    2. Raul took a smaller random sample, 5% of the original dataset. In this case, the job ran to completion, taking about 8.42 hours. I also verified that the job was allocated 499 GB of RAM, of which only 20.21 GB was used.
    Because the iebaltab function runs well on a smaller dataset, the developer suspects it may be a bug or problem with Stata rather than with iebaltab. So, I am reaching out to Stata to see if you can advise on any troubleshooting steps. I am wondering whether we may be hitting some sort of limit on Stata computations?

    Thank you,
    Paula and Raul

  • #2
    There was a crashing bug that was fixed in the Stata 18 update on 16oct2024. Item 10 in help whatsnew for this update states:
    matrix accum and many estimation commands, such as regress, when specified with more than 23,169 continuous variables or factor-variable levels, sometimes exited with error message "op. sys. refuses to provide memory", but could also crash Stata, even if your computer had the memory resources available to handle computing the specified cross-product matrix. This has been fixed.
    Stata 17's last update, released 20may2024, does not include this bug fix.

    In GitHub issue #368, the data appears to have 56,000 fixed effects. Looking at the ado-file for iebaltab, it appears to be using regress and including the fixed effects as regressors, which prompted me to mention the above update to Stata 18.

    The number of unique off-diagonal elements in the X'X matrix for this regression, just counting the fixed effects, amounts to roughly comb(56000,2) = 1,567,972,000. Inverting this matrix is probably the most time-consuming aspect of fitting this model, and I imagine it will take days to complete -- assuming the above Stata 18 update addresses the crashing bug.
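    These figures are easy to check with a short Python sketch (K = 56,000 levels is taken from the GitHub issue; the byte count assumes a dense double-precision matrix, which may not match Stata's internal storage):

```python
from math import comb

K = 56_000           # approximate number of fixed-effect levels (from GitHub issue #368)
BYTES_PER_DOUBLE = 8

# unique off-diagonal elements of the symmetric K x K cross-product matrix X'X
unique_pairs = comb(K, 2)
print(f"{unique_pairs:,}")  # 1,567,972,000

# memory for a full dense K x K matrix of doubles, in GiB
full_matrix_gib = K * K * BYTES_PER_DOUBLE / 2**30
print(f"{full_matrix_gib:.1f} GiB")  # 23.4 GiB

# the Stata 18 fix mentions a threshold of 23,169 variables/factor levels
print(K > 23_169)  # True -- this regression is well past it
```

    The dense-matrix size alone (about 23 GiB) is consistent with the ~35 GB peak RAM that Paula observed before the crash.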

    I suspect that reformulating the model to use areg to absorb the fixed effect will significantly reduce the time to fit this regression model, and likely will not tickle the crashing bug in Stata 17.
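    For intuition about why absorbing the fixed effects helps, here is a minimal Python sketch (with made-up data, not Raul's) of the within transformation that areg-style absorption performs. By the Frisch-Waugh-Lovell theorem, demeaning within groups recovers the same slope as the dummy-variable regression, without ever forming the huge dummy matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, per_group = 50, 20
g = np.repeat(np.arange(n_groups), per_group)             # fixed-effect group IDs
x = rng.normal(size=g.size)
y = 2.0 * x + g.astype(float) + rng.normal(size=g.size)   # true slope = 2

# (1) dummy-variable regression: one dummy per group, like regress with i.group
D = (g[:, None] == np.arange(n_groups)).astype(float)
X = np.column_stack([x, D])
beta_dummies = np.linalg.lstsq(X, y, rcond=None)[0][0]

# (2) within transformation: subtract group means, then simple OLS on x alone
def demean(v):
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

xd, yd = demean(x), demean(y)
beta_within = (xd @ yd) / (xd @ xd)

print(abs(beta_dummies - beta_within) < 1e-8)  # True: the two slopes coincide
```

    With 56,000 groups, approach (1) must build and invert a cross-product matrix with tens of thousands of rows, while approach (2) only needs group means, which is what makes the areg reformulation so much cheaper.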
    Last edited by Jeff Pitblado (StataCorp); 26 Nov 2024, 16:33.



    • #3
      Thank you for the very quick response. I posted your suggestion to reformulate iebaltab using areg on the GitHub issue.
