I am a Research Computing Facilitator at FASRC. Raul Duarte reached out to our support team because a Stata job using the iebaltab command from the ietoolkit package was dying midway through computation on our cluster. We troubleshot extensively without much progress. Thinking the problem was related to the command itself, we reached out to the ietoolkit developers (see GitHub issue https://github.com/worldbank/ietoolkit/issues/368), but we found that the command works with a smaller dataset, and the ietoolkit developers suggested reaching out to Stata instead.
Unfortunately, Raul's data cannot be shared because of a signed Data Use Agreement (DUA), but we will try to explain as much as possible. This is a summary of our findings so far (some of it is copied from the GitHub issue so this post has more context):
Computational environment
- OS: Rocky Linux 8.9
- Hardware (for more details, see https://docs.rc.fas.harvard.edu/kb/f...and_Partitions):
- fasse_bigmem partition: Intel Ice Lake chipset, 499 GB of RAM, /tmp space is 172 GB
- fasse_ultramem partition: Intel Ice Lake chipset, 2000 GB of RAM, /tmp space is 396 GB
- Stata: version 17.0 with MP (64 cores)
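Since the job runs inside a scheduler allocation, it can be worth confirming what Stata itself sees at runtime. The snippet below is a minimal sketch using standard Stata c-class values to print the flavor, core count, and, importantly for the /tmp monitoring described later, the temporary-file directory:

Code:
* Print the runtime environment as Stata sees it
display "Stata version:  " c(stata_version)
display "Flavor:         " c(flavor)        // should report "MP"
display "Processors:     " c(processors)    // cores Stata/MP is using
display "Temp directory: " c(tmpdir)        // where Stata writes its tempfiles
query memory                                // max_memory and related settings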
Raul wrote a do-file that uses the iebaltab command to analyze a 4.3 GB dataset:
Code:
iebaltab median_hs6_unit_price median_hs6_cifdoldecla median_hs6_imponiblegs unit_price_final cifdoldecla imponiblegs, replace grpvar(val_count_patronage_hire) fixedeffect(port_day_ID) ///
    savetex("$DirOutFasse\baltab_val_shipment_item_values_counter_day.tex") ///
    grplabels(0 "Non-patronage" @ 1 "Patronage") format(%12.0fc) order(1 0) ///
    rowlabels(median_hs6_unit_price "Median HS6 unit price (in USD)" @ median_hs6_cifdoldecla "Median HS6 CIF value (in USD)" ///
    @ median_hs6_imponiblegs "Median HS6 tax base (in PYG)" @ unit_price_final "Unit price (in USD)" ///
    @ cifdoldecla "Declared CIF value (in USD)" @ imponiblegs "Tax base (in PYG)") nonote
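For completeness, a minimal sketch of how this call could be instrumented so the last successful checkpoint survives a crash is to wrap it in a log together with Stata's timer and memory commands (the log filename below is just a placeholder):

Code:
log using "iebaltab_debug.log", replace text  // placeholder filename
timer clear 1
timer on 1
memory            // memory report before the call
* ... iebaltab call from above goes here ...
timer off 1
timer list 1      // elapsed seconds, if the call finishes
memory            // memory report after the call
log close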
Troubleshooting
- When Raul runs the analysis with the entire dataset, the run dies at about 55 minutes. I submitted a job via the scheduler to run the same do-file on the original dataset. While the job was running, I used top to watch CPU and memory usage, and I repeatedly checked the disk usage of /tmp with du. All 64 cores were at almost 100% usage, memory was at about 5-6% of the 499 GB allocated, and /tmp held about 4-5 GB. At about the 1-hour mark, I could see each process dying, and then everything stalled. The maximum RAM used was 35.67 GB.
- Raul took a smaller random sample, 5% of the original dataset (see the sketch after this list for how such a subsample can be drawn). In this case, the job ran to completion in about 8.42 hours. I also confirmed that the job was allocated 499 GB of RAM, but only 20.21 GB was used.
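For reference, here is a minimal sketch of how such a 5% subsample can be drawn in Stata; the filenames are placeholders, not Raul's actual paths:

Code:
use "full_dataset.dta", clear    // placeholder for the original 4.3 GB dataset
set seed 12345                   // fix the seed so the subsample is reproducible
sample 5                         // keep a 5% random sample of observations
save "sample_5pct.dta", replace  // placeholder output filename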
Thank you,
Paula and Raul