How to make my code run faster

Fredrick orwa

Join Date: Dec 2018

Posts: 109
#1

24 Jun 2024, 03:48

Dear Members,
I would appreciate advice on how to make the code below run faster.
The code bootstraps from imputed data, following Felix Bittmann's paper (https://www.preprints.org/manuscript/202401.0813/v1). My problem is that it takes far too long to run (5 days).
Any ideas on how to speed it up would be a great help.
Regards,
Fred

****Trying CI for imputed dataset

****calculating PAF and its CI
*use data/ONLYpafdta,clear
* Define a program to calculate PAF
cd "/Users/fodiwuor/Library/CloudStorage/OneDrive-KemriWellcomeTrust/fodiwuor/studies/AASRF_projects"
capture log close
log using "/Users/fodiwuor/Library/CloudStorage/OneDrive-KemriWellcomeTrust/fodiwuor/studies/AASRF_projects/CARRIAGE AND SYTEMATICREVIEW/dofile/simulatePAF.txt",text replace
use data/temp/ONLYpafdta,clear
capture program drop calc_paf
program define calc_paf, rclass
use data/temp/ONLYpafdta,clear
*version 17.0
*syntax varlist [,if] [,in]
*args ipdc hiv year agecat ipdoutc pop
***under5
bsample, cluster(idcode) idcluster(newid)
*local j=1
**under5
local prop_under5=0
local prop_under5RR=0
forval fred=1/20{
nbreg ipd_count hiv_status year if agecat1==1 & _mi_m==`fred', exp(midyrpopdx) difficult
*local prevRR_`fred'_1=_b[hiv_status]
local prop_under5RR=`prop_under5RR'+_b[hiv_status]

proportion hiv_status if agecat1==1 & !missing(ipd_outtcome) & _mi_m==`fred'
*local prev_`fred'_1=r(table)[1,2]
local prop_under5=`prop_under5'+r(table)[1,2]
}
local prop_under5=round(`prop_under5'/20,0.01)
local prop_under5RR=round(exp(`prop_under5RR'/20),0.01)

**5to14
local prop_5to14=0
local prop_5to14RR=0
forval fred=1/20{
nbreg ipd_count hiv_status year if agecat1==2 & _mi_m==`fred', exp(midyrpopdx) difficult
*local prevRR_`fred'_2=_b[hiv_status]
local prop_5to14RR=`prop_5to14RR'+_b[hiv_status]

proportion hiv_status if agecat1==2 & !missing(ipd_outtcome) & _mi_m==`fred'
*local prev_`fred'_2=r(table)[1,2]
local prop_5to14=`prop_5to14'+r(table)[1,2]
}
local prop_5to14=round(`prop_5to14'/20,0.01)
local prop_5to14RR=round(exp(`prop_5to14RR'/20),0.01)

**15+
local prop_5Above=0
local prop_5AboveRR=0
forval fred=1/20{
nbreg ipd_count hiv_status year if agecat1==3 & _mi_m==`fred', exp(midyrpopdx) difficult
*local prevRR_`fred'_3=_b[hiv_status]
local prop_5AboveRR=`prop_5AboveRR'+_b[hiv_status]

proportion hiv_status if agecat1==3 & !missing(ipd_outtcome) & _mi_m==`fred'
*local prev_`fred'_3=r(table)[1,2]
local prop_5Above=`prop_5Above'+r(table)[1,2]
}
local prop_5Above=round(`prop_5Above'/20,0.01)
local prop_5AboveRR=round(exp(`prop_5AboveRR'/20),0.01)

**proportion
*local prop_under5=round((`prev_1_1'+`prev_2_1'+`prev_3_1'+`prev_4_1'+`prev_5_1'+`prev_6_1'+`prev_7_1'+`prev_8_1'+`prev_9_1'+`prev_10_1'+`prev_11_1'+`prev_12_1' ///
*+`prev_13_1'+`prev_14_1'+`prev_15_1'+`prev_16_1'+`prev_17_1'+`prev_18_1'+`prev_19_1'+`prev_20_1')/20,0.01)
*di "`prop_under5'"
*ren prop_under5 under5prop

*local prop_5to14=round((`prev_1_2'+`prev_2_2'+`prev_3_2'+`prev_4_2'+`prev_5_2'+`prev_6_2'+`prev_7_2'+`prev_8_2'+`prev_9_2'+`prev_10_2'+`prev_11_2'+`prev_12_2' ///
*+`prev_13_2'+`prev_14_2'+`prev_15_2'+`prev_16_2'+`prev_17_2'+`prev_18_2'+`prev_19_2'+`prev_20_2')/20,0.01)
*di "`prop_5to14'"

*local prop_5Above=round((`prev_1_3'+`prev_2_3'+`prev_3_3'+`prev_4_3'+`prev_5_3'+`prev_6_3'+`prev_7_3'+`prev_8_3'+`prev_9_3'+`prev_10_3'+`prev_11_3'+`prev_12_3' ///
*+`prev_13_3'+`prev_14_3'+`prev_15_3'+`prev_16_3'+`prev_17_3'+`prev_18_3'+`prev_19_3'+`prev_20_3')/20,0.01)
*di "`prop_5Above'"
***relative risk
*local prop_under5RR=round(exp((`prevRR_1_1'+`prevRR_2_1'+`prevRR_3_1'+`prevRR_4_1'+`prevRR_5_1'+`prevRR_6_1'+`prevRR_7_1'+`prevRR_8_1'+`prevRR_9_1'+`prevRR_10_1'+`prevRR_11_1'+`prevRR_12_1' ///
*+`prevRR_13_1'+`prevRR_14_1'+`prevRR_15_1'+`prevRR_16_1'+`prevRR_17_1'+`prevRR_18_1'+`prevRR_19_1'+`prevRR_20_1')/20),0.01)
*di "`prop_under5RR'"
*ren prop_under5RR RR_under5

*local prop_5to14RR=round(exp((`prevRR_1_2'+`prevRR_2_2'+`prevRR_3_2'+`prevRR_4_2'+`prevRR_5_2'+`prevRR_6_2'+`prevRR_7_2'+`prevRR_8_2'+`prevRR_9_2'+`prevRR_10_2'+`prevRR_11_2'+`prevRR_12_2' ///
*+`prevRR_13_2'+`prevRR_14_2'+`prevRR_15_2'+`prevRR_16_2'+`prevRR_17_2'+`prevRR_18_2'+`prevRR_19_2'+`prevRR_20_2')/20),0.01)
*di "`prop_5to14RR'"

*local prop_5AboveRR=round(exp((`prevRR_1_3'+`prevRR_2_3'+`prevRR_3_3'+`prevRR_4_3'+`prevRR_5_3'+`prevRR_6_3'+`prevRR_7_3'+`prevRR_8_3'+`prevRR_9_3'+`prevRR_10_3'+`prevRR_11_3'+`prevRR_12_3' ///
*+`prevRR_13_3'+`prevRR_14_3'+`prevRR_15_3'+`prevRR_16_3'+`prevRR_17_3'+`prevRR_18_3'+`prevRR_19_3'+`prevRR_20_3')/20),0.01)
*di "`prop_5AboveRR'"

// Calculate PAF
local paf =round((`prop_under5'*(`prop_under5RR'-1))/(`prop_under5RR'),0.01)
return scalar pafunde5 = `paf'

***5to14
local pafx =round((`prop_5to14'*(`prop_5to14RR'-1))/(`prop_5to14RR'),0.01)
return scalar pafunde5to14 =`pafx'

***15 and above
local pafxx =round((`prop_5Above'*(`prop_5AboveRR'-1))/(`prop_5AboveRR'),0.01)
return scalar pafunde15plus=`pafxx'

***proportion
di "`prop_under5'"
di "`prop_5to14'"
di "`prop_5Above'"

***RR
di "`prop_under5RR'"
di "`prop_5to14RR'"
di "`prop_5AboveRR'"
end
**bootstrap
*bootstrap r(pafunde5), reps(1000) nodrop: calc_paf
***I am only interested in the CI
*simulate result=r(pafunde5) ,reps(1000) seed (123) dots:calc_paf
*centile result , centile(2.5 97.5)

local seeds 123 456 789 101112
*parallel setclusters 4,statapath(/Applications/Stata/StataBE.app/Contents/MacOS/StataBE)
parallel initialize 4,statapath(/Applications/Stata/StataBE.app/Contents/MacOS/StataBE)
parallel sim,expr(pafunde5=r(pafunde5) pafunde5to14=r(pafunde5to14) pafunde15plus=r(pafunde15plus)) reps(1000) seed(`seeds') noisily trace saving("Data/res_imputeboot", replace): calc_paf
parallel viewlog 4
centile pafunde5 , centile(2.5 97.5)
centile pafunde5to14, centile(2.5 97.5)
centile pafunde15plus, centile(2.5 97.5)
*centile result, centile(2.5 97.5)  (variable "result" only exists after the commented-out -simulate- run above)
sum *, det
Daniel Feenberg

Join Date: Oct 2014

Posts: 321
#2

24 Jun 2024, 08:38

Some suggestions that could apply to any -bootstrap- job, in increasing order of difficulty. I think dividing the bootstrap into 20 separate Stata jobs will be the key.
-set rmsg on- so you know where the time is spent.

Are you sure you need the -difficult- option to -nbreg-?

Stata/MP will make most regression steps much faster with no additional programming if you have more than one core.

-bootstrap- jobs can be divided up to run separately. See https://www.nber.org/stata/efficient/bootstrap.html but run each piece of the bootstrap in a separate Stata job. Works well if you have multiple cores.

https://github.com/gvegayon/parallel

https://www.stata.com/meeting/uk18/s...k18_Ditzen.pdf

See https://www.nber.org/stata/efficient/ for other suggestions for dealing with big or long-running Stata jobs.
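
As a rough sketch of the split-job idea: this assumes the -calc_paf- program from #1 has been saved in its own do-file (the name calc_paf_program.do is hypothetical), that the working directory is the project folder used in #1, and that your Stata passes a trailing command-line argument through to the do-file. The chunk sizes, seeds, and file names are illustrative only. Each job runs its own slice of the replications with its own seed:

```stata
* boot_chunk.do (hypothetical file name): one slice of the bootstrap.
* Launch from a terminal once per chunk, for example:
*   /Applications/Stata/StataBE.app/Contents/MacOS/StataBE -b do boot_chunk.do 3
args chunk                       // chunk number passed on the command line
set rmsg on                      // report the time each command takes
run calc_paf_program.do          // hypothetical do-file that defines calc_paf
local seed = 1000 + `chunk'      // arbitrary; a different seed for every chunk
set seed `seed'
* 20 chunks x 50 replications = 1000 replications in total
simulate pafunde5=r(pafunde5) pafunde5to14=r(pafunde5to14) ///
    pafunde15plus=r(pafunde15plus), reps(50) ///
    saving(data/temp/boot_chunk`chunk', replace): calc_paf
```

Once all the jobs have finished, the pieces can be stacked and the percentile intervals read off the combined file:

```stata
* Combine the 20 result files and compute the 2.5th and 97.5th percentiles
clear
forvalues c = 1/20 {
    append using data/temp/boot_chunk`c'
}
centile pafunde5 pafunde5to14 pafunde15plus, centile(2.5 97.5)
```

Giving each chunk a different seed matters; otherwise every job would draw the same bootstrap samples.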

Last edited by Daniel Feenberg; 24 Jun 2024, 09:01.
Mike Lacy

Join Date: Apr 2014

Posts: 2396
#3

24 Jun 2024, 10:45

Beyond Daniel's useful suggestions, here are some other ideas: I'd try Stata's -profiler- command to see which parts of the code take the most time. Similar or even better information can be obtained by placing several -timer- commands within your code. Both of these can be useful in addition to or instead of -set rmsg on-.
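
A minimal sketch of both approaches, assuming the data set and variable names from #1 and that -calc_paf- is already defined in the current session. The timer numbers are arbitrary, and only the first imputation of the first age group is timed:

```stata
use data/temp/ONLYpafdta, clear

* Time a single model fit with and without -difficult-, and the -proportion- step
timer clear
timer on 1
nbreg ipd_count hiv_status year if agecat1==1 & _mi_m==1, exp(midyrpopdx) difficult
timer off 1
timer on 2
nbreg ipd_count hiv_status year if agecat1==1 & _mi_m==1, exp(midyrpopdx)
timer off 2
timer on 3
proportion hiv_status if agecat1==1 & !missing(ipd_outtcome) & _mi_m==1
timer off 3
timer list                       // elapsed seconds for timers 1, 2 and 3

* Profile one full replication of the bootstrap program
profiler clear
profiler on
calc_paf
profiler off
profiler report
```

Multiplying the time for one call of -calc_paf- by the number of replications gives a rough estimate of the total run time before any parallelisation.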

A suggestion regarding posting: You're asking for help with a large chunk of code, over 100 lines. In doing that, taking special care to make it easy to read your code would increase the chances that someone would want to try to help you. Using code delimiters is one thing that would help, but I'd also suggest using conventional indentation practices, avoiding lines that split on the screen, and removing lines that you have commented out with "*".