Hi,
I am running a foreach loop on a panel dataset with 1,900 IDs and 4.7 million observations. Although the loop only updates the last two observations of each ID, it still takes too long to complete. The commands inside the loop are not complicated, but each panel ID takes around 20 seconds. In total the loop takes approximately 20 s × 1,900 IDs ≈ 38,000 s, or about 11 hours. I need to run this loop once a day for a period of time, because I update my dataset daily with new observations.
Do you have any suggestions on how to improve the speed of the foreach loop?
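Only the last two observations of each ID change, and each row only looks back two rows (`[_n-1]` and `[_n-2]`). One option is therefore to run the recalculation on a small extract and merge it back. Each `replace` then runs over about four rows per ID instead of the full 4.7 million observations.

Below is a minimal sketch under these assumptions:
- It uses the variable names from the data example.
- The toy data are hypothetical.
- The one-line recalculation is a placeholder for the real loop.

```stata
* Hypothetical toy panel using variable names from the data example
clear
input int ID float obs_ double psar float last
1 1 8.5  .
1 2 8.75 .
1 3 9    .
1 4 9.25 1
1 5 .    .
2 1 21   .
2 2 20.5 1
2 3 .    .
end

tempfile full tail
sort ID obs_
save `full'

* Keep each ID's last==1 row and the rows after it, plus the two rows
* before it that the [_n-1] and [_n-2] references need
bysort ID (obs_): egen startobs = max(cond(last == 1, obs_, .))
keep if obs_ >= startobs - 2

* Hypothetical placeholder for the real recalculation (the foreach loop):
* here it just extends psar by 0.25 on the rows after last==1
by ID: replace psar = psar[_n-1] + 0.25 if obs_ > startobs

drop startobs
save `tail'

* Put the recalculated rows back into the full dataset
use `full', clear
merge 1:1 ID obs_ using `tail', update replace nogenerate
sort ID obs_
```

With `update replace`, `merge` overwrites master values with the nonmissing values from the extract. Rows outside the extract are left as they were. Note that a value recalculated to missing will not overwrite an existing nonmissing value with these options.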
I have tried to speed up the calculation with the parallel module, installed with:
Code:
net install parallel, from(https://raw.github.com/gvegayon/parallel/stable/) replace
mata mata mlib index
However, I cannot get it to work. I am guessing it has something to do with the local used to initialize the foreach loop, but I am not sure.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int ID float(obs_ datum) double(Openprice_ Highprice_ Lowprice_ Closeprice_ psar extremepoint AccFactor psaronreversal) byte trend float(totobs_ last)
2 4164 22755 185.7 186.75 182.5 182.5 187.82953600000002 180.3 .02 . 0 4167 .
2 4165 22756 182.6 185.1 181.1 183.4 187.67894528000002 180.3 .02 . 0 4167 .
2 4166 22757 183.4 183.4 180.35 180.95 187.53136637440002 180.3 .02 . 0 4167 1
2 4167 22760 177.5 178 165.6 171.85 . . . . . 4167 .
3 1 15445 73.8918 73.8918 71.9575 73.1181 . . .02 . . 5036 .
3 2 15446 73.8918 76.5999 73.505 76.213 . . .02 . . 5036 .
3 3 15447 76.5999 76.9868 73.8918 75.0524 71.9575 76.5999 .02 . 1 5036 .
3 4 15448 75.0524 75.0524 70.7969 71.9575 76.5999 70.7969 .02 76.5999 0 5036 .
3 5 15449 71.5706 71.5706 67.3151 69.6363 76.9868 67.3151 .04 . 0 5036 .
3 6 15452 68.4757 69.6363 64.9939 65.7676 76.599932 64.9939 .06 . 0 5036 .
3 7 15453 66.9282 67.3151 64.607 64.607 75.90357008 64.607 .08 . 0 5036 .
3 8 15454 66.9282 69.2494 66.5413 69.2494 74.99984447359999 64.607 .08 . 0 5036 .
end
format %td datum
The foreach loop (used to recalculate the last two observations of each ID):
Code:
levelsof ID, local(testlist) clean
foreach x in `testlist' {
    di "`x'"
    timer clear 100
    timer on 100
    su obs_ if ID==`x' & last==1
    local startobs_`x'= r(max)
    su totobs_ if ID ==`x'
    local totobs_`x' = r(max)
    forvalues obs = `startobs_`x''/`totobs_`x'' {
        *----- After close:
        * Update trend
        replace trend=trend[_n-1] if psar!=. & obs_==`obs' & (Highprice_!=. & Lowprice_!=. & Closeprice_!=. & Openprice_!=.)
        replace trend =0 if psar>Lowprice_ & psar!=. & obs_==`obs' & trend==1 & trend[_n-1]==1 & Lowprice_!=.
        replace trend =1 if psar<Highprice_ & psar!=. & obs_==`obs' & trend==0 & trend[_n-1]==0 & Highprice_!=.

        *--- Reversal? Yes, to Bear
        * Reset AF
        replace AccFactor = 0.02 if trend==0 & trend[_n-1]==1 & obs_==`obs'
        * Extremepoint is today's low
        replace extremepoint = Lowprice_ if trend==0 & trend[_n-1]==1 & obs_==`obs'
        * Today's psar is replaced with the previous trend's highest high
        replace psaronreversal = psar if obs_==`obs' & trend==0 & trend[_n-1]==1
        replace psar = extremepoint[_n-1] if trend==0 & trend[_n-1]==1 & obs_==`obs'
        * Calculate the next day's psar after the reversal
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]== 1 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==1 & obs_==(`obs'+1)

        *--- Reversal? Yes, to Bull
        * Reset AF
        replace AccFactor = 0.02 if trend==1 & trend[_n-1]==0 & obs_==`obs'
        * Extremepoint is today's high
        replace extremepoint = Highprice_ if trend==1 & trend[_n-1]==0 & obs_==`obs'
        * Today's psar is replaced with the previous trend's lowest low
        replace psaronreversal = psar if obs_==`obs' & trend==1 & trend[_n-1]==0
        replace psar = extremepoint[_n-1] if trend==1 & trend[_n-1]==0 & obs_==`obs'
        * Calculate the next day's psar after the reversal
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] ) if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1

        *--- No reversal. Bull continues.
        * Update AccFactor
        replace AccFactor = AccFactor[_n-1] if trend==1 & trend[_n-1]==1 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Highprice_>extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend== 1 & trend[_n-1]==1) & (obs_==`obs'))
        * Update extremepoint
        replace extremepoint = max(Highprice_, extremepoint[_n-1]) if trend==1 & trend[_n-1]==1 & obs_==`obs'
        * Calculate tomorrow's psar
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] ) if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1

        *--- No reversal. Bear continues.
        * Update AccFactor
        replace AccFactor = AccFactor[_n-1] if trend==0 & trend[_n-1]==0 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Lowprice_<=extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend== 0 & trend[_n-1]==0) & (obs_==`obs'))
        * Update extremepoint
        replace extremepoint = min(Lowprice_, extremepoint[_n-1]) if trend==0 & trend[_n-1]==0 & obs_==`obs'
        * Calculate tomorrow's psar
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]== 0 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==0 & obs_==(`obs'+1)
    }
    timer off 100
    timer list 100
}
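Most of the time is probably spent because every `replace ... if obs_==`obs'` evaluates its condition on all 4.7 million observations. The loop issues 25 such commands per row, for two rows, for each of 1,900 IDs. That is about 95,000 passes over the full dataset.

Each row only depends on earlier rows of the same ID. All IDs can therefore be updated together by numbering rows relative to each ID's `last==1` row. Below is a sketch on a hypothetical toy panel, showing only the three trend statements. The names `startobs`, `step` and `maxstep` are illustrative, not from the original code.

```stata
* Hypothetical toy panel using variable names from the data example
clear
input int ID float obs_ double(Openprice_ Highprice_ Lowprice_ Closeprice_ psar) byte trend float(totobs_ last)
1 1 9.5  10 9  9.5  8.5  1 3 .
1 2 10.5 11 10 10.5 8.75 1 3 1
1 3 11   12 8  8.5  9    . 3 .
2 1 19.5 20 19 19.5 21   0 3 .
2 2 18.5 19 18 18.5 20.5 0 3 1
2 3 21.5 22 21 21.5 20   . 3 .
end
sort ID obs_

* step = 0 on each ID's last==1 row, 1 on the row after it, and so on;
* earlier rows get negative values and are never touched
bysort ID (obs_): egen startobs = max(cond(last == 1, obs_, .))
generate step = obs_ - startobs

* Number of passes needed: the largest step over all IDs
summarize step, meanonly
local maxstep = r(max)

forvalues s = 0/`maxstep' {
    * obs_==`obs' from the original loop becomes step==`s',
    * so each replace updates that row for every ID at once
    replace trend=trend[_n-1] if psar!=. & step==`s' & (Highprice_!=. & Lowprice_!=. & Closeprice_!=. & Openprice_!=.)
    replace trend =0 if psar>Lowprice_ & psar!=. & step==`s' & trend==1 & trend[_n-1]==1 & Lowprice_!=.
    replace trend =1 if psar<Highprice_ & psar!=. & step==`s' & trend==0 & trend[_n-1]==0 & Highprice_!=.
    * ... the other replace statements from the loop follow here,
    * with obs_==(`obs'+1) written as step==`s'+1
}
```

The remaining statements translate the same way. With two recalculated rows per ID, the outer loop runs twice in total instead of 2 × 1,900 times. Each pass touches only the rows of its own step, and within an ID that is a single row, so `[_n-1]` and `[_n-2]` still read the already-finished earlier rows.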
My test using parallel:
Code:
levelsof ID, local(testlist) clean
parallel initialize 4
parallel: foreach x in `testlist' {
    di "`x'"
    timer clear 100
    timer on 100
    su obs_ if ID==`x' & last==1
    local startobs_`x'= r(max)
    su totobs_ if ID ==`x'
    local totobs_`x' = r(max)
    forvalues obs = `startobs_`x''/`totobs_`x'' {
        * (same replace statements as in the foreach loop above)
    }
    timer off 100
    timer list 100
}
The above parallel loop gives me the following error:
Code:
parallel: foreach x in `testlist' {
--------------------------------------------------------------------------------
Parallel Computing with Stata
Child processes: 4
pll_id : dzg5bnuy61
Running at : /Users/jesper.eriksson/OneDrive - KI.SE/Mac/Documents/-=Forsk=-/Projekt/Aktier/Ep_pivot/Data/Full_
> dataset
Randtype : datetime

Waiting for the child processes to finish...
child process 0001 has exited without error...
child process 0002 has exited without error...
child process 0003 has exited without error...
child process 0004 has exited without error...
--------------------------------------------------------------------------------
Enter -parallel printlog #- to checkout logfiles.
--------------------------------------------------------------------------------

. di "`x'"


. timer clear 100

. timer on 100

. su obs_ if ID==`x' & last==1
invalid syntax
r(198);

end of do-file

r(198);
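Judging from the log, `parallel:` only received the line it prefixes (`foreach x in `testlist' {`). The loop body was then run by the main Stata session, where `x` is empty. As a result, `su obs_ if ID==`x' & last==1` became `su obs_ if ID== & last==1`, hence r(198). The child processes are also separate Stata sessions, so a local such as `testlist` would not exist there anyway.

One way around both issues is to put the work in a program that loops over whatever IDs are in its own chunk of data. Hand that program to parallel with `by(ID)` so that no ID is split across child processes.

This is a sketch that assumes the `by()` and `programs()` options of the parallel package; check `help parallel` for your version. The following are hypothetical:
- the program name `psar_update`;
- the toy data;
- the serial fallback.

Only the three trend statements are shown. Each has `& ID==`x'` added, because `obs_` restarts at 1 for every ID. Without it, `obs_==`obs'` also matches rows of other IDs.

```stata
* Hypothetical wrapper: recalculates the rows from last==1 onward
* for every ID present in the data it is given
capture program drop psar_update
program define psar_update
    sort ID obs_
    levelsof ID, local(ids)
    foreach x of local ids {
        summarize obs_ if ID == `x' & last == 1, meanonly
        local startobs = r(max)
        summarize totobs_ if ID == `x', meanonly
        local totobs = r(max)
        * Skip IDs without a last==1 row
        if missing(`startobs') continue
        forvalues obs = `startobs'/`totobs' {
            replace trend=trend[_n-1] if psar!=. & obs_==`obs' & ID==`x' & (Highprice_!=. & Lowprice_!=. & Closeprice_!=. & Openprice_!=.)
            replace trend =0 if psar>Lowprice_ & psar!=. & obs_==`obs' & ID==`x' & trend==1 & trend[_n-1]==1 & Lowprice_!=.
            replace trend =1 if psar<Highprice_ & psar!=. & obs_==`obs' & ID==`x' & trend==0 & trend[_n-1]==0 & Highprice_!=.
            * ... remaining replace statements, each with & ID==`x' added
        }
    }
end

* Hypothetical toy panel using variable names from the data example
clear
input int ID float obs_ double(Openprice_ Highprice_ Lowprice_ Closeprice_ psar) byte trend float(totobs_ last)
1 1 9.5  10 9  9.5  8.5  1 3 .
1 2 10.5 11 10 10.5 8.75 1 3 1
1 3 11   12 8  8.5  9    . 3 .
2 1 19.5 20 19 19.5 21   0 3 .
2 2 18.5 19 18 18.5 20.5 0 3 1
2 3 21.5 22 21 21.5 20   . 3 .
end

capture which parallel
if _rc == 0 {
    * Split the data by ID across 2 child processes and pass the program along
    parallel initialize 2
    parallel, by(ID) programs(psar_update): psar_update
}
else {
    * parallel not installed: run the same program serially
    psar_update
}
sort ID obs_
```

Parallelizing spreads the same full-data `replace` passes over several processes. Restricting the work to fewer rows per `replace` is likely to matter more than the number of processes.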
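I assume the error occurs because I am using a local to loop over the IDs, but I am not sure how to solve it.
Any suggestions for improving the speed of the foreach loop, or for getting parallel to work (if you think that would improve speed), would be greatly appreciated. Sorry for the long post, and thank you in advance.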
Best regards,
Jesper Eriksson