  • Problems getting "parallel" to work with foreach loop, alternatively, suggestions to improve speed of foreach loop

    Hi,

    I am running a foreach loop on a panel dataset with 1,900 IDs and 4.7 million observations. The loop only updates the last two observations of each ID, yet it still takes too long to complete. The commands inside the loop are not complicated, but each panel ID takes around 20 seconds, so in total the loop takes approximately 20 sec × 1,900 IDs ≈ 11 hours. I need to run this loop once a day for a period of time (I am updating my dataset daily with new observations).

    Do you have any suggestions how to improve the speed of the foreach loop?

    I have tried to use the parallel module, installed with

    net install parallel, from(https://raw.github.com/gvegayon/parallel/stable/) replace
    mata mata mlib index

    to improve calculation speed, but I cannot get it to work. I am guessing it has something to do with the local used to initialize the foreach loop, but I am not sure.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int ID float(obs_ datum) double(Openprice_ Highprice_ Lowprice_ Closeprice_ psar extremepoint AccFactor psaronreversal) byte trend float(totobs_ last)
    2 4164 22755   185.7  186.75   182.5   182.5 187.82953600000002   180.3 .02       . 0 4167 .
    2 4165 22756   182.6   185.1   181.1   183.4 187.67894528000002   180.3 .02       . 0 4167 .
    2 4166 22757   183.4   183.4  180.35  180.95 187.53136637440002   180.3 .02       . 0 4167 1
    2 4167 22760   177.5     178   165.6  171.85                  .       .   .       . . 4167 .
    3    1 15445 73.8918 73.8918 71.9575 73.1181                  .       . .02       . . 5036 .
    3    2 15446 73.8918 76.5999  73.505  76.213                  .       . .02       . . 5036 .
    3    3 15447 76.5999 76.9868 73.8918 75.0524            71.9575 76.5999 .02       . 1 5036 .
    3    4 15448 75.0524 75.0524 70.7969 71.9575            76.5999 70.7969 .02 76.5999 0 5036 .
    3    5 15449 71.5706 71.5706 67.3151 69.6363            76.9868 67.3151 .04       . 0 5036 .
    3    6 15452 68.4757 69.6363 64.9939 65.7676          76.599932 64.9939 .06       . 0 5036 .
    3    7 15453 66.9282 67.3151  64.607  64.607        75.90357008  64.607 .08       . 0 5036 .
    3    8 15454 66.9282 69.2494 66.5413 69.2494  74.99984447359999  64.607 .08       . 0 5036 .
    end
    format %td datum

    Foreach loop (that is used to recalculate the last two obs of each ID):

    Code:
    levelsof ID, local (testlist) clean
    foreach x in `testlist'  {
        di "`x'"
        timer clear 100
        timer on 100
        su obs_ if ID==`x' & last==1
        local startobs_`x'= r(max) 
        su totobs_ if ID ==`x'
        local totobs_`x' = r(max) 
    
        forvalues obs = `startobs_`x''/`totobs_`x'' {
    
        *----- After close:
        * Update trend 
        replace trend=trend[_n-1] if psar!=. & obs_==`obs' & (Highprice_!=. & Lowprice_!=. & Closeprice_!=. & Openprice_!=.)
            
        replace trend =0 if psar>Lowprice_ & psar!=. & obs_==`obs' & trend==1 & trend[_n-1]==1 & Lowprice_!=.
        replace trend =1 if psar<Highprice_ & psar!=. & obs_==`obs' & trend==0 & trend[_n-1]==0 & Highprice_!=.
            
            
            *--- Reversal? Yes, to Bear
            * Reset AF
        replace AccFactor = 0.02 if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Extremepoint is today's low
        replace extremepoint = Lowprice_ if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Today's psar is replaced with the previous trend's highest high.
        replace psaronreversal = psar if obs_==`obs' & trend==0 & trend[_n-1]==1
        replace psar = extremepoint[_n-1] if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Next day's psar is the highest high from the previous trend spell
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]==1 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==1 & obs_==(`obs'+1)
            
    
    
            *--- Reversal? Yes, to Bull
        replace AccFactor = 0.02 if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Extremepoint is today's high
        replace extremepoint = Highprice_ if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Today's psar is replaced with the previous trend's lowest low.
        replace psaronreversal = psar if obs_==`obs' & trend==1 & trend[_n-1]==0
        replace psar = extremepoint[_n-1] if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Next day's psar is the lowest low from the previous trend spell
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] )    if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1
    
            
            *--- No reversal. Bull continues.
            * Update AccFactor.
        replace AccFactor = AccFactor[_n-1]    if trend==1 & trend[_n-1]==1 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Highprice_>extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend==1 & trend[_n-1]==1) & (obs_==`obs'))
            * Update extremepoint
        replace extremepoint = max(Highprice_, extremepoint[_n-1]) if trend==1 & trend[_n-1]==1 & obs_==`obs'
    
            * Calculate tomorrow's psar
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] )    if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1
    
            *--- No reversal. Bear continues.
            * Update AccFactor.
        replace AccFactor = AccFactor[_n-1]    if trend==0 & trend[_n-1]==0 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Lowprice_<=extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend==0 & trend[_n-1]==0) & (obs_==`obs'))
            * Update extremepoint
        replace extremepoint = min(Lowprice_, extremepoint[_n-1]) if trend==0 & trend[_n-1]==0 & obs_==`obs'
    
            * Calculate tomorrow's psar
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]==0 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==0 & obs_==(`obs'+1)
    
        }
    
    timer off 100
    timer list 100
    }
    My test using parallel:

    Code:
    levelsof ID, local (testlist) clean
    parallel initialize 4
    parallel: foreach x in `testlist'  {
        di "`x'"
        timer clear 100
        timer on 100
        su obs_ if ID==`x' & last==1
        local startobs_`x'= r(max) 
        su totobs_ if ID ==`x'
        local totobs_`x' = r(max) 
    
        forvalues obs = `startobs_`x''/`totobs_`x'' {
    
        *----- After close:
        * Update trend 
        replace trend=trend[_n-1] if psar!=. & obs_==`obs' & (Highprice_!=. & Lowprice_!=. & Closeprice_!=. & Openprice_!=.)
            
        replace trend =0 if psar>Lowprice_ & psar!=. & obs_==`obs' & trend==1 & trend[_n-1]==1 & Lowprice_!=.
        replace trend =1 if psar<Highprice_ & psar!=. & obs_==`obs' & trend==0 & trend[_n-1]==0 & Highprice_!=.
            
            
            *--- Reversal? Yes, to Bear
            * Reset AF
        replace AccFactor = 0.02 if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Extremepoint is today's low
        replace extremepoint = Lowprice_ if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Today's psar is replaced with the previous trend's highest high.
        replace psaronreversal = psar if obs_==`obs' & trend==0 & trend[_n-1]==1
        replace psar = extremepoint[_n-1] if trend==0 & trend[_n-1]==1 & obs_==`obs'
            * Next day's psar is the highest high from the previous trend spell
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]==1 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==1 & obs_==(`obs'+1)
            
    
    
            *--- Reversal? Yes, to Bull
        replace AccFactor = 0.02 if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Extremepoint is today's high
        replace extremepoint = Highprice_ if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Today's psar is replaced with the previous trend's lowest low.
        replace psaronreversal = psar if obs_==`obs' & trend==1 & trend[_n-1]==0
        replace psar = extremepoint[_n-1] if trend==1 & trend[_n-1]==0 & obs_==`obs'
            * Next day's psar is the lowest low from the previous trend spell
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] )    if trend[_n-1]==1 & trend[_n-2]==0 & obs_==`obs'+1
    
            
            *--- No reversal. Bull continues.
            * Update AccFactor.
        replace AccFactor = AccFactor[_n-1]    if trend==1 & trend[_n-1]==1 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Highprice_>extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend==1 & trend[_n-1]==1) & (obs_==`obs'))
            * Update extremepoint
        replace extremepoint = max(Highprice_, extremepoint[_n-1]) if trend==1 & trend[_n-1]==1 & obs_==`obs'
    
            * Calculate tomorrow's psar
        replace psar = psar[_n-1] + (AccFactor[_n-1]*(extremepoint[_n-1] - psar[_n-1])) if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1
        replace psar = min(psar, Lowprice_[_n-1], Lowprice_[_n-2] )    if trend[_n-1]==1 & trend[_n-2]==1 & obs_==`obs'+1
    
            *--- No reversal. Bear continues.
            * Update AccFactor.
        replace AccFactor = AccFactor[_n-1]    if trend==0 & trend[_n-1]==0 & obs_==`obs'
        replace AccFactor = (AccFactor[_n-1]+0.02) if ((Lowprice_<=extremepoint[_n-1]) & (AccFactor[_n-1] <0.19) & (trend==0 & trend[_n-1]==0) & (obs_==`obs'))
            * Update extremepoint
        replace extremepoint = min(Lowprice_, extremepoint[_n-1]) if trend==0 & trend[_n-1]==0 & obs_==`obs'
    
            * Calculate tomorrow's psar
        replace psar = (psar[_n-1] - (AccFactor[_n-1]*(psar[_n-1] - extremepoint[_n-1]))) if trend[_n-1]==0 & trend[_n-2]==0 & obs_==(`obs'+1)
        replace psar = max(psar, Highprice_[_n-1], Highprice_[_n-2] ) if trend[_n-1]==0 & trend[_n-2]==0 & obs_==(`obs'+1)
    
        }
    
    timer off 100
    timer list 100
    }
    The above parallel loop gives me the following error:

    Code:
    parallel: foreach x in `testlist'  {
    --------------------------------------------------------------------------------
    Parallel Computing with Stata
    Child processes: 4
    pll_id         : dzg5bnuy61
    Running at     : /Users/jesper.eriksson/OneDrive - KI.SE/Mac/Documents/-=Forsk=-/Projekt/Aktier/Ep_pivot/Data/Full_
    > dataset
    Randtype       : datetime
    
    Waiting for the child processes to finish...
    child process 0001 has exited without error...
    child process 0002 has exited without error...
    child process 0003 has exited without error...
    child process 0004 has exited without error...
    --------------------------------------------------------------------------------
    Enter -parallel printlog #- to checkout logfiles.
    --------------------------------------------------------------------------------
    
    .         di "`x'"
    
    
    .         timer clear 100
    
    .         timer on 100
    
    .         su obs_ if ID==`x' & last==1
    invalid syntax
    r(198);
    
    end of do-file
    
    r(198);
    I assume the error occurs because I am using a local to loop over IDs, but I am not sure how to solve it.

    Any suggestions to improve the speed of the foreach loop, or to get parallel to work (if you think that would improve speed) would be greatly appreciated. Sorry for the long post. Thank you in advance.

    Best regards,

    Jesper Eriksson
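
On the r(198) error in #1: one reading of the log (an interpretation, not something confirmed in this thread) is that the `parallel:` prefix applies only to the single command on its own line, so the lines that follow run in the main session outside any loop. There `x` is empty, and `su obs_ if ID== & last==1` is invalid syntax. Locals defined in the main session are also not visible in the child processes. A possible workaround, sketched here on made-up toy data, is to wrap the per-ID work in a program and let `parallel` split the data by ID. The program name `flag_tail` and the variable `todo` are hypothetical, and the `by()` and `programs()` options are used as described in the package's help file, which should be checked against the installed version.

```stata
* Toy data (made up): two panels, with last==1 marking where recalculation starts
clear
input int ID int obs_ byte last
1 1 .
1 2 1
1 3 .
2 1 .
2 2 1
2 3 .
end

* Hypothetical program: all locals are created inside it,
* so they exist in whichever process runs the program
capture program drop flag_tail
program define flag_tail
    levelsof ID, local(ids)
    generate byte todo = 0
    foreach x of local ids {
        quietly summarize obs_ if ID == `x' & last == 1, meanonly
        if r(N) == 0 continue              // no row flagged for this ID
        local startobs = r(max)
        quietly replace todo = 1 if ID == `x' & obs_ >= `startobs'
        * ... the forvalues block from #1 would go here ...
    }
end

sort ID obs_
parallel initialize 2
* by(ID) asks parallel not to split a panel across child processes
parallel, by(ID) programs(flag_tail): flag_tail
```

Whether this is faster than the serial loop depends on how much time is lost to splitting and recombining 4.7 million observations, so it is worth timing on a subset first.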



  • #2
    You are not helping yourself much with copy/pasting so much output/code. One has to have a lot of patience to go through your code and see how you can improve.

    But from what you are saying it seems to me that you have not written your loop well. If you want only the last two ids to be updated, why do you loop through all the ids?

    Write the loop so that it goes only through the last two ids.

    In general, to get useful help here, try to come up with a simple version of your problem that one can grasp in 30 seconds.
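
Taking the advice to restrict the work one step further, a possible speed-up is to address the rows to recompute by observation number with `in` rather than with `if obs_==...`. With `if`, each `replace` evaluates its condition on every observation in memory, whereas `in` limits the command to the given row. In the code in #1, the `replace` commands also carry no condition on ID, so each one applies to every panel that has that `obs_` value. The sketch below is an illustration on made-up toy data, not a drop-in replacement: `row` and `todo` are hypothetical helper variables, and only the first two `replace` steps from #1 are reproduced.

```stata
* Toy data (made up): last==1 marks the first row to recompute in each panel
clear
input int ID int obs_ double(psar Lowprice_) byte(trend last)
1 1 10   11 1 .
1 2 10.5 12 1 1
1 3 11   10 . .
2 1 20   21 1 .
2 2 20.5 22 1 1
2 3 21   23 . .
end

sort ID obs_
generate long row = _n                            // physical row number
* Running sum within ID: rows at or after the flagged row get todo == 1
by ID: generate byte todo = sum(last == 1) > 0

* Visit only the rows that need updating, in sorted order
levelsof row if todo, local(rows)
foreach i of local rows {
    * Carry the trend forward from the previous row
    quietly replace trend = trend[_n-1] if psar != . in `i'
    * Flip to bear when psar is above today's low
    quietly replace trend = 0 if psar > Lowprice_ & psar != . & trend == 1 & trend[_n-1] == 1 in `i'
}
list ID obs_ psar Lowprice_ trend, sepby(ID)
```

Because the rows are visited in sorted order, each step can still read the value written for the previous row, which is what the recursive updates in #1 rely on.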



    • #3
      Thanks for your help, I'll pay attention to the data format part in the future.
      This is the first time I try to communicate with people programmatically, but I don't know much about it.
      Thank you very much for your tolerance and detailed explanation!



      • #4
        The way to do what you seem to want is
        Code:
        loc amacro test

        foreach x of loc amacro {
            * Quote the macro so its contents are displayed as a string
            di "`x'"
        }
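
Applied to the IDs from #1, the same pattern might look like the sketch below. The toy data are made up for illustration. `levelsof` stores the distinct IDs in the local `testlist`, and `foreach ... of local` reads that macro by name, so no backquotes are needed on the loop line.

```stata
* Toy data (made up): three rows from two panels
clear
input int ID int obs_
2 1
2 2
3 1
end

* Collect the distinct IDs in a local macro
levelsof ID, local(testlist)

* Loop over the macro by name and count the iterations
local n = 0
foreach x of local testlist {
    display "ID = `x'"
    local ++n
}
display "looped over `n' IDs"
```

Both this form and `foreach x in `testlist'` loop over the same values here, so the choice between them is unlikely to explain the 20 seconds per ID.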
