Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to code a cumulative product using a for-loop?

    I've been racking my brain for hours trying to get this, but just can't seem to.

    Example data set:

    Code:
    clear
    input byte id float date byte(month) float(return)
    1  324  1   .9642857
    1  325  2   .9259259
    1  326  3   1.0368
    1  327  4   .9607843
    1  328  5   .9285714
    1  329  6   1.051429
    1  330  7   1.021277
    1  331  8   1.083333
    1  332  9   .9776923
    1  333  10  1.02
    1  334  11  .9705882
    1  335  12  .9664646
    1  336  1   1.06383
    1  337  2   1.08
    1  338  3   .9237037
    1  339  4   1.030612
    1  340  5   1.019802
    1  341  6   .9879612
    end
    To clarify the variables: ID (obvious), date is mofd(date variable), month is calendar month number (e.g., Jan = 1, Feb = 2 etc)


    In my original data, it is xtset by id and date: (since I have multiple IDs)

    Code:
     xtset id date
    I have coded my calculation 'manually' so that it is working, and wanted to know how to do it using a for-loop

    The manual calculation:
    Code:
    gen ret_6month = return*L.return*L2.return*L3.return*L4.return*L5.return-1 if month_num == 6
    So here, ret_6month is the cumulative product of the current return multiplied by the 5 before it, minus 1, when we are at June (month_num == 6) each year.

    I know in this case it is not so tedious to write manually do it, but I also wanted to calculate the cumulative product for 12, 36, and 72 months, also at month_num==6

    How would I go about doing this? I just can't seem to figure it out, every attempt gives me different numbers, yet I feel like it is simple.

    Any help is appreciated.

    Thanks,

    Dave

    Edit: also, the resulting values for the above data set (should be, if I haven't made any typos):
    For month 329: ret_6month = -.1316437
    For month 341: ret_6month = .1019949
    Last edited by David Lade; 26 Nov 2019, 17:41.

  • #2
    Unfortuately there is no built-in running-product function in Stata. But for your situation, where the things being multiplied are always positive, we can rely on logarithms and add them.

    Code:
    assert return > 0
    
    gen log_return = log(return)
    rangestat (sum) log_return (count) log_return, by(id) interval(date -5 0)
    gen ret_6month = exp(log_return_sum) - 1 if log_return_count == 6
    Evidently, for longer time periods, change the numbers in the -interval()- option.
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.

    Comment


    • #3
      Here's an alternative to Clyde's code built on basic principles. It's a second example of accomplishing the task without a for loop, a little more similar to the code you were attempting.
      Code:
      tsset id date
      generate cprod = .
      bysort id (date): replace  cprod  = cond(_n==1,return, return*L.cprod) 
      bysort id (date): generate ret_6  = cond(_n==6, cprod, cprod/L6.cprod)  - 1
      bysort id (date): generate ret_12 = cond(_n==12,cprod, cprod/L12.cprod) - 1
      list, clean noobs
      Code:
      . list, clean noobs
      
          id   date   month     return      cprod       ret_6      ret_12  
           1    324       1   .9642857   .9642857           .           .  
           1    325       2   .9259259   .8928571           .           .  
           1    326       3     1.0368   .9257143           .           .  
           1    327       4   .9607843   .8894117           .           .  
           1    328       5   .9285714   .8258823           .           .  
           1    329       6   1.051429   .8683566   -.1316434           .  
           1    330       7   1.021277   .8868327   -.0803217           .  
           1    331       8   1.083333   .9607351    .0760234           .  
           1    332       9   .9776923   .9393033    .0146795           .  
           1    333      10       1.02   .9580894    .0772169           .  
           1    334      11   .9705882   .9299102    .1259597           .  
           1    335      12   .9664646   .8987253    .0349726   -.1012747  
           1    336       1    1.06383    .956091    .0780963   -.0084982  
           1    337       2       1.08   1.032578    .0747795    .1564878  
           1    338       3   .9237037   .9537964    .0154296    .0303356  
           1    339       4   1.030612    .982994    .0259941    .1052182  
           1    340       5   1.019802   1.002459    .0780173     .213804  
           1    341       6   .9879612   .9903908     .101995    .1405346

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Unfortuately there is no built-in running-product function in Stata. But for your situation, where the things being multiplied are always positive, we can rely on logarithms and add them.

        Code:
        assert return > 0
        
        gen log_return = log(return)
        rangestat (sum) log_return (count) log_return, by(id) interval(date -5 0)
        gen ret_6month = exp(log_return_sum) - 1 if log_return_count == 6
        Evidently, for longer time periods, change the numbers in the -interval()- option.
        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.


        Ah I see, that certainly is a work-around.

        I am curious though, surely we can code this in a for-loop? For instance, I was thinking along the following lines:

        Code:
         forvalues i = 1/6 { gen Lret`i' = Ret[_n-`i'] if month==6 }
        (I know this is not exactly what I am after, but I was thinking with some chopping and changing we could get there. Do you think it is not possible?

        Comment


        • #5
          Yes, you can do it with a forvalues loop.

          Code:
          xtset id date
          
          gen cprod = return
          forvalues i = 1/5 {
              replace cprod = cprod*L`i'.return
          }
          gen wanted = cprod - 1
          But when you get out to 36 or 72 months, if your data set is large, it's going to be noticeably slower than the approach in either #2 or #3.

          Actually, on thinking about it, William Lisowski's approach is probably the fastest option.

          Comment


          • #6
            Thanks Clyde, this works too, although you are right - it is noticeably slower! Regardless I think I am getting the hang of loops in stata now, so that was very helpful.

            Comment

            Working...
            X