Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate the summation and average of previous values by group

    Hi everyone,

    I have a data looks as below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte consumer str9 date int purchaseamount byte(past_purchase_avg past_purchcase_agg)
    1 "2020/8/24"  300 . .
    1 "2020/8/25"  400 . .
    1 "2020/8/26"  600 . .
    1 "2020/8/27"  700 . .
    1 "2020/8/28"  900 . .
    1 "2020/8/29" 1000 . .
    1 "2020/8/30"  200 . .
    2 "2020/8/28" 2000 . .
    2 "2020/8/29"   21 . .
    2 "2020/8/30"  300 . .
    end
    I need to calculate two variables based on the data that I have: past_purchase_avg is the average purchase amount for each consumer in all the previous periods and the aggregate purchase amount is the summation of all the previous purchases amounts for each consumer. I got stuck here and I wonder if anyone knows how to realize this in stata.

    Thanks a lot!

  • #2
    Created two new variables:

    Code:
    byso consumer: egen meanpurchase=mean(purchaseamount)
    
    byso consumer: egen totpurchase=total(purchaseamount)
    Roman

    Comment


    • #3
      Meng:
      as an aside to Roman's excellent advice, as far as your second question is concerned, if you are interested in the running sum that ends with the total purchases, you may want to try:
      Code:
      . bysort consumer: gen total_wanted=sum( purchaseamount)
      
      . list
      
           +------------------------------------------------------------------+
           | consumer        date   purcha~t   past_~vg   past_~gg   total_~d |
           |------------------------------------------------------------------|
        1. |        1   2020/8/24        300          .          .        300 |
        2. |        1   2020/8/25        400          .          .        700 |
        3. |        1   2020/8/26        600          .          .       1300 |
        4. |        1   2020/8/27        700          .          .       2000 |
        5. |        1   2020/8/28        900          .          .       2900 |
           |------------------------------------------------------------------|
        6. |        1   2020/8/29       1000          .          .       3900 |
        7. |        1   2020/8/30        200          .          .       4100 |
        8. |        2   2020/8/28       2000          .          .       2000 |
        9. |        2   2020/8/29         21          .          .       2021 |
       10. |        2   2020/8/30        300          .          .       2321 |
           +------------------------------------------------------------------+
      
      .
      Kind regards,
      Carlo
      (StataNow 18.5)

      Comment


      • #4
        My reading of the problem is closer to that of Carlo Lazzaro . The code in #2 from Roman Mostazir gives each consumer mean over all times and does not address the need to focus on previous purchases.

        As Carlo signalled, the Stata function sum() gives running or cumulative sums and can be used here. The running mean is just the sum divided by the number of non-missing values previously. If there are no missing values that reduces to _n - 1 in the context below.

        Alternatively, rangestat from SSC gets you there directly.

        You are going to need a proper numeric Stata daily date variable sooner or later.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte consumer str9 date int purchaseamount 
        1 "2020/8/24"  300 
        1 "2020/8/25"  400 
        1 "2020/8/26"  600 
        1 "2020/8/27"  700 
        1 "2020/8/28"  900 
        1 "2020/8/29" 1000 
        1 "2020/8/30"  200 
        2 "2020/8/28" 2000 
        2 "2020/8/29"   21 
        2 "2020/8/30"  300 
        end
        
        gen ddate = daily(date, "YMD")
        format ddate %td 
        
        bysort consumer (ddate) : gen sum = sum(purchase[_n-1])
        bysort consumer (ddate) : gen mean = sum(purchase[_n-1])/sum(purchase[_n-1] < .)
        
        rangestat (sum) SUM=purchase (mean) MEAN=purchase, int(ddate . -1) by(consumer)
        
        list, sepby(consumer)
        
             +----------------------------------------------------------------------------------+
             | consumer        date   purcha~t       ddate    sum       mean    SUM        MEAN |
             |----------------------------------------------------------------------------------|
          1. |        1   2020/8/24        300   24aug2020      0          .      .           . |
          2. |        1   2020/8/25        400   25aug2020    300        300    300         300 |
          3. |        1   2020/8/26        600   26aug2020    700        350    700         350 |
          4. |        1   2020/8/27        700   27aug2020   1300   433.3333   1300   433.33333 |
          5. |        1   2020/8/28        900   28aug2020   2000        500   2000         500 |
          6. |        1   2020/8/29       1000   29aug2020   2900        580   2900         580 |
          7. |        1   2020/8/30        200   30aug2020   3900        650   3900         650 |
             |----------------------------------------------------------------------------------|
          8. |        2   2020/8/28       2000   28aug2020      0          .      .           . |
          9. |        2   2020/8/29         21   29aug2020   2000       2000   2000        2000 |
         10. |        2   2020/8/30        300   30aug2020   2021     1010.5   2021      1010.5 |
             +----------------------------------------------------------------------------------+

        Comment

        Working...
        X