Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Compute moving average for years when time variable is in days

    Dear all,

    I have a seemingly easy question that is giving me some trouble. The following table is a piece of my data set, which is on firm acquisitions. Value transaction is the price paid to acquire a certain firm, acquiror id is a unique id for an acquiror, and date is the date of an acquisition. I would like to calculate the moving average of the transaction value of acquisitions a firm made in the previous 4 years. There seem to be two problems here. The first is that date is recorded in days, and I would like to have the moving average in terms of years. Second, as you can see, not every company made multiple deals. For instance, acquiror_id=919 didn't make any acquisition in the past. For those cases, I would like the moving average to report a value of 0.

    Following the time series manual and other threads, I did the following

    Code:
    tsset acquiror_id date
    tssmooth ma avgvalue4yr=value_transaction, window(1460)
    Note that 1460 corresponds to the number of days of four years 365*4. This code however generated an error which said that 1460 is out of the allowed range. Do you know by any chance how to solve this?

    value_transaction acquiror_id date
    61.1 918 01/01/1994
    1088.367 918 24/06/1994
    1568.947 918 07/03/1996
    1660.504 918 14/06/1997
    86.32 919 18/05/1994
    730 920 26/11/1986
    123.037 920 31/01/1987
    1493.323 920 30/04/1993
    85.9 921 02/01/2001
    109.461 921 03/01/2001
    120.71 921 01/06/2001
    83 921 27/10/2001
    104.154 921 10/05/2003
    1011.336 921 06/12/2005
    72.266 921 02/06/2007
    80.809 922 26/01/2001



    Many thanks!
    Riccardo

  • #2
    It appears that tssmooth ma imposes a limit of just under 1000 for its window() option. This is just a rolling average but it's pretty taxing over such a long window. I was about to announce that tsegen (from SSC) has just been updated to overcome Stata's limit of 100 unique time-series operators in a command. Here's an example how to use it to generate your rolling average

    Code:
    * ------------- data setup -------------
    set seed 1234
    clear
    
    set obs 2
    gen firmid = _n
    
    * for each firm, create 10 years of daily data
    expand 3650
    bysort firmid: gen days = mdy(1,1,2000) + _n
    format %td days
    gen double value = runiform() * days
    
    * not every day is a buying day
    drop if runiform() < .5
    
    * Declare data to be panel data
    tsset firmid days
    
    tempfile main
    save "`main'"
    
    * ------------- end of data setup -------------
    
    * show that -tsegen- replicates -tssmooth-
    tssmooth ma avgvalue0 = value, window(10)
    tsegen avgvalue1 = rowmean(L(1/10).value)
    assert avgvalue1 == avgvalue0
    
    * rolling daily average over one year
    use "`main'", clear
    tsegen double avgvalue1y = rowmean(L(1/365).value)
    tsegen double avgvalue2y = rowmean(L(366/730).value)
    tsegen double avgvalue3y = rowmean(L(731/1095).value)
    tsegen double avgvalue4y = rowmean(L(1096/1460).value)
    
    * over the 4 years
    egen double avgvalue4yr = rowmean(avgvalue1y avgvalue2y avgvalue3y avgvalue4y)
    
    * rolling daily average over 4 years
    tsegen double avgval4years = rowmean(L(1/1460).value)
    I don't know what you mean by wanting the moving average in terms of years. Perhaps its enough to simply collapse the data per year

    Code:
    gen year = year(days)
    collapse (mean) value, by(firmid year)
    tsset firmid year
    tsegen double avgval4years = rowmean(L(1/4).value)

    Comment


    • #3
      Many thanks Robert! That was a great example. The command:

      Code:
      tsegen double avgval4years=rowmean(L(1/1460).value)
      really did what I was expecting. I didn't know tsegen existed. From now on, I will take it into close consideration every time I work with time series data!

      The last chunk of code you suggested didn't precisely apply to my case because the average I was looking for was not based on the calendar year but rather on the 365 preceding a certain date.

      Thank you again!
      Last edited by Riccardo Valboni; 30 May 2015, 05:53.

      Comment


      • #4
        Hi Robert, I suddenly realized that there is a problem when I try to calculate the rolling average for 5 years. Stata gives an error which says that there are too many elements in the numlist. But I read help limits and it says that the maximum number of elements in a numlist should be 2500. While for 5 years they are 1825. Do you think I made something wrong, or might that be a bug?

        Comment


        • #5
          Well you've stumbled on an other obscur limitation associated with numlists. A fix is on its way to Kit Baum. In the mean time, you can circle around the limitation using

          Code:
          tsegen double avgval4years = rowmean(L(1/1600).value L(1601/1825).value)


          Comment


          • #6
            Genius solution! It worked perfectly.

            Comment


            • #7
              A new version of tsegen that fixes the problem mentioned in #4 is now available from SSC. To update, type in Stata's command window:

              Code:
              adoupdate tsegen, update
              To install tsegen from scratch, type

              Code:
              ssc install tsegen

              Comment


              • #8
                Robert, thanks to you and Nick for creating and maintaining -tsegen-. It is a great solution to so many of the problems that are brought to this forum. My only worry is that it's so useful, if it catches on, I fear it will put this forum out of business!

                Comment


                • #9
                  Thanks for the kind words, gives me warm fuzzies as my daughter would say.

                  Comment

                  Working...
                  X