Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a variable that averages across the last 30 days

    Hi -

    I'm interested in looking at an average quantity (of AvgOfWeight) over the past month (Date is the time variable) -- not by month. This value would thus address the fact that the most recent 30 days on Apr 3 is a more relevant 'month' than the average across April.

    To do this, I want to average over the most recent 30 days. The following code works to generate the solution, but I've seen previous recommendations that using a loop over observations is not a great practice. Could anyone suggest another, and ideally more efficient, way to generate the same result?

    Many thanks for your help!
    Katie


    gen lastMonth = .

    local N = _N
    forvalues i = 1/`N' {
    quietly summarize AvgOfWeight if (Date <= Date[`i'] & Date >= Date[`i']-30)
    replace lastMonth = r(mean) in `i'
    }


  • #2
    It is indeed rarely the case that you need to loop over observations. This is still a bit tricky to do because you want a rolling window. Here's a bunch of ways to do this if you have daily data (I'm showing a 7 day window):

    Code:
    set seed 1234
    clear
    
    * create 5 patients at an initial weight
    set obs 5
    gen id = _n
    gen w = 100 + 100 * runiform() 
    
    * for each patient, create 2 years of daily observations
    expand 730
    bysort id: gen mydate = mdy(1,1,2000) + _n
    format %td mydate
    
    * Declare data to be panel data
    xtset id mydate
    
    * weight varies somewhat at each time measured
    bysort id mydate: replace w = w + runiform() 
    
    * use time-series varlist to create an expression of variables to sum
    tsunab vsum : L(0/6).w
    local vsum : subinstr local vsum " " "+", all
    dis "`vsum'"
    gen avgw7 = (`vsum') / 7
    
    * redo using a loop over lagged variables
    gen double xsum = w
    forvalues i = 1/6 {
        replace xsum = xsum + L`i'.w
    }
    gen avgw7b = xsum / 7
    
    * redo using a loop over lagged variables and control for missing values
    gen double xsum2 = w
    gen n = 1
    forvalues i = 1/6 {
        replace n = n + 1 if !mi(L`i'.w)
        replace xsum2 = xsum2 + L`i'.w if !mi(L`i'.w)
    }
    gen avgw7c = xsum2 / n
    
    * redo using a running sum
    by id: gen double wsum = sum(w)
    gen avgw7d = (wsum - L6.wsum + L6.w) / 7
    
    assert avgw7 == avgw7b
    assert avgw7 == avgw7c if !mi(avgw7)
    assert avgw7 == avgw7d

    Comment


    • #3
      See this post


      Perfect timing. Today I completed writing a program asrol for calculating statistics over a rolling window . It can be installed from Code:

      net from "https://sites.google.com/site/imspeshawar"
      OR you may directly download it from this post and copy it to C:/ado/personal
      It similar to rolling and mvsumm, however, mvsumm does not work on unbalanced panel and rolling changes structure of the data in memory ( I assume), whereas asrol does not suffer from any of these issue. The help files are yet to be written, here is the basic structure with example.

      1. For Rolling Mean

      Code:

      webuse grunfeld asrol invest, stata(mean) win(4) gen(mean_4)



      This command calculates mean for the variable invest using a four years rolling window and stores the results in a new variable called mean_4


      2. For Rolling standard deviation

      Code:

      asrol invest, stat(sd) win(5) gen(sd_5)


      This command calculates standard deviation for the variable invest using a five years rolling window and stores the results in a new variable called sd_5

      3. What Else in a rolling window



      asrol can find any statistics in a rolling window that can be calculated with egen function. Specifically, the following statistics can be calculated with asrol:
      • sd for standard deviation
      • mean for mean
      • skew for skewness
      • total for sum or total
      • median
      • mdev for mean absolute deviation
      • kurt for kurtosis
      Acknowledgement:
      asrol program is motivated by the mvsumm program written by Nick Cox and Kit Baum. mvsumm is more powerful than asrol, with only limitation that mvsumm does not work on data with gaps i.e. unbalanced data
      Regards
      --------------------------------------------------
      Attaullah Shah, PhD.
      Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
      FinTechProfessor.com
      https://asdocx.com
      Check out my asdoc program, which sends outputs to MS Word.
      For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

      Comment


      • #4
        Attaullah,

        asrol ignores missing values and missing observations but yet generates missing values at the beginning of each panel. Also, it's pretty inefficient compared to the techniques I suggested. Here's a reworked example to illustrate this

        Code:
        * ------------- data setup -------------
        set seed 1234
        clear
        
        * create patients at an initial weight
        set obs 100
        gen id = _n
        gen w = 100 + 100 * runiform() 
        
        * for each patient, create 2 years of daily observations
        expand 730
        bysort id: gen mydate = mdy(1,1,2000) + _n
        format %td mydate
        
        * weight varies somewhat at each time measured
        bysort id mydate: replace w = w + runiform() 
        
        * create some missing  values
        replace w = . if runiform() < .01
        
        * can also have some missing obs
        drop if runiform() < .01
        
        * Declare data to be panel data
        xtset id mydate
        
        * ------------- end of data setup -------------
        
        local window 30
        
        timer clear
        
        * loop over lagged variables and control for missing values
        timer on 1
        gen double wsum = w
        gen n = 1
        forvalues i = 1/`=`window'-1' {
            qui replace n = n + 1 if !mi(L`i'.w)
            qui replace wsum = wsum + L`i'.w if !mi(L`i'.w)
        }
        gen wmean = wsum / n
        timer off 1
        
        timer on 2
        asrol w, stat(mean) win(`window') gen(mean_`window') 
        timer off 2
        
        timer list
        assert mean_`window'== wmean if !mi(wmean,mean_`window')

        Comment


        • #5
          Turns out that there's a tsrevar command that converts a time-series varlist (see help tsvarlist) into a list of temporary variables. This makes it super simple to calculate statistics on a rolling window using the set of egen row functions (and it is super fast!) Here's a reworked example that shows how to calculate the mean and standard deviation over a window of 30 days.

          Code:
          * ------------- data setup -------------
          set seed 1234
          clear
          
          * create patients at an initial weight
          set obs 5
          gen id = _n
          gen w = 100 + 100 * runiform()
          
          * for each patient, create 2 years of daily observations
          expand 730
          bysort id: gen mydate = mdy(1,1,2000) + _n
          format %td mydate
          
          * weight varies somewhat at each time measured
          bysort id mydate: replace w = w + runiform()
          
          * create some missing  values
          replace w = . if runiform() < .01
          
          * can also have some missing obs
          drop if runiform() < .01
          
          * Declare data to be panel data
          xtset id mydate
          
          
          * ------------- end of data setup -------------
          
          * This is probably the simplest way to generate statistics over a rolling window
          
          local window 30
          local lastp = `window' - 1
          tsrevar L(1/`lastp').w
          local rollingvars `r(varlist)'
          
          egen rmean = rowmean(w `rollingvars')
          egen rsd = rowsd(w `rollingvars')
          Last edited by Robert Picard; 16 Apr 2015, 16:37.

          Comment


          • #6
            Robert Picard Thanks for the code. One query here, why your code generates statistics from the second observation whereas the required window is thirty days? don't you think this is against the spirit of the required window i.e. in your example 30 days?
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.
            Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
            FinTechProfessor.com
            https://asdocx.com
            Check out my asdoc program, which sends outputs to MS Word.
            For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

            Comment


            • #7
              On a similar topic started after this one, Robert Picard improved his code, packaging it as a program that allows the user to specify a minimum number of observations needed to compute the rolling statistics, addressing the issue Attaullah Shah correctly raises.

              Comment


              • #8
                Attullah, your asrol command does not account for missing observations. It will happily report a statistic based on a single observation. It's not logical to return missing values at the start of the window when you allow observations to be missing later on. Consider the following simpler example:

                Code:
                clear
                input year x
                1980 1
                1981 2
                1990 3
                2010 4
                2015 5
                end
                
                gen id = 1
                
                tsset id year
                
                tsrevar L(1/4).x
                local rollingvars x `r(varlist)'
                egen rmean = rowmean(`rollingvars')
                
                asrol x, stat(mean) win(5) gen(rmean2)
                which generates the following

                Code:
                       year   x   id   rmean   rmean2  
                  1.   1980   1    1       1        .  
                  2.   1981   2    1     1.5        .  
                  3.   1990   3    1       3        3  
                  4.   2010   4    1       4        4  
                  5.   2015   5    1       5        5

                Comment


                • #9
                  The problem is not with asrol, it is how Stata declares panel data. See, if we strictly want the rolling statistics in a chronological order of time, then both of the above answers are wrong. All the mean values should be missing as the time gaps suggests. But if the user still wants rolling statistics, then the workaround is

                  Code:
                  sort id year
                  bys id: gen t=_n
                  tsset id t
                  
                  asrol x, stat(mean) win(3) gen(rmean2)
                  Also, if a user wants rolling statistics in a window of 5, why would your codes results for the first five observations where the minimum length of window is less than 5.
                  Regards
                  --------------------------------------------------
                  Attaullah Shah, PhD.
                  Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                  FinTechProfessor.com
                  https://asdocx.com
                  Check out my asdoc program, which sends outputs to MS Word.
                  For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                  Comment


                  • #10
                    Attaullah:

                    I tried downloading asrol but found this problem:

                    Code:
                     
                    . net from "https://sites.google.com/site/imspeshawar"
                    file https://sites.google.com/site/imspeshawar/stata.toc not found
                    server says file temporarily redirected but did not give new location
                    https://sites.google.com/site/imspeshawar/ either
                      1)  is not a valid URL, or
                      2)  could not be contacted, or
                      3)  is not a Stata download site (has no stata.toc file).
                    Please advise.

                    Comment


                    • #11
                      I have checked , this problem arises when you have Stata below version 12. In Stata 13, this problem does not arise. Anyway, asrol is attached. The help file is given on http://www.opendoors.pk/STATA/standa...rolling-window
                      Attached Files
                      Regards
                      --------------------------------------------------
                      Attaullah Shah, PhD.
                      Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                      FinTechProfessor.com
                      https://asdocx.com
                      Check out my asdoc program, which sends outputs to MS Word.
                      For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                      Comment


                      • #12
                        Thanks, but note that I experienced the problem using Stata 14.

                        Comment


                        • #13
                          Summary

                          What follows is a variety of comments on asrol, Attaullah's program discussed earlier in this thread, and on comparisons with tsegen (SSC; Robert Picard and Nicholas J. Cox) and mvsumm (also SSC; Nicholas J. Cox and Christopher F. Baum).

                          This may be of interest to those interested in rolling calculations, whether for individual time series or for panel data.


                          Preamble

                          Thanks again to Attaullah for posting asrol (post #11). It's my understanding that the web page there cited is the documentation publicly available for asrol, i.e. there is no public Stata help file, strict sense. As it can get too complicated to keep looking at different places, I copy here below the important parts of the help (minus download instructions).

                          Rolling Window Standard Deviation and Mean in Stata

                          In panel data, we often need to find standard deviation or mean or any
                          other statistics in rolling window of say 4 or 5 years.

                          The program requires the data to be declared as panel data, which can be
                          done with

                          tsset id year

                          Where id is panel or firm identifier and year is time identifier. To use
                          asrol program, we shall use an example data of grunfeld , type:

                          webuse grunfeld

                          Following are example for different statistics calculations:

                          Example 1. For Rolling Mean

                          asrol invest, stat(mean) win(4) gen(mean_4)

                          This command calculates mean for the variable invest using a four years
                          rolling window and stores the results in a new variable called mean_4.

                          Example 2. For Rolling standard deviation

                          asrol invest, stat(sd) win(6) gen(sd_6)

                          This command calculates standard deviation for the variable invest using
                          a six years rolling window and stores the results in a new variable
                          called sd_6

                          Example 3. For Rolling Mean with no missing values

                          asrol invest, stat(mean) win(4) gen(sd_4) n

                          This command calculates mean for the variable invest using a four years
                          rolling window and stores the results in a new variable called mean_4.
                          The n option forces asrol to find mean with all available observation,
                          which results in no missing values at the start of the panel. Compare
                          results where n is not used in example 1 above. In the example 1, asrol
                          finds mean starting with the fourth observation of each panel, i.e. the
                          rolling window does not start working unless it reaches the required
                          level of 4 observations.

                          Example 4. What Else in a rolling window

                          asrol can find any statistics in a rolling window that can be calculated
                          with egen function. Specifically, the following statistics can be
                          calculated with asrol:

                          sd for standard deviation
                          mean for mean
                          skew for skewness
                          total for sum or total
                          median
                          mdev for mean absolute deviation
                          kurt for kurtosis

                          Acknowledgement:

                          asrol program is motivated by the mvsumm program written by Nick Cox and
                          Kit Baum. mvsumm is more powerful than asrol, with only limitation that
                          mvsumm does not work on data with gaps i.e. unbalanced data
                          Code

                          To have everything in one place here is the code as posted:

                          Code:
                          *! Attaullah Shah 2.0.0 June 2015
                          *Email: [email protected]
                          cap prog drop asrol
                          prog def asrol
                              
                              syntax varname(ts), Generate(str) Stat(str) [ Window(int 3) Nomiss]
                              
                              
                              qui tsset
                              local id `r(panelvar)'
                              local timevar `r(timevar)'
                              loc min `r(tmin)'
                              loc max `r(tmax)'
                              loc N `max'-`min'
                              loc nmiss : word count `nomiss'
                              loc cid  "`id'"
                              if "`cid'"==""{
                              loc idexist " "
                              }
                              if "`cid'"!=""{
                              loc idexist "bys `id': "
                              }
                              if `window' < 2 {
                                 di as err "Rolling window length must be at least 2"
                              exit 198
                              }
                              
                              if `window' >= `N' {
                                 di as err "Rolling window length should be less than total number of periods"
                              exit 198
                              }
                                  
                              local nstats : word count `stat'
                              if `nstats' > 1 {
                                  di as error "Only one statistic allowed"
                              exit 0
                              }    
                              local gen "`generate'"
                              confirm new variable `gen'
                              
                              
                          
                              local upper = `window' - 1
                              tsrevar L(1/`upper').`varlist'
                              local varlags `r(varlist)'
                          
                              egen `gen' = row`stat'(`varlist' `varlags')
                              
                              if `nmiss'==0{
                              qui `idexist' replace `gen'=. if _n<`window'
                              }
                              dis as txt "Desired statistics successfully generated"
                              
                          end
                          That's I think the raw material needed to follow these comments.

                          Comments

                          1. asrol in version 2 above uses an approach suggested independently in tsegen of inducing Stata to create temporary variables for lagged values and then firing up egen with some specified row function.

                          2. However, asrol's idea of a window of length k (say) is only that of the present value and the previous (k - 1) lagged values. tsegen allows users to specify any set of previous, present and future values desired.

                          3. Rewriting asrol in this way breaks its declared support for calculation of skewness, kurtosis and mean deviation as there are no official egen functions rowskew, rowkurt, rowmdev; for a call such as stat(skew) to work, a user would need to (write and) install an egen function rowskew(), and so on. More positively, rewriting gives scope for those statistics which are supported by functions whose names begin with row, as Attaullah could now document. See http://www.stata.com/help.cgi?egen

                          4. asrol does not support specification of if and/or in qualifiers, which is a possibility for the future.

                          5. asrol kindly acknowledges in its documentation inspiration from mvsumm. However, what is said about mvsumm is a little confused. Here is an extract from its help

                          Although mvsumm works with unbalanced panels (where the start and/or end points differ across units), mvsumm does not allow gaps within the observations of a time series; that is, the value of an observation for a given period may be missing, but the observation itself must be defined. Gaps in time series may be dealt with via the tsfill command.
                          That is, the advertised limitation of mvsumm is nothing to do with unbalanced panels, which it does support.

                          6. The specific question raised by Robert Picard on 18 April in #8 and answered by Attaullah in #9 on 26 June is now complicated by a change of behaviour in version 2 of asrol

                          This is what happens now with Robert's example:

                          Code:
                          . clear
                          
                          . input year x
                                    year          x
                            1. 1980 1
                            2. 1981 2
                            3. 1990 3
                            4. 2010 4
                            5. 2015 5
                            6. end
                          
                          . gen id = 1
                          
                          . tsset id year
                                 panel variable:  id (strongly balanced)
                                  time variable:  year, 1980 to 2015, but with gaps
                                          delta:  1 unit
                          
                          . tsrevar L(1/4).x
                          
                          . local rollingvars x `r(varlist)'
                          
                          . egen rmean = rowmean(`rollingvars')
                           
                          . asrol x, stat(mean) win(5) gen(rmean2)
                          Desired statistics successfully generated
                          
                          . l
                          
                               +--------------------------------+
                               | year   x   id   rmean   rmean2 |
                               |--------------------------------|
                            1. | 1980   1    1       1        . |
                            2. | 1981   2    1     1.5        . |
                            3. | 1990   3    1       3        . |
                            4. | 2010   4    1       4        . |
                            5. | 2015   5    1       5        5 |
                               +--------------------------------+
                          For the record, the approach used in tsegen is precisely that used to create rmean above. Equivalent syntax could be

                          Code:
                          . tsegen rmean_tsegen = rowmean(L(0/4).x)
                          As the code for asrol shows, its default is to set results to missing in the first (k - 1) observations (of the data or of each panel), regardless of any previous calculations.

                          Part of the point of Robert's example is that the last reported mean for 2015 is based on just one value, as there are no values in the data for 2011-2014, but it is reported any way. Hence if the intent is to suppress results for incomplete windows, it is not quite matched by the code.

                          I will leave that there, except for a more general comment that it's a good idea to document how many values each result is based on, and both asrol and tsegen can calculate the number of non-missing values in each window: in each case you would need to call the program again to get those numbers into a new variable.

                          Summary

                          asrol as posted by Attaullah breaks its own documentation in so far as calculation of skewness, kurtosis and mean deviation are no longer supported, but it also can do more than is advertised, which is good news.

                          Otherwise it now in essence offers a subset of tsegen's functionality, except for a different default as documented in point 6 above (and behaviour differing from its own version 1).

                          .






                          Last edited by Nick Cox; 29 Jun 2015, 05:54.

                          Comment


                          • #14
                            Someone just liked this, which is gratifying, but it should not be assumed without checking that all it says is still true nearly 5 years on. Namely, asrol has been modified several times since.

                            Comment

                            Working...
                            X