Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating an Average of the Recent Observations

    Hello all,

    I want to create a variable that will take the average of the 3 most recent observations of dayslate per id. Net_due_date should be used to determine which is the most recent observation. I do not need this to be a rolling average.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(id Net_due_date dayslate)
      2 22309   21
      6 20095    9
     37 20368   68
     73 17480  -10
     85 21253  -15
     85 21839    4
     89 21376   15
    117 21852  -12
    122 18005   13
    139 18747  -22
    197 22521    3
    202 17985  -17
    202 18165  -14
    202 20498  -20
    202 20895  -19
    202 21075  -28
    202 21426  -30
    202 22264  -26
    203 21001   -3
    209 22028   12
    215 17784   27
    230 20875   -4
    230 21334  -30
    230 22614  -27
    233 22195   19
    239 21653   10
    242 18090    9
    243 21014   14
    251 18076   -2
    256 22297   42
    256 21422  -27
    256 17962  -30
    256 20577  -21
    256 18838   -1
    256 21122   10
    256 20930 -365
    256 20849   -1
    256 21050   16
    293 17985  124
    293 19021   -9
    end
    format %td Net_due_date
    Has anyone got any ideas how to go about this?

    Thanks in advance!
    Kayleigh

  • #2
    I don't know what to make of it when you say "I don't need this to be a rolling average" since what you have described is the very definition of a rolling average. So, with that in mind, I may be completely misunderstanding what you want. But ignoring the remark, here is how you would do the rolling average:

    Code:
    by id (Net_due_date), sort: gen seq = _n
    rangestat (mean) dayslate, interval(seq -2 0) by(id)
    Note: I assume that by "three most recent" you mean the current observation and the two preceding ones. If you meant to exclude the current observation and look at the three preceding, change - 2 0 to -3 -1 in the -interval()- option.

    Added: -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    Comment


    • #3
      Code:
      bysort id (Net_due_date) : gen wanted = (dayslate + dayslate[_n-1] + dayslate[_n-2]) / 3
      averages the current value and two previous. There could be extra code for what happens if 1 or 2 of those values are missing. Another complication not considered is what happens if there are two or more observations with the same identifier and due date.

      I call this a rolling average, just in terms of when values are observed.

      EDIT:

      Clyde Schechter used rangestat to do it, which is fine by me. A small difference is that rangestat will not fall over and return missing if one or two values out of three are missing. There is still some arbitrariness possible if dates are tied for the same id.
      Last edited by Nick Cox; 20 Apr 2022, 13:39.

      Comment

      Working...
      X