Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cumulative rolling sum over fixed time period in unbalanced panel

    I have an unbalanced panel dataset, potentially with gaps. Time is coded as daily, but is really monthly (there is always just the first day of each month).

    Here is an MWE with two units (id) over time periods (t) and the variable of interest (x)

    Code:
    input id str16 t x
    1    "01jan2002" 1
    1   "01feb2002" 1
    1   "01mar2002" 0
    1   "01aug2002" 1
    2    "01jan2002" 1
    2   "01feb2002" 1
    2   "01may2002" 0
    2   "01jun2002" 1
    end
    I want to create the cumulative rolling window sum of x starting at period t and then moving a fixed number of months forward. Let's say 3 months here. The issue is that I may have gaps and I think it would be better to identify the end based on the date variable rather than using the _n+3 observation.

    Somebody has an idea how to tackle this?


  • #2
    I gather that what you want is just a rolling sum with 3 year windows. Whatever you do, you need a numeric date variable. Here is an approach based on rangestat from SSC.

    Code:
    clear
    input id str16 t x
    1    "01jan2002" 1
    1   "01feb2002" 1
    1   "01mar2002" 0
    1   "01aug2002" 1
    2    "01jan2002" 1
    2   "01feb2002" 1
    2   "01may2002" 0
    2   "01jun2002" 1
    end
    
    gen mdate = monthly(substr(t, 3, .), "MY")
    format mdate %tm
    
    rangestat (count) count=x (sum) sum=x, int(mdate -2 0) by(id)
    
    list, sepby(id)
    
         +-------------------------------------------+
         | id           t   x    mdate   count   sum |
         |-------------------------------------------|
      1. |  1   01jan2002   1   2002m1       1     1 |
      2. |  1   01feb2002   1   2002m2       2     2 |
      3. |  1   01mar2002   0   2002m3       3     2 |
      4. |  1   01aug2002   1   2002m8       1     1 |
         |-------------------------------------------|
      5. |  2   01jan2002   1   2002m1       1     1 |
      6. |  2   01feb2002   1   2002m2       2     2 |
      7. |  2   01may2002   0   2002m5       1     0 |
      8. |  2   01jun2002   1   2002m6       2     1 |
         +-------------------------------------------+
    Last edited by Nick Cox; 20 Dec 2021, 03:33.

    Comment


    • #3
      Ah it does it based on dates? I thought it was based on _n. But here 2002m5 has correctly 0 because of the gap.

      Comment


      • #4
        This was a mistake... edited.

        Comment


        • #5
          rangestat takes the interval() arguments literally. That's the point. It doesn't use previous (or present, or following) observations unless they fall within the specified window.

          The help is quite detailed and there are now many posts here giving further examples.

          Comment

          Working...
          X