Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Running cumulative sum

    Dear Statalist users,

    I am using panel data, have used txset and the date is in %td format.

    I have several IDs and for each ID 365 days of data. I have a variable "T74" which contains only zero's or ones for each ID for each day.
    I want to make a running sum of this variable "T74" over t until t+6 for each ID on each date.

    So for the 7 days running sum I have tried the following:
    Code:
    gen Sum_T74 = sum(T74) + sum(f.T74) + sum(f2.T74) + sum(f3.T74) + sum(f4.T74) + sum(f5.T74) + sum(f6.T74)
    gen Run_SumT74 = s6.Sum_T74
    However, it gives me the error:
    too many sum() functions
    , as I also need to make a running sum of 20 days I need a more clever way to code my problem.

    Is there a nicer way to code my problem?

    Thanks in advance,

    Emilio





  • #2
    Emilio Emilio: My guess is that Emilio is your given name. Please note our longstanding request to use full real names, so typically a given name and family name. If Emilio is your family name, then clearly you're in order. See https://www.statalist.org/forums/help#realnames

    One way to do this is with rangestat (SSC), which you must install before you can use it. One of the small virtues of rangestat is that it is easy to keep track of how many values are included in the sum. You can search the forum for other mentions of this community-contributed command.

    Here, as I understand it, you want first the sum of the current value and the next 6 values.

    You don't give a data example (contrary to FAQ Advice #12: see https://www.statalist.org/forums/help#stata), so here is a silly one as sandbox.

    Code:
    clear
    set obs 20
    gen id = ceil(_n/10)
    gen date = cond(id == 1, _n, _n - 10)
    tsset id date
    gen y = _n
    list, sepby(date)
    
    rangestat (sum) sum=y (count) count=y, by(id) int(date 0 6)
    
    list, sepby(id)
    
         +------------------------------+
         | id   date    y   sum   count |
         |------------------------------|
      1. |  1      1    1    28       7 |
      2. |  1      2    2    35       7 |
      3. |  1      3    3    42       7 |
      4. |  1      4    4    49       7 |
      5. |  1      5    5    45       6 |
      6. |  1      6    6    40       5 |
      7. |  1      7    7    34       4 |
      8. |  1      8    8    27       3 |
      9. |  1      9    9    19       2 |
     10. |  1     10   10    10       1 |
         |------------------------------|
     11. |  2      1   11    98       7 |
     12. |  2      2   12   105       7 |
     13. |  2      3   13   112       7 |
     14. |  2      4   14   119       7 |
     15. |  2      5   15   105       6 |
     16. |  2      6   16    90       5 |
     17. |  2      7   17    74       4 |
     18. |  2      8   18    57       3 |
     19. |  2      9   19    39       2 |
     20. |  2     10   20    20       1 |
         +------------------------------+
    One check is for the first 7 observations: the values are 1 2 3 4 5 6 7 and their mean is clearly 4; hence the total is 28.

    The case of 20 is evidently just the same idea.

    You can do this with time series operators and sum(), but what you want is the difference between two cumulative sums, not the sum of 7 cumulative sums (or 20, as the case may be).
    Last edited by Nick Cox; 23 Jan 2018, 05:09.

    Comment


    • #3
      I'd flag also that a moving sum is just a moving average multiplied by the number of elements used, so moving average commands are another possibility. You just need to watch out at one or both ends of the panels when you don't have a complete set to play with.

      Comment


      • #4
        Thanks a lot!

        Rangestat really helped me,
        Code:
        rangestat (sum) sum=T74, by(ID) int(date 0 6)
        worked like a charm.

        On the Emilio Emilio part, it is true that my surname is not Emilio. I was not aware that it is common to use your full name on this forum and as I am very conscious where on the internet my name appears, I almost never use my full name. Sorry for that.

        Your help is greatly appreciated!


        Thanks,

        Emilio Emilio

        Comment


        • #5
          Emilio Emilio : to be aware of this topic ("Why are real names preferred"?) and to get acquainted with the recommendations overall, please take a look at the FAQ, here.
          Best regards,

          Marcos

          Comment


          • #6
            Noted, but the home page and every prompt when you post something ask you to read the FAQ Advice, so please do so before your next question.

            Comment


            • #7
              Emilio:
              about the "full real names" issue see also https://www.statalist.org/forums/for...for-real-names
              Kind regards,
              Carlo
              (Stata 19.0)

              Comment


              • #8
                The thread cited by Carlo is an exchange of personal views.

                https://www.statalist.org/forums/help#adviceextras #3 in slight contrast can be regarded as standing advice.

                Preference for full real names remains a request, and not a rule. But those new to the forum should want to know that many active people feel quite strongly about it.

                Nothing about a fake identifier makes someone behave badly, but across several forums which I post in (or watch) I've seen more bad behaviour whenever anonymous or cryptic identifiers are allowed.

                This is a statistical list and people should be capable of thinking in probability terms. If you make it obvious that your identifier is not a real name, detailed help is less probable. No more, no less.

                The biggest argument for anonymity is that it serves no individual well if they feel humiliated by asking a silly question in public. Equally the forum as a whole is better served by some incentives to ask good questions.
                Last edited by Nick Cox; 23 Jan 2018, 07:29.

                Comment

                Working...
                X