Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset

    Dear list members:

    I would like to learn how to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset. In the following example, I would like to create "cum average" of the fourth column, which is the cumulative average of the 3rd-column variable "amount." There was one thread about a cumulative average, but it didn't look like it is what I'm looking for.
    country id year amount cum average
    1 1994 1 1
    1 1995 3 2
    1 1996 3 2.3333
    1 1997 4 2.75
    1 1998 5 3.2
    2 1994 2 2
    2 1995 4 3
    2 1996 2 2.6666
    2 1997 4 3
    2 1998 2 2.8
    3 1994 2 2
    3 1995 1 1.5
    3 1996 3 2
    3 1997 4 2.5
    3 1998 2 2.4

    I thank you in advance for your kind assistance.

    Best wishes,

    Taka

  • #2
    The cumulative average is just the cumulative sum divided by the number of values so far.

    Code:
    bysort id (year) : gen cummean = sum(amount)/ _n
    The cumulative sum function sum() ignores missings, or equivalently regards them as adding 0 to the sum. If you have missings, you need to adjust the count. This version of the above is more general:

    Code:
    bysort id (year) : gen cummean = sum(amount)/sum(amount < .)
    I can't comment on a thread you don't reference.

    Comment


    • #3
      Thank you, Nick (if I may) for showing me how. I didn't imagine it'd be so simple.

      So in your second line command, the denominator "sum(amount<.)" means "_n minus the number of observations with missing values"? You do not have to have _n in the denominator?

      I apologize in advance if I'm slow.

      Thank you very much for your kind help.

      Best wishes,

      Taka

      Comment


      • #4
        amount < .

        is a true or false expression evaluated as 0 or 1. You can see this (e.g.) by

        Code:
        gen OK = amount < .
        list OK
        The idea is just how do we count? As people we have learned to count children, copies of Stata Press books we own, number of Games of Thrones series watched, by 1, 2, 3, and so forth. To get a program to do that, we have to instruct it to go 1 + 1 + 1 and so forth. If there are irrelevant cases not to be included we have to instruct it to go 1 + 1 + 1 + 0 + 1 + 1 + 1 and so forth, with the rule "irrelevant cases must be marked 0".

        See also http://www.stata.com/support/faqs/da...lse/index.html on true and false in Stata.

        P.S. Nick is fine by me. It's my name. Well, my nickname, but that's OK.

        Comment


        • #5
          Thank you again, Nick. I've looked at the page you wrote at the link, and it all makes sense now. May I ask one last question? The command you wrote,
          bysort id (year) : gen cummean = sum(amount)/sum(amount < .) adjusts for the number of observations with missing values and calculates the cumulative average correctly. But it creates "cummean" values for observations with missing "amount" values. Is there a command you can add to the line to not create values for those, or you just deal with them in another line with commands like drop? I'm not sure if I need this, but thought I should learn.

          Thank you very much again for your kind help.

          Best wishes,

          Taka

          Comment


          • #6
            You could always just fix missings in that way, e.g. by

            Code:
            replace cummean = . if missing(amount)
            A way to get everything in one

            Code:
            bysort country (year) : gen cummean = cond(missing(amount), ., sum(amount)/sum(amount <  .))
            search cond to get information about that function.

            Comment


            • #7
              Thank you for all your kind help, Nick. I've learned a lot.

              Thank you for making time for this stranger.

              Best wishes,

              Taka

              Comment


              • #8
                Thanks. As your name is evidently Taka, please consider changing your registration to reflect your full real name.

                Comment


                • #9
                  I will!

                  Taka Sakamoto

                  Comment

                  Working...
                  X