How to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset

halosakamoto

Join Date: Apr 2014
Posts: 10

How to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset

12 Apr 2014, 03:07

Dear list members:

I would like to learn how to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset. In the following example, I would like to create "cum average" of the fourth column, which is the cumulative average of the 3rd-column variable "amount." There was one thread about a cumulative average, but it didn't look like it is what I'm looking for.

country id	year	amount	cum average
1	1994	1	1
1	1995	3	2
1	1996	3	2.3333
1	1997	4	2.75
1	1998	5	3.2
2	1994	2	2
2	1995	4	3
2	1996	2	2.6666
2	1997	4	3
2	1998	2	2.8
3	1994	2	2
3	1995	1	1.5
3	1996	3	2
3	1997	4	2.5
3	1998	2	2.4

I thank you in advance for your kind assistance.

Best wishes,

Taka

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

12 Apr 2014, 03:23

The cumulative average is just the cumulative sum divided by the number of values so far.

Code:

bysort id (year) : gen cummean = sum(amount)/ _n

The cumulative sum function sum() ignores missings, or equivalently regards them as adding 0 to the sum. If you have missings, you need to adjust the count. This version of the above is more general:

Code:

bysort id (year) : gen cummean = sum(amount)/sum(amount < .)

I can't comment on a thread you don't reference.
1 like
Comment
halosakamoto

Join Date: Apr 2014

Posts: 10
#3

12 Apr 2014, 04:30

Thank you, Nick (if I may) for showing me how. I didn't imagine it'd be so simple.

So in your second line command, the denominator "sum(amount<.)" means "_n minus the number of observations with missing values"? You do not have to have _n in the denominator?

I apologize in advance if I'm slow.

Thank you very much for your kind help.

Best wishes,

Taka
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

12 Apr 2014, 04:41

amount < .

is a true or false expression evaluated as 0 or 1. You can see this (e.g.) by

Code:

gen OK = amount < . list OK

The idea is just how do we count? As people we have learned to count children, copies of Stata Press books we own, number of Games of Thrones series watched, by 1, 2, 3, and so forth. To get a program to do that, we have to instruct it to go 1 + 1 + 1 and so forth. If there are irrelevant cases not to be included we have to instruct it to go 1 + 1 + 1 + 0 + 1 + 1 + 1 and so forth, with the rule "irrelevant cases must be marked 0".

See also http://www.stata.com/support/faqs/da...lse/index.html on true and false in Stata.

P.S. Nick is fine by me. It's my name. Well, my nickname, but that's OK.
Comment
halosakamoto

Join Date: Apr 2014

Posts: 10
#5

12 Apr 2014, 05:39

Thank you again, Nick. I've looked at the page you wrote at the link, and it all makes sense now. May I ask one last question? The command you wrote,
bysort id (year) : gen cummean = sum(amount)/sum(amount < .) adjusts for the number of observations with missing values and calculates the cumulative average correctly. But it creates "cummean" values for observations with missing "amount" values. Is there a command you can add to the line to not create values for those, or you just deal with them in another line with commands like drop? I'm not sure if I need this, but thought I should learn.

Thank you very much again for your kind help.

Best wishes,

Taka
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

12 Apr 2014, 05:49

You could always just fix missings in that way, e.g. by

Code:

replace cummean = . if missing(amount)

A way to get everything in one

Code:

bysort country (year) : gen cummean = cond(missing(amount), ., sum(amount)/sum(amount < .))

search cond to get information about that function.
Comment
halosakamoto

Join Date: Apr 2014

Posts: 10
#7

12 Apr 2014, 08:46

Thank you for all your kind help, Nick. I've learned a lot.

Thank you for making time for this stranger.

Best wishes,

Taka
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

12 Apr 2014, 10:58

Thanks. As your name is evidently Taka, please consider changing your registration to reflect your full real name.
Comment
halosakamoto

Join Date: Apr 2014

Posts: 10
#9

12 Apr 2014, 18:29

I will!

Taka Sakamoto
Comment

Announcement

How to generate a new variable that is the cumulative average of an existing variable in a time-series cross-section dataset

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment