Calculating a mean of previous values within a time period

Pat Reilly

Join Date: Jul 2015

Posts: 3
#1


16 Jul 2015, 17:22

Hello, all!

I have an unbalanced panel dataset from Internet Movie Database. I am trying to compute the mean gross of a filmmaker's movies over the three years prior to the release of a particular film. Here is a sample of the dataset:

title talent fdate loggedadjgross
A Fine Mess (1986) Adams, Tony (I) 08aug1986 15.32454586029
That's Life! (1986) Adams, Tony (I) 10sep1986 14.93389987946
Blind Date (1987) Adams, Tony (I) 24mar1987 17.1725769043
Sunset (1988) Adams, Tony (I) 29apr1988 14.98368549347
Skin Deep (1989) Adams, Tony (I) 03mar1989 16.37933731079
Switch (1991) Adams, Tony (I) 10may1991 16.04848480225
Son of the Pink Panther (1993) Adams, Tony (I) 27aug1993 14.15256595612
Shrek (2001) Adamson, Andrew 22apr2001 18.6501750946
Shrek 2 (2004) Adamson, Andrew 15may2004 19.07325172424
The Chronicles of Narnia: The Lion the Witch and the Wardrobe (2005) Adamson, Andrew 07dec2005 18.62376976013
Shrek the Third (2007) Adamson, Andrew 06may2007 18.66974639893
Ballast (2008) Adamson, Andrew 19jan2008 10.32473182678
The Chronicles of Narnia: Prince Caspian (2008) Adamson, Andrew 09may2008 17.80108451843
Shrek Forever After (2010) Adamson, Andrew 21apr2010 18.32175827026
The Chronicles of Narnia: The Voyage of the Dawn Treader (2010) Adamson, Andrew 02dec2010 17.49599838257
Puss in Boots (2011) Adamson, Andrew 23oct2011 17.82677841187
Cirque du Soleil: Worlds Away (2012) Adamson, Andrew 20oct2012 15.31925678253

So, for example, for the observation for Adamson's Ballast, I want to calculate the mean of his returns for Shrek the Third and The Chronicles of Narnia: The Lion the Witch and the Wardrobe, but not the projects before those or after Ballast. I would like to define the three-year window based upon the actual three years prior to the release of the film (in the example, January 18, 2005 to January 18, 2008).

As a newcomer to Stata, I suspect the steps involved are quite elementary, so I apologize if this question is too basic. Thanks in advance!

Addendum: For people whose careers are shorter than three years, the resulting figure would cover all of their prior output. Three years is the maximum length of the window.

Last edited by Pat Reilly; 16 Jul 2015, 18:08.
Christos Makridis

Join Date: Nov 2014

Posts: 157
#2

16 Jul 2015, 18:14

Hey Pat

Sounds like you want to do something like:

bysort firm (movieid): gen previousyrs = (sales[_n-1] + sales[_n-2] + sales[_n-3])/3

That should do it, but check a couple of cases by hand to make sure.
Pat Reilly

Join Date: Jul 2015

Posts: 3
#3

16 Jul 2015, 18:44

Hey, Christos!

Thanks for your suggestion! That is the approach I took for a filmmaker's prior three films, but I have trouble doing something similar that covers all of their work over a prior time period. The main problem is the incredible variance in how prolific people are. Some people (particularly film producers) work on multiple films in a year and, at times, take a year off, while certain directors may do just one film within a three-year window. Also, the "windows" within an individual's group of films are frequently not discrete; they can overlap. What I would like is an average that weights equally all of the films a person did in the three years before a particular film.

Apologies if my previous post was a bit unclear.

Last edited by Pat Reilly; 16 Jul 2015, 18:48.
Noah Grand

Join Date: Jul 2015

Posts: 2
#4

16 Jul 2015, 19:56

Hi Pat & Christos,

Yeah, the key is to store the time period in a form where Stata can easily calculate what going back 3 years means inside a loop. Since the units here are actor + film, 3 years could contain no work for many actors and a bunch of roles for Samuel L. Jackson.

1) I'd start by converting fdate from dmy format to a single number, so Stata can calculate going 365*3 days back. I recommend creating a new variable, treating the first day of observations as 1 and counting from there. (I had to do something similar to create 15-minute time periods for months of data scraping.) Let's call it ndate.

2) sort talent ndate

3) gen filmdummy=1

This will eventually be used in the denominator to get an average per film.

4) create a forvalues loop. Ideally this will fill in every prior film by the talent within the right set of dates

gen priortotgross=0

forvalues l=1/8 {
    replace priortotgross=priortotgross+loggedadjgross[_n-`l'] if talent==talent[_n-`l'] & (ndate-365*3)<ndate[_n-`l']
}

If this loop works correctly, it will go back and look at the last 8 data points and add the gross together, if the talent is the same as the prior data point and the date is close enough. One potential problem is the (ndate-365*3) code. That's a guess. You may want to create a lagged variable ndate_l3=ndate-365*3 if that fails. (Create it as step 3b.)

A good way to test this code is to look for someone who has multiple films but more than a 3 year gap. If the code works, their priortotgross should be zero. If it's a number greater than 0, the ndate code failed. Also, everyone's first film should automatically be set to zero.

I assumed no one has more than 8 films in three years. If that is wrong, change the 8 in 1/8 to a more appropriate number. That's a lowercase l for the macro, as in lag.

5) Obviously, the first loop is getting total gross, not an average. So we need another loop to count films.

gen films_l3=0

forvalues l=1/8 {
    replace films_l3=films_l3+filmdummy[_n-`l'] if talent==talent[_n-`l'] & (ndate-365*3)<ndate[_n-`l']
}

6) Get an average

gen avegross_l3=priortotgross/films_l3
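
Steps 1 to 6 can be tried end to end on a handful of Adamson's films from the first post. This is only a sketch under a few assumptions: it assumes fdate is stored as a string such as "22apr2001" (if it is already a numeric daily date, the date() line is not needed), it shortens one title, and it uses Stata's own day count (days since 01jan1960) for ndate rather than counting from the first observation, which gives the same differences between dates.

```stata
clear
input str24 title str15 talent str9 fdate double loggedadjgross
"Shrek (2001)"           "Adamson, Andrew" "22apr2001" 18.650175
"Shrek 2 (2004)"         "Adamson, Andrew" "15may2004" 19.073252
"Narnia (2005)"          "Adamson, Andrew" "07dec2005" 18.62377
"Shrek the Third (2007)" "Adamson, Andrew" "06may2007" 18.669746
"Ballast (2008)"         "Adamson, Andrew" "19jan2008" 10.324732
end

* step 1: string date -> numeric daily date
gen ndate = date(fdate, "DMY")
format ndate %td

* step 2: order films chronologically within each person
sort talent ndate

* step 3: one per film, used for the denominator
gen filmdummy = 1

* steps 4 and 5: add up gross and count films over up to 8 prior rows,
* keeping only rows for the same person that fall inside the window
gen double priortotgross = 0
gen films_l3 = 0
forvalues l = 1/8 {
    replace priortotgross = priortotgross + loggedadjgross[_n-`l'] ///
        if talent == talent[_n-`l'] & (ndate - 365*3) < ndate[_n-`l']
    replace films_l3 = films_l3 + filmdummy[_n-`l'] ///
        if talent == talent[_n-`l'] & (ndate - 365*3) < ndate[_n-`l']
}

* step 6: mean over the prior films in the window
gen double avegross_l3 = priortotgross / films_l3

list title ndate films_l3 avegross_l3, noobs
```

For Ballast, only Shrek the Third and the 2005 Narnia film fall inside the window, so films_l3 is 2 and avegross_l3 is their average. A film with no prior work in the window ends up with 0/0, which Stata stores as missing. Note also that 365*3 days is one day short of three calendar years whenever the span contains a 29 February.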

Let me know if this works!

Best,
Noah
Noah Grand

Join Date: Jul 2015

Posts: 2
#5

16 Jul 2015, 19:57

Guh, the tabs in the "replace" lines didn't copy paste for clarity. Sigh.
Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#6

16 Jul 2015, 19:58

Use the "code" blocks; read the FAQ to find out what I mean and how to do it.
Pat Reilly

Join Date: Jul 2015

Posts: 3
#7

16 Jul 2015, 23:05

Thanks, Noah! It worked like a champ!
Max Kruitwagen

Join Date: Jul 2015

Posts: 12
#8

23 Jul 2015, 09:06

Hi Noah,

As I was reading your post today I found it really helpful, thanks a lot! I do, however, have a follow-up question. Besides the average, I want to know the median as well, and to be honest I have no idea how to do this. Could you or anyone else help me with this?

Best,
Max

Robert Picard

Join Date: Mar 2014
Posts: 1536

23 Jul 2015, 09:46

To generalize Noah's approach, you can loop until no more observations fall within the time period. You can also create variables to hold the lagged values which make it possible to use egen row functions to calculate summary statistics.

Code:

* please use -dataex- (from SSC) to generate data examples that can be 
* worked with
clear
input str68 title str15 talent double loggedadjgross float fdate
"A Fine Mess (1986)" "Adams, Tony (I)" 15.324546 9716 
"That's Life! (1986)" "Adams, Tony (I)" 14.9339 9749 
"Blind Date (1987)" "Adams, Tony (I)" 17.172577 9944 
"Sunset (1988)" "Adams, Tony (I)" 14.983685 10346 
"Skin Deep (1989)" "Adams, Tony (I)" 16.379337 10654 
"Switch (1991)" "Adams, Tony (I)" 16.048485 11452 
"Son of the Pink Panther (1993)" "Adams, Tony (I)" 14.152566 12292 
"Shrek (2001)" "Adamson, Andrew" 18.650175 15087 
"Shrek 2 (2004)" "Adamson, Andrew" 19.073252 16206 
"The Chronicles of Narnia: The Lion the Witch and the Wardrobe (2005)" "Adamson, Andrew" 18.62377 16777 
"Shrek the Third (2007)" "Adamson, Andrew" 18.669746 17292 
"Ballast (2008)" "Adamson, Andrew" 10.324732 17550 
"The Chronicles of Narnia: Prince Caspian (2008)" "Adamson, Andrew" 17.801085 17661
"Shrek Forever After (2010)" "Adamson, Andrew" 18.321758 18373 
"The Chronicles of Narnia: The Voyage of the Dawn Treader (2010)" "Adamson, Andrew" 17.495998 18598 
"Puss in Boots (2011)" "Adamson, Andrew" 17.826778 18923 
"Cirque du Soleil: Worlds Away (2012)" "Adamson, Andrew" 15.319257 19286 
end
format %td fdate

* keep generating lagged version until no new values are found
* within the desired time window
sort talent fdate loggedadjgross title
local more 1
local i 0
local vtype : type loggedadjgross
local window = 365 * 3
while `more' {
    by talent: gen `vtype' pv_`++i' = loggedadjgross[_n-`i'] ///
        if (fdate - fdate[_n-`i']) < `window'
    qui count if !mi(pv_`i')
    local more = r(N)
}

* use the egen row functions to generate statistics
egen nvals = rownonmiss(pv_*)
egen totgross = rowtotal(pv_*)
gen mgross1 = totgross / nvals
egen mgross2 = rowmean(pv_*)
egen medgross = rowmedian(pv_*)

list talent-fdate m*, sepby(talent) noobs
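
For readers who find the macro handling in the loop above hard to follow, here is a minimal sketch of its two less obvious pieces, run on made-up data. The variable x and the name lag_1 are hypothetical and not part of the thread's dataset. Writing ++i inside the macro quotes increments the local macro i and then substitutes the new value, so the plain reference to i later on the same command uses the same number. The "local vtype : type" line stores the storage type of a variable so that the new variables are created with the same precision.

```stata
clear
set obs 3
gen double x = _n * 10

* extended macro function: vtype now holds the storage type of x
local vtype : type x
display "`vtype'"

* i goes from 0 to 1 before substitution, so the next command is
* read by Stata as: gen double lag_1 = x[_n-1]
local i 0
gen `vtype' lag_`++i' = x[_n-`i']

list
```

In the first observation x[_n-1] refers to observation 0, which does not exist, so lag_1 is missing there. The while loop in the post relies on the same behaviour: once a lag reaches back past a person's first film, or outside the window, the new variable is entirely missing and the loop stops.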


Max Kruitwagen

Join Date: Jul 2015

Posts: 12
#10

26 Jul 2015, 03:34

Hi Robert,

Thanks a lot for your response, it works perfectly for the problem described by Pat. I understand most of what you're doing, but unfortunately my knowledge of Stata is not sufficient to understand all of the code you've written. And since my case is similar to Pat's but not exactly the same, I'm running into a small problem. What I want is a time window that runs from 3 years before the observation to 3 years after it. I've tried to add this after your date requirement:

Code:

| (DateAnn[_n+`i'] - DateAnn) < `window'

Normally I would use something like +f for a forward lag but this doesn't work and the above doesn't work either.

Further, I want the current observation to count as well. So instead of using only the observations prior to the release date, I want to include the current observation too. I've tried this (it doesn't work):

Code:

by tsic: gen `vtype' pv_`++i' = FSPUebitdam[_n-`i'] + FSPUebitdam///

Since I don't fully understand the steps you are doing I don't know how to solve this. I hope someone can help me.

Best,

Max

Edit: I'm thinking about adding another requirement. So the assets of the previous observations have to be within a range(window) of (- or +)1.5x the assets of the current observation. How can I add another requirement?
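
One way to sketch a two-sided window that also counts the current observation is shown here. It is not the lagged-variable approach from the earlier reply: it swaps in a plain loop over observations with summarize, which is slow on large datasets but easy to read. The variable names tsic, DateAnn and FSPUebitdam are taken from the snippets above, the data values are made up, and wmean and wmedian are hypothetical names.

```stata
* toy data: names mimic the post, values are invented for illustration
clear
input tsic float DateAnn double FSPUebitdam
1 16000 10
1 16500 20
1 17000 30
1 19000 40
2 16000 5
2 16100 7
end
format %td DateAnn

local window = 365 * 3
gen double wmean = .
gen double wmedian = .

* for each observation, summarize every observation in the same group
* whose date lies less than the window length before or after it
* (a distance of zero includes the observation itself)
forvalues i = 1/`=_N' {
    quietly summarize FSPUebitdam ///
        if tsic == tsic[`i'] & abs(DateAnn - DateAnn[`i']) < `window', detail
    quietly replace wmean = r(mean) in `i'
    quietly replace wmedian = r(p50) in `i'
}

list, sepby(tsic) noobs
```

A further requirement, such as a restriction on assets, could be appended to the if qualifier with another & condition, assuming the dataset contains a variable for it.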

Last edited by Max Kruitwagen; 26 Jul 2015, 04:15.
Robert Picard

Join Date: Mar 2014

Posts: 1536
#11

26 Jul 2015, 09:52

Since your problem is different from the one in this thread, why don't you start a new thread and include a data example (preferably created using dataex from SSC) that is representative of your data? It would also be useful to know the size of your data, both in terms of the number of panels and the number of observations.
Max Kruitwagen

Join Date: Jul 2015

Posts: 12
#12

27 Jul 2015, 08:31

Thanks for your help, Robert. I started a new thread.