Hi,
I'm using rolling in Stata 14.1 on a Windows 7 machine with 32GB of RAM. I have an unbalanced panel data set of mutual funds with 1,037,398 rows and 15 columns of fund characteristics, including returns. I want to calculate the rolling mean, standard deviation, skewness, and kurtosis of returns by mutual fund over the past 12, 24, and 36 months.
The program I wrote works as expected and finishes in a more or less reasonable amount of time on subsamples of the data. For example, for the 12-month horizon it takes 25 seconds to run on a subsample of 10,000 observations and 23 minutes on a subsample of 100,000 observations. But when I tried it on the full sample for the 12-month horizon, the program ran for over 15 hours without completing. I've tried a few things based on discussions in this forum, such as using the command asrol, which was written by another user. Doing so on the subsamples of my data set causes Stata to freeze. I also understand that the set memory option has been deprecated in recent versions of Stata. (I used earlier versions of Stata in my research several years ago.)
I'm happy to post a sample data set, but I'm not sure it would be helpful because I know the program works on the subsamples. The problem seems to be related to how memory is allocated in Stata. The code fragment below is the one that calculates the moments of returns. I declare the data set to be a panel, drop the irrelevant variables, and then use rolling in the standard way to compute the statistics. I don't see why it takes so long on the full data set.
Thank you.
Ron
so fundid date_ym
tsset fundid date_ym
* net or gross returns
gen ret = ret_net
*gen ret = ret_gross
*keep fund_id ret date ticker
keep fundid ret date_ym fund_id
drop if ret>=.
save temp, replace
timer clear 1
timer on 1
rolling avg12=r(mean) sd12=r(sd) skew12=r(skewness) kurt12=r(kurtosis), window(12) clear nodots: summarize ret, detail
label var avg12 "Average 12 month return (ending current month)"
label var sd12 "St.dev. of 12 month return (ending current month)"
label var skew12 "Skewness of 12 month return (ending current month)"
label var kurt12 "Kurtosis of 12 month return (ending current month)"
gen date = end
format date %tm
save rolling_output12, replace
timer off 1
timer list 1