Rolling standard dev

Bob Ferguson

Join Date: Jul 2019

Posts: 23
#1

Rolling standard dev

31 Aug 2019, 13:17

I have an unbalanced panel dataset with many (15000+) firms and their daily stock returns from 1960-2018.
From these daily stock returns I want to calculate the one-year "rolling standard deviation" (according to the paper I'm following).
I have managed to come up with the command using -rangestat- and I'm just wondering if I'm doing it correctly as there is no other way for me to check whether the resulting standard deviations from rangestat are correct.

I have also noticed a couple of things while using -rangestat-:

I use the count option to see how many observations it uses and while sometimes it uses 1 observation (this is fine), in other times it uses 2 observations which I'm not sure why it's doing that since I want to use one day returns (the previous day).

Also the standard deviation is sometimes 0 and other times " . " (missing value), I wonder why this is and how I should interpret the difference between these 2 in the output of the rangestat command?

I will post below an example of the data I'm using, but since there is so many observations I cannot post a completely representative sample here. So I will just go with the returns of 1 company (Apple):

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long date str8 ticker str36 comnam double permco float Returns 8417 "AAPL" "APPLE COMPUTER INC" 7 .03319269 8418 "AAPL" "APPLE COMPUTER INC" 7 -.022553137 8419 "AAPL" "APPLE COMPUTER INC" 7 .007132721 8420 "AAPL" "APPLE COMPUTER INC" 7 .1054022 8421 "AAPL" "APPLE COMPUTER INC" 7 -.00033 8424 "AAPL" "APPLE COMPUTER INC" 7 -.0588665 8425 "AAPL" "APPLE COMPUTER INC" 7 .03793554 8426 "AAPL" "APPLE COMPUTER INC" 7 .03980917 8427 "AAPL" "APPLE COMPUTER INC" 7 .06625561 8428 "AAPL" "APPLE COMPUTER INC" 7 .005786227 8431 "AAPL" "APPLE COMPUTER INC" 7 -.0033834374 8432 "AAPL" "APPLE COMPUTER INC" 7 .02086082 8433 "AAPL" "APPLE COMPUTER INC" 7 .026269455 8434 "AAPL" "APPLE COMPUTER INC" 7 .03968534 8435 "AAPL" "APPLE COMPUTER INC" 7 -.014424606 8438 "AAPL" "APPLE COMPUTER INC" 7 -.04090528 8439 "AAPL" "APPLE COMPUTER INC" 7 -.009235363 end format %d date

note: permco is simply a unique identifier that each company in the dataset has, a unique number per company.

I then use the following command:

rangestat (sd) Returns (count) Returns, interval (date -1 0) by (permco)

Thanks in advance
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#2

31 Aug 2019, 14:21

No, your code is wrong. Your date variable is in days, and by specifying -interval(date -1 0)- you are getting a rolling standard deviation with a trailing window of 2 days, not 1 year. This is why your count of observations will be either 1 or 2: you don't have a year's worth of data included in the window: just the index day and the day before. But there are also gaps in the dates (which look to me like weekends), so sometimes the day before is not in the data, so you get only 1 observation included. When there is only one observation, the standard deviation cannot be calculated, hence the missing values whenever the count is 1.

So that explains why you are getting these strange results. The question is what to do about it. The answer to that is to change the -interval- option to -interval(date -364 0)- to get a 365 day trailing window (less in reality due to weekends and perhaps holidays as well).
Comment
Bob Ferguson

Join Date: Jul 2019

Posts: 23
#3

31 Aug 2019, 14:45

Thank you very much.
This indeed sounds more logical, the confusion for me came from the journal article that I'm reading for my analysis where they state: "We use a one-year rolling standard deviation for volatilities (of returns)" but then in the same paragraph mention that "We use one-day returns to estimate volatilities".
I think I get it now however, thanks a lot!

By the way: the reason why there's gaps in the dates is that this whole dataset contains only trading days (weekdays) and thus excludes weekends.
In Finance it's assumed that there are about 250 trading days in a year. Would I in that case be better off using -interval (date -249 0)- in order to get the effect of 1 calendar year?

Last edited by Bob Ferguson; 31 Aug 2019, 14:56.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#4

31 Aug 2019, 15:25

The answer is the same as Clyde has given you in #2: the interval() option specifies the length of the interval in terms of the variable given, not the number of non-missing values the interval includes in your data. To get a year’s worth you need length 365 or 366. If your interval is just 250 days long you will catch proportionally fewer trading days, as experiment should confirm.
Comment

Attaullah Shah

Join Date: Aug 2014
Posts: 1667

01 Sep 2019, 04:24

"We use a one-year rolling standard deviation for volatilities (of returns)" but then in the same paragraph mention that "We use one-day returns to estimate volatilities".

The text gives me a feeling that the measure is annual, based on daily data. You should confirm that from the rest of the discussion in the paper whether other variables are of annual frequency. If so, then please see the following example where I first generate daily standard deviation and then convert it to annual frequency. I am using asrol that is availale on SSC.

Code:

ssc install asrol
bys ticker : asrol Returns, gen(daily_sd) window(date 360) stat(sd)
gen annual_sd = daily_sd * sqrt(252)
list in 1/10

     +-------------------------------------------------------------------------------------+
     |      date   ticker               comnam   permco     Returns    daily_sd   annual~d |
     |-------------------------------------------------------------------------------------|
  1. | 17jan1983     AAPL   APPLE COMPUTER INC        7    .0331927           .          . |
  2. | 18jan1983     AAPL   APPLE COMPUTER INC        7   -.0225531   .03941825   .6257454 |
  3. | 19jan1983     AAPL   APPLE COMPUTER INC        7    .0071327   .02789256   .4427807 |
  4. | 20jan1983     AAPL   APPLE COMPUTER INC        7    .1054022     .054705   .8684149 |
  5. | 21jan1983     AAPL   APPLE COMPUTER INC        7     -.00033   .04937827   .7838557 |
     |-------------------------------------------------------------------------------------|
  6. | 24jan1983     AAPL   APPLE COMPUTER INC        7   -.0588665   .05577468   .8853956 |
  7. | 25jan1983     AAPL   APPLE COMPUTER INC        7    .0379355   .05194807     .82465 |
  8. | 26jan1983     AAPL   APPLE COMPUTER INC        7    .0398092   .04891608   .7765188 |
  9. | 27jan1983     AAPL   APPLE COMPUTER INC        7    .0662556   .04853329   .7704422 |
 10. | 28jan1983     AAPL   APPLE COMPUTER INC        7    .0057862   .04608435   .7315664 |
     +-------------------------------------------------------------------------------------+

I have used the window length equal to 360 days, assuming that the date variable is actual date with missing trading days. Though we have 252 trading days, there are missing days and hence we need to specify full 360-days to get one year range. In the second step when converting the daily sd to yearly, we need to use the 252 trading days convention.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

Comment

Bob Ferguson

Join Date: Jul 2019

Posts: 23
#6

01 Sep 2019, 13:42

Nick Cox and Attaullah Shah thank you both very much for your comments, this makes things much more clear.

Originally posted by Attaullah Shah View Post

The text gives me a feeling that the measure is annual, based on daily data. You should confirm that from the rest of the discussion in the paper whether other variables are of annual frequency. If so, then please see the following example where I first generate daily standard deviation and then convert it to annual frequency. I am using asrol that is availale on SSC.

Code:

ssc install asrol bys ticker : asrol Returns, gen(daily_sd) window(date 360) stat(sd) gen annual_sd = daily_sd * sqrt(252) list in 1/10 +-------------------------------------------------------------------------------------+ | date ticker comnam permco Returns daily_sd annual~d | |-------------------------------------------------------------------------------------| 1. | 17jan1983 AAPL APPLE COMPUTER INC 7 .0331927 . . | 2. | 18jan1983 AAPL APPLE COMPUTER INC 7 -.0225531 .03941825 .6257454 | 3. | 19jan1983 AAPL APPLE COMPUTER INC 7 .0071327 .02789256 .4427807 | 4. | 20jan1983 AAPL APPLE COMPUTER INC 7 .1054022 .054705 .8684149 | 5. | 21jan1983 AAPL APPLE COMPUTER INC 7 -.00033 .04937827 .7838557 | |-------------------------------------------------------------------------------------| 6. | 24jan1983 AAPL APPLE COMPUTER INC 7 -.0588665 .05577468 .8853956 | 7. | 25jan1983 AAPL APPLE COMPUTER INC 7 .0379355 .05194807 .82465 | 8. | 26jan1983 AAPL APPLE COMPUTER INC 7 .0398092 .04891608 .7765188 | 9. | 27jan1983 AAPL APPLE COMPUTER INC 7 .0662556 .04853329 .7704422 | 10. | 28jan1983 AAPL APPLE COMPUTER INC 7 .0057862 .04608435 .7315664 | +-------------------------------------------------------------------------------------+

I have used the window length equal to 360 days, assuming that the date variable is actual date with missing trading days. Though we have 252 trading days, there are missing days and hence we need to specify full 360-days to get one year range. In the second step when converting the daily sd to yearly, we need to use the 252 trading days convention.

I'm going to try asrol tomorrow as well and compare the results. By the way, the text or paper that the above piece is from is the Betting against Beta paper by Andrea Frazzini regarding the formation of beta sorted portfolios for factor investing.

Last edited by Bob Ferguson; 01 Sep 2019, 13:45.
Comment
Bob Ferguson

Join Date: Jul 2019

Posts: 23
#7

02 Sep 2019, 09:56

I just tried out asrol and rangestat seperately, and luckily they give the exact same daily standard deviations (I checked with the "correlation" command).

With rangestat i used the following command:

Code:

rangestat (sd) Returns (count) Returns, interval (date -364 0) by (permco)

With asrol I used the following command:

Code:

bysort permco: asrol Returns, gen(daily_sd) window (date 365) stat (sd)

The problem is however with the number of observations used, I include the count option to keep count of number of observations used in the -rangestat- or -asrol- command and it comes up with some peculiar numbers.
If I specify the window to be (-364 0) or in the case of asrol just (365), the count variable (in the case of using -rangestat-) will have a mean of 228 with Minimum value of 0 and Maximum value of 1155!
I wonder how this is possible, that even if I specify the interval of -rangestat- or -asrol- to be 365 days, it will sometimes use more than 365 days (even up to more than 1000 days). I do understand that sometimes it will obviously use less than 365 because there is not enough data available.

Last edited by Bob Ferguson; 02 Sep 2019, 10:00.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#8

02 Sep 2019, 10:46

This sounds to me like there is a serious problem with your data, whereby you have multiple observations with the same permco and date. Remember that the -interval()- option in -rangestat- and the -window()- option in -asrol- define the rolling window based on the value of the date variable. It does not count the observations to be included; it includes all observations whose date falls within the specified range, and if you have surplus observations for the same date all of those will be included. You can check this out simply:

Code:

isid permco date

If your data are correct, this command will produce no output at all. If you have duplicate observations this command will produce an error message informing you that permco and date do not uniquely identify the observations in the data. In that case, the next step is to identify those duplicate observations:

Code:

duplicates tag permco date, gen(flag) browse if flag

Then you have to figure out why those surplus observations are there and how to fix it. I should point out as well that the presence of surplus observations like this indicates a problem with the data management (or source data) that led to your working data set. Where there is one error, others often lurk as well. So in addition to fixing this problem, it would be prudent for you to carefully review the entire chain of steps that led to your working data set and scour it for other problems. Better to find the problems now than when you get erroneous results that somebody acts on, resulting in harm.
1 like
Comment
Ramy Othman

Join Date: Jan 2022

Posts: 1
#9

10 Jan 2022, 20:18

Hello everyone,
I might be repeating the question here but I seem to not have understood the solution
I am trying to generate a rolling standard deviation for GDP growth. Since I am using a time series that is relatively small (36 observations), I am trying to minimize the interval in order to avoid loosing observations. I am currently using quarterly data designated by the word (date). The question is:
Can I use the command : (GDPgr = GDP growth rate)
rangestat (count) GDPgr (sd) GDPgr, interval (date -1 0)
To generate a Moving Standard Deviation variable that can be used as a Proxy for Volatility ?
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1667
#10

11 Jan 2022, 02:46

Ramy Othman,
Welcome to Statalist. How your data looks like? You can post an data example using dataex command as shown below

Code:

ssc install dataex dataex

If your data has a date variable, then a quarter will have back window of 90 days. Are doing your calculation in a single country? If yes, then

Code:

ssc install asrol asrol GDPgr , gen(qsd_sd) window(date 90) stat(sd)

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#11

11 Jan 2022, 02:52

This is almost the same question as #1. If date is a quarterly date variable, each SD is of at most 2 values as your count result should make clear. That is unlikely to be interesting or useful.

If the data are really panel data you probably need a by() option call as well.

Please follow https://www.statalist.org/forums/help#stata and use dataex to give a data example if this does not make matters clear. That is, type

Code:

dataex date GDPgr

and show us the results. If your data are panel data, show a country variable as well, or its equivalent.
Comment

Announcement