Calculating Downside Beta with daily returns

Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#1

16 Jun 2017, 03:30

Dear Stata community,

I am desperate for help regarding Stata, since I am a total beginner and not sure how I am supposed to handle this problem.

I am currently working on a project where I have to calculate the downside Beta of around 2,010 companies. I have the daily stock returns for every company between the years 2007 and 2016. Furthermore, I have the S&P 500 Composite Index's daily returns for the same period. Calculating a downside Beta basically means estimating the firm's Beta using only the days on which the market index (S&P 500) performs below a certain benchmark (in this case 0). My dataset is set up as follows:

Date      Ticker  Company Stock Return  S&P 500 Return
YYYYMMDD  AAA     X                     Y
YYYYMMDD  AAA     X                     Y
YYYYMMDD  AAA     X                     Y
...
YYYYMMDD  AAA     X                     Y
YYYYMMDD  BBB     X                     Y
YYYYMMDD  BBB     X                     Y
YYYYMMDD  BBB     X                     Y
...

This dataset now includes only the dates on which the S&P 500 return was below the benchmark (0). It resembles panel data, with the date range repeated for every company. Since the dataset has already been filtered down to the 'downside' days, I now have to run a normal Beta calculation for every company. However, what I need is the Beta for each company in each year (2007-2016). The formula that I need to use for the Beta calculation is:

B = cov(ri, rm) / var(rm)

where ri = asset i's return (I am using daily returns), rm = market return (the S&P 500 daily return), cov = covariance, and var = variance.

I would be incredibly grateful for any help regarding this matter and would be very happy about an answer from you.

Cheers,

Konstantin
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#2

16 Jun 2017, 11:16

It would have been more helpful had you posted an example of your actual data, rather than this schematic. I'll assume that Date is a Stata internal format numeric date, that ticker is a string variable, and that company_stock_return and sp_500_return are the names of your two return variables. Then you loop over firms, and loop over years within that, placing the regression coefficient in a variable beta as you go:

Code:

gen int year = yofd(Date)
gen beta = .
levelsof ticker, local(firms)
foreach f of local firms {
    levelsof year if ticker == `"`f'"', local(years)
    foreach y of local years {
        regress company_stock_return sp_500_return if year == `y' & ticker == `"`f'"'
        replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
    }
}

Note: Not tested; beware of typos.

Konstantin Schmeisser

Join Date: Jun 2017
Posts: 12
#3

16 Jun 2017, 13:37

Thank you very much for the response!

The dataset displayed above is actually not in Stata yet; it is an Excel file. The actual data set has over a million rows, but this is what part of it (for one firm) looks like:

20070103	XRIT	0,011382	-0,001199
20070105	XRIT	-0,055112	-0,006085
20070109	XRIT	-0,050209	-0,000517
20070117	XRIT	-0,025043	-0,000894
20070118	XRIT	-0,007086	-0,002971
20070122	XRIT	0,008152	-0,005278
20070125	XRIT	-0,008993	-0,01127
20070126	XRIT	0,009074	-0,001208
20070129	XRIT	-0,004496	-0,001097
20070205	XRIT	-0,001776	-0,000967
20070208	XRIT	0,025506	-0,001179
20070209	XRIT	0,004288	-0,007077
20070212	XRIT	0,002562	-0,003261

Format of date is again YYYYMMDD. Does this make it clearer? Also, is it sufficient to simply regress the company stock return with the S&P 500 return? From the formula that I am supposed to use, the Beta is calculated with the covariance and variance of the returns (see original post please)


Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#4

16 Jun 2017, 14:03

So, it is not clear how the date variable will import into Stata. So before contemplating changing the code shown in #2 you need to create a Stata data set from your spreadsheet and see how that works out. If it comes in as some kind of string variable, or a numeric variable that is not a Stata date variable, then the code in #2 will need to be modified to create a Stata internal format date variable from that. Other than that, the code should work as is.
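As a rough sketch of that check: the excerpt posted above uses decimal commas, so depending on how the spreadsheet is read, the return columns may arrive as strings rather than numbers. The example below types two rows of that excerpt in as strings to mimic such an import. The names ri_str, rm_str, ri, and rm are illustrative assumptions, not names from the actual file.

```stata
clear
* Two rows from the excerpt above, entered as strings to mimic a text import
input long date str4 ticker str10 ri_str str10 rm_str
20070103 "XRIT" "0,011382" "-0,001199"
20070105 "XRIT" "-0,055112" "-0,006085"
end
* describe shows the storage type of each variable: a long date is a plain
* number rather than a Stata date, and str variables cannot enter regress
describe
* Convert the string returns to numbers, reading the comma as the decimal mark
destring ri_str rm_str, generate(ri rm) dpcomma
list date ticker ri rm
```

If describe already reports the returns as numeric (float or double), the destring step is unnecessary.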

The formula you give for beta in #1 is equivalent to the ordinary regression coefficient; it's a simple algebraic derivation to show that.
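For reference, the derivation can be sketched as follows, writing the regression of r_i on r_m with an intercept over n daily observations. The subscript t and the bars for sample means are notation introduced here, not taken from #1.

```latex
\begin{align*}
\hat{\beta}
  &= \frac{\sum_{t=1}^{n} (r_{m,t} - \bar{r}_m)(r_{i,t} - \bar{r}_i)}
          {\sum_{t=1}^{n} (r_{m,t} - \bar{r}_m)^2} \\
  &= \frac{\frac{1}{n-1}\sum_{t=1}^{n} (r_{m,t} - \bar{r}_m)(r_{i,t} - \bar{r}_i)}
          {\frac{1}{n-1}\sum_{t=1}^{n} (r_{m,t} - \bar{r}_m)^2}
   = \frac{\operatorname{cov}(r_i, r_m)}{\operatorname{var}(r_m)}
\end{align*}
```

The first line is the usual least-squares slope; dividing numerator and denominator by n - 1 turns them into the sample covariance and sample variance.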
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#5

17 Jun 2017, 01:49

Thank you very very much for your help!!!
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#6

17 Jun 2017, 05:15

Just to make sure: I imported the necessary data, however, I think that the date variable is now already in Stata format. This is what Stata gives me for the variables (see screenshot)

ri is company return and rm is market return

William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

17 Jun 2017, 05:30

No, it is not a Stata Internal Format date; it was read simply as a number. You have some work ahead of you to convert it to a Stata Internal Format date.

Before working with dates and times, any Stata user should thoroughly review the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF. After that, the help datetime documentation will usually be enough to point the way. All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.
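One possible conversion, offered as a sketch rather than the only route: it assumes the imported variable is a numeric YYYYMMDD value named date, and the new variable name edatevar is chosen for illustration. The two rows are typed in by hand so the example runs on its own.

```stata
clear
* Two YYYYMMDD values as they would look after being read as plain numbers
input long date
20070103
20111031
end
* Turn the number into an 8-character string, then parse it as year-month-day
generate edatevar = date(string(date, "%8.0f"), "YMD")
* Display the result as a daily date instead of a raw day count
format edatevar %td
* yofd() now works because edatevar is a Stata internal format date
generate int year = yofd(edatevar)
list
```

After this step, the loop code in #2 would refer to edatevar (or whatever name is chosen) in place of Date.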

And do please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, it is never a good idea to post pictures of your output. To present data, code, and results readably, please copy them from the Results window or elsewhere and paste them into a code block in the Forum editor, as explained in the Statalist FAQ. For example, the following:

[code]
// sample code
sysuse auto, clear
describe
[/code]

will be presented in the post as the following:

Code:

// sample code
sysuse auto, clear
describe

Finally, note that you could have included all four variable names on a single describe command, producing a shorter output to copy and paste.
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#8

17 Jun 2017, 09:47

Thank you very much for the reply. I was now able to get first results. Since my data set as a whole was too large, I split it up into separate ones (each of which contains over 400,000 rows). Now I run into the following error message:

insufficient observations r(2001);

Do you know how I could get rid of this problem? I used the code from #2.

It has worked for some other datasets with fewer observations, so I don't really understand why it doesn't work for the others.

Last edited by Konstantin Schmeisser; 17 Jun 2017, 09:57.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#9

17 Jun 2017, 10:06

insufficient observations r(2001); Do you know how I could get rid of this problem? I used the code from #2

Remember that in any regression command, any observation having a missing value for any of the regression variables is excluded from estimation. This message implies that at some point, the code called upon Stata to do a regression in which, after eliminating observations with missing values, the number of observations remaining was too small to carry out the regression. Since the regression in this case involves only a single outcome and a single predictor, the minimum number of observations required to avoid this message is 2. So there are some combinations of firm and year for which only zero or one observations remain after excluding any observation with a missing value for either return variable.

At this point the question bifurcates. Does this mean there is something wrong with your data? If your data set shouldn't even contain any observations with missing values, or if they should be rare, then you may have a corrupted or otherwise incorrect data set that needs to be fixed. If it sounds like this is the case, then the first step is to identify those combinations of ticker and year that lead to too few observations:

Code:

by ticker year, sort: egen valid_obs = total(!missing(company_stock_return, sp_500_return))
browse if valid_obs <= 1

From there, you have to figure out why there are so few usable observations for these firm-year combinations and then figure out how to fix the problem.

The other possibility is that the missing values are expected and do not signal any problem with the data set. In that case, the code needs to be modified to skip over those regressions with too few observations. The following modification will do that:

Code:

gen int year = yofd(Date)
gen beta = .
levelsof ticker, local(firms)
foreach f of local firms {
    levelsof year if ticker == `"`f'"', local(years)
    foreach y of local years {
        capture noisily regress company_stock_return sp_500_return if year == `y' ///
            & ticker ==` "`f'"'
        if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
            replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
        }
        else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
            display in red `"Unexpected error with ticker = `f' and year = `y'"'
            exit c(rc)
        }
    }
}

The -capture- prefix will allow Stata to proceed even if the regression fails. Do read the manual section on -capture- and learn what it does and what c(rc) is about. In this case, if the regression proceeds normally (in which case c(rc) == 0), we go ahead and store the beta result. If the regression fails because of no observations (c(rc) == 2000) or one observation (c(rc) == 2001), neither branch of the -if...else if- construct is taken and Stata just moves on to the next iteration of the loop. And if something else goes wrong that we did not anticipate, Stata halts with an error message to allow you to investigate the problem and fix it before trying again.
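To see what c(rc) reports in the two cases described above, here is a small sketch that uses the auto dataset shipped with Stata instead of the thread's data. The condition foreign == 2 is picked only because no observation in that dataset satisfies it.

```stata
sysuse auto, clear
* No car has foreign == 2, so this regression has no observations and fails;
* capture lets execution continue and noisily still prints the message
capture noisily regress price mpg if foreign == 2
display "return code after the failed regression: " c(rc)
* A regression that runs normally leaves a return code of 0
capture noisily regress price mpg
display "return code after the successful regression: " c(rc)
```

The first display should show a nonzero code (2000 for no observations, per the explanation above) and the second should show 0, which is the distinction the -if...else if- construct relies on.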
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#10

17 Jun 2017, 11:31

Thank you very much for the help Mr. Schechter. I now entered the code in the following way:

Code:

gen int year = yofd(Date)
gen beta = .
levelsof ticker, local(firms)
foreach f of local firms {
    levelsof year if ticker == `"`f'"', local(years)
    foreach y of local years {
        capture noisily regress company_stock_return sp_500_return if year == `y' /// & ticker == `"`f'"'
        if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
            replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
        }
        else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
            display in red `"Unexpected error with ticker = `f' and year = `y'"'
            exit c(rc)
        }
    }
}

I marked some sections red where I am not sure. When I entered the & on its own line as in #9, it was reported as an invalid command, which is why I moved it to the end of the previous line. Furthermore, I believe there was a space missing in the code at #9 in the ==` "`f'"' section? Also, does it matter where the closing braces } are placed in the code? They seem to be a bit out of line.

The error message that it gives me now is simply:

2011/ invalid name
Unexpected error with ticker = AEGN and year = 2011
r(198);

I checked this section in the data and there is nothing wrong with it. AEGN is alphabetically the first ticker in the dataset, meaning that the regression already stops right at the beginning (The data for AEGN starts with year 2011).

I believe there is something wrong in the way I enter the formula? More specifically, the syntax?

Thank you in advance for any help/ suggestions.

Last edited by Konstantin Schmeisser; 17 Jun 2017, 11:35.
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#11

17 Jun 2017, 12:13

On the other hand, would it also be possible to simply remove the invalid observations from the file? The error that occurred in #8 was due to some invalid observations. After using the code in #9 I could identify these invalid observations. I guess it would be simplest to remove these and then use the code from #2 again?
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#12

17 Jun 2017, 12:23

Somehow there is a typo in the code in #9 that is not present in the do-file I copy/pasted from. I must have hit a stray key somewhere along the way. Anyhow, the problem is that where, in #9, it says ` "`f'"', there should be no space between ` and ". Moving that line to the previous line, placing it after ///, simply nullifies it as code: anything that appears following /// is a comment, and is not executed.

So here's how it should be:

Code:

gen int year = yofd(Date)
gen beta = .
levelsof ticker, local(firms)
foreach f of local firms {
    levelsof year if ticker == `"`f'"', local(years)
    foreach y of local years {
        capture noisily regress company_stock_return sp_500_return if year == `y' ///
            & ticker == `"`f'"'
        if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
            replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
        }
        else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
            display in red `"Unexpected error with ticker = `f' and year = `y'"'
            exit c(rc)
        }
    }
}

Important: because I used the /// to extend the code to a new line, this code cannot be run from the command window. It must be copied to a do-file and run from there.

The alignment of the curly braces does not matter from the perspective of Stata execution, but it is important for human readability and maintainability of the code. This sometimes seems to happen when code is copy/pasted from the Forum editor to a Stata do-file. I don't know why, and it only happens intermittently. Anyway, it is best to fix it up, though not critical to functioning of the code.

As you do not provide an example data set, I cannot test this code, so I cannot assure that it does not contain other typos, but I believe it is correct and should only return error messages when there is something actually wrong.

Added: Crossed with #11

Konstantin Schmeisser

Join Date: Jun 2017
Posts: 12

#13

17 Jun 2017, 13:18

It worked!!! Thank you so much! This is what the output looks like for one company:

Code:

date      ticker  ri        rm        edatevar   year  beta
20111031  AEGN    -.050096  -.024738  31oct2011  2011  0,158551
20111101  AEGN    -.02975   -.027942  1-nov-11   2011  0,158551
20111104  AEGN    -.003151  -.00628   4-nov-11   2011  0,158551
20111109  AEGN    -.065137  -.036695  9-nov-11   2011  0,158551
20111114  AEGN    -.033715  -.00955   14-nov-11  2011  0,158551
20111116  AEGN    -.013436  -.016616  16-nov-11  2011  0,158551
20111117  AEGN    -.007782  -.0168    17-nov-11  2011  0,158551
20111118  AEGN    .017647   -.000395  18-nov-11  2011  0,158551
20111121  AEGN    -.024406  -.018648  21-nov-11  2011  0,158551
20111122  AEGN    -.056616  -.004141  22-nov-11  2011  0,158551
20111123  AEGN    -.045359  -.022095  23-nov-11  2011  0,158551
20111125  AEGN    -.027047  -.002686  25-nov-11  2011  0,158551
20111201  AEGN    -.00066   -.001909  1-dec-11   2011  0,158551
20111202  AEGN    .007261   -.000241  2-dec-11   2011  0,158551
20111208  AEGN    -.046846  -.021142  8-dec-11   2011  0,158551
20111212  AEGN    -.014502  -.014914  12-dec-11  2011  0,158551

where ri is the company return and rm the market return. As we can see, the beta remains the same because we are looking at the same year. According to the formula in #1, this should be the beta for the year 2011, based on the daily returns in that year, correct?

Again, thank you very much for all the support!

P.S. I had to convert the "date" variable into a Stata internal format date variable, which is now "edatevar".

Last edited by Konstantin Schmeisser; 17 Jun 2017, 13:20.


Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#14

17 Jun 2017, 13:31

If what you show in #13 is the entire data for ticker AEGN in year 2011, then something has gone wrong. When I regress ri against rm in this data example, I get beta = 1.436238, not 0.158551. I note, however, that in the example, the dates span only the last two months of 2011, so I'm hoping that this is just a subset of the full AEGN 2011 data and that 0.158551 would be the correct regression coefficient were the full AEGN 2011 data shown.
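A check of this kind can be sketched as below. It uses only the first five rows shown in #13, typed in by hand so the example is self-contained, so the slope it prints applies to those five rows only and will not match either figure quoted above. The scalar names beta_reg and beta_cov are illustrative.

```stata
clear
* First five AEGN rows from #13, entered by hand
input long date str4 ticker double ri double rm
20111031 "AEGN" -.050096 -.024738
20111101 "AEGN" -.02975  -.027942
20111104 "AEGN" -.003151 -.00628
20111109 "AEGN" -.065137 -.036695
20111114 "AEGN" -.033715 -.00955
end
* Beta as the slope from regressing the company return on the market return
regress ri rm
scalar beta_reg = _b[rm]
* Beta from the formula in #1: cov(ri, rm) / var(rm)
* (rm is listed second, so its variance is returned in r(Var_2))
correlate ri rm, covariance
scalar beta_cov = r(cov_12) / r(Var_2)
display "regression slope: " beta_reg "   cov/var: " beta_cov
```

On the full data, adding a qualifier such as if ticker == "AEGN" & year == 2011 to both commands would restrict the check to one firm-year, and the two numbers should agree with each other and with the stored beta.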
Konstantin Schmeisser

Join Date: Jun 2017

Posts: 12
#15

17 Jun 2017, 13:55

Yes, this is just part of the AEGN 2011 data!