  • Calculating Downside Beta with daily returns

    Dear Stata community,

    I am desperate for help regarding Stata, since I am a total beginner and not sure how I am supposed to handle this problem.

    I am currently working on a project where I have to calculate the downside Beta of around 2,010 companies. I have the daily stock returns for every company between the years 2007 and 2016. Furthermore, I have the S&P 500 Composite Index's daily returns for the time range as well. Calculating a downside Beta basically means that I calculate the firm's Beta in times when the market index (S&P 500) performs below a certain benchmark (in this case 0). My dataset is set up as follows:
    Date      Ticker  Company Stock Return  S&P 500 Return
    YYYYMMDD  AAA     X                     Y
    YYYYMMDD  AAA     X                     Y
    YYYYMMDD  AAA     X                     Y
    ...
    YYYYMMDD  AAA     X                     Y
    YYYYMMDD  BBB     X                     Y
    YYYYMMDD  BBB     X                     Y
    YYYYMMDD  BBB     X                     Y
    ...
    This dataset now includes only the dates on which the S&P 500 return was below the benchmark (0). It resembles panel data, in that the time range is repeated for every company. Since the dataset has already been restricted to the 'downside' situations, all that remains is a normal Beta calculation for every company. However, what I need is the Beta for each company in each year (2007-2016).

    The formula that I need to use for the Beta calculation is:

    B = cov(ri, rm) / var(rm)

    where ri = asset i's return (I am using daily returns), rm = market return (the S&P 500 daily return), cov = covariance, and var = variance.

    I would be incredibly grateful for any help regarding this matter and would be very happy about an answer from you.

    Cheers,

    Konstantin

  • #2
    It would have been more helpful had you posted an example of your actual data, rather than this schematic. I'll assume that Date is a Stata internal format numeric date, that ticker is a string variable, and that company_stock_return and sp_500_return are the names of your two return variables. Then you loop over firms, and loop over years within that, placing the regression coefficient in a variable beta as you go:

    Code:
    gen int year = yofd(Date)
    gen beta = .
    
    levelsof ticker, local(firms)
    foreach f of local firms {
        levelsof year if ticker == `"`f'"', local(years)
        foreach y of local years {
            regress company_stock_return sp_500_return if year == `y' & ticker == `"`f'"'
            replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
        }
    }
    Note: Not tested; beware of typos.


    • #3
      Thank you very much for the response!

      The dataset shown above has not been imported into Stata yet; it is an Excel file. The actual dataset has over a million rows, but this is what part of it (for one firm) looks like:
      20070103 XRIT 0,011382 -0,001199
      20070105 XRIT -0,055112 -0,006085
      20070109 XRIT -0,050209 -0,000517
      20070117 XRIT -0,025043 -0,000894
      20070118 XRIT -0,007086 -0,002971
      20070122 XRIT 0,008152 -0,005278
      20070125 XRIT -0,008993 -0,01127
      20070126 XRIT 0,009074 -0,001208
      20070129 XRIT -0,004496 -0,001097
      20070205 XRIT -0,001776 -0,000967
      20070208 XRIT 0,025506 -0,001179
      20070209 XRIT 0,004288 -0,007077
      20070212 XRIT 0,002562 -0,003261
      The date format is again YYYYMMDD. Does this make it clearer? Also, is it sufficient to simply regress the company stock return on the S&P 500 return? In the formula that I am supposed to use, the Beta is calculated from the covariance and variance of the returns (please see the original post).


      • #4
        So, it is not clear how the date variable will import into Stata. So before contemplating changing the code shown in #2 you need to create a Stata data set from your spreadsheet and see how that works out. If it comes in as some kind of string variable, or a numeric variable that is not a Stata date variable, then the code in #2 will need to be modified to create a Stata internal format date variable from that. Other than that, the code should work as is.

        The formula you give for beta in #1 is equivalent to the ordinary regression coefficient; it's a simple algebraic derivation to show that.
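
        As a rough numerical check of that equivalence, here is a minimal sketch using the auto dataset that ships with Stata. The variables price and weight are only stand-ins for ri and rm, and the scalar names b_reg and b_cov are made up for illustration. The two numbers displayed at the end should agree up to rounding:

        ```stata
        * Sketch: compare the OLS slope with cov(y, x) / var(x).
        * price and weight are placeholders for ri and rm (assumption).
        sysuse auto, clear

        * Slope from an ordinary regression of y on x
        regress price weight
        scalar b_reg = _b[weight]

        * Same quantity from the covariance matrix:
        * C[2,1] is cov(price, weight), C[2,2] is var(weight)
        correlate price weight, covariance
        matrix C = r(C)
        scalar b_cov = C[2,1] / C[2,2]

        display "regression slope = " b_reg "   cov/var = " b_cov
        ```

        The sample covariance and the sample variance share the same n - 1 divisor, so it cancels in the ratio, which is why the two routes give the same number.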


        • #5
          Thank you very very much for your help!!!


          • #6
            Just to make sure: I imported the necessary data, however, I think that the date variable is now already in Stata format. This is what Stata gives me for the variables (see screenshot)

            ri is company return and rm is market return

            Attached Files


            • #7
              No, it is not a Stata Internal Format date; it was read simply as a number. You have some work ahead of you to convert it to a Stata Internal Format date.

              Before working with dates and times, any Stata user should thoroughly review the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF. After that, the help datetime documentation will usually be enough to point the way. All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.
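
              For a date that was imported as a plain number in YYYYMMDD form, one common approach is sketched below. The variable names date and edatevar and the two example values are assumptions for illustration, so check the result against your own data with list or browse before relying on it:

              ```stata
              * Sketch: turn a numeric YYYYMMDD variable into a Stata daily date.
              * The variable names and the two toy values are assumptions.
              clear
              input long date
              20070103
              20111031
              end

              * Write the number out as an 8-character string, then parse it as year-month-day
              gen long edatevar = date(string(date, "%8.0f"), "YMD")
              format edatevar %td

              * The calendar year can then be extracted with yofd()
              gen int year = yofd(edatevar)
              list
              ```

              If the variable had come in as a string rather than a number, the string() step would not be needed and date() could be applied to it directly.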

              And do please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, it is never a good idea to post pictures of your output. To present data, code, and results readably, please copy them from the Results window or elsewhere and paste them into a code block in the Forum editor, as explained in the Statalist FAQ. For example, the following:

              [code]
              // sample code
              sysuse auto, clear
              describe
              [/code]

              will be presented in the post as the following:
              Code:
              // sample code
              sysuse auto, clear
              describe
              Finally, note that you could have included all four variable names on a single describe command, producing a shorter output to copy and paste.


              • #8
                Thank you very much for the reply. I have now been able to get first results. Since my dataset as a whole was too large, I split it into separate ones (each containing over 400,000 rows). Now I run into an error message:

                insufficient observations r(2001); Do you know how I could get rid of this problem? I used the code from #2

                It has worked for some other datasets with fewer observations, so I don't really understand why it doesn't work for these.
                Last edited by Konstantin Schmeisser; 17 Jun 2017, 09:57.


                • #9
                  insufficient observations r(2001); Do you know how I could get rid of this problem? I used the code from #2
                  Remember that in any regression command, any observation having a missing value for any of the regression variables is excluded from estimation. This message implies that at some point, the code called upon Stata to do a regression in which, after eliminating observations with missing values, the number of observations remaining was too small to carry out the regression. Since the regression in this case involves only a single outcome and a single predictor, the minimum number of observations required to avoid this message is 2. So there are some combinations of firm and year for which only zero or one observation remains after excluding any observation with a missing value for either return variable.

                  At this point the question bifurcates. Does this mean there is something wrong with your data? If your data set shouldn't even contain any observations with missing values, or if they should be rare, then you may have a corrupted or otherwise incorrect data set that needs to be fixed. If it sounds like this is the case, then the first step is to identify those combinations of ticker and year that lead to too few observations:

                  Code:
                  by ticker year, sort: egen valid_obs = total(!missing(company_stock_return, sp_500_return))
                  browse if valid_obs <= 1
                  From there, you have to figure out why there are so few usable observations for these firm-year combinations and then figure out how to fix the problem.
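
                  If browsing a very large data set is unwieldy, a compact list with one line per problematic firm-year may be easier to read. The sketch below runs on made-up toy data; the four observations and the helper variable firm_year_tag are assumptions for illustration only:

                  ```stata
                  * Sketch on invented toy data; variable names follow the code above.
                  clear
                  input str4 ticker int year double(company_stock_return sp_500_return)
                  "AAA" 2007  0.01 -0.002
                  "AAA" 2007 -0.02 -0.004
                  "AAA" 2008  .    -0.001
                  "BBB" 2007  0.03 -0.005
                  end

                  * Count usable observations within each firm-year
                  by ticker year, sort: egen valid_obs = total(!missing(company_stock_return, sp_500_return))

                  * Flag exactly one observation per firm-year so each combination is listed once
                  egen byte firm_year_tag = tag(ticker year)
                  list ticker year valid_obs if firm_year_tag & valid_obs <= 1, noobs
                  ```

                  In this toy example, AAA in 2008 has no usable observation and BBB in 2007 has only one, so those two firm-years are the ones listed.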

                  The other possibility is that the missing values are expected and do not signal any problem with the data set. In that case, the code needs to be modified to skip over those regressions with too few observations. The following modification will do that:

                  Code:
                  gen int year = yofd(Date)
                  gen beta = .
                  
                  levelsof ticker, local(firms)
                  foreach f of local firms {
                      levelsof year if ticker == `"`f'"', local(years)
                      foreach y of local years {
                          capture noisily regress company_stock_return sp_500_return if year == `y' ///
                              & ticker ==` "`f'"'
                          if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
                              replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
                          }
                          else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
                              display in red `"Unexpected error with ticker = `f' and year = `y'"'
                              exit c(rc)
                          }
                      }
                  }
                  The -capture- prefix will allow Stata to proceed even if the regression fails. Do read the manual section on -capture- and learn what it does and what c(rc) is about. In this case, if the regression proceeds normally (in which case c(rc) == 0), we go ahead and store the beta result. If the regression fails because of no observations (c(rc) == 2000) or one observation (c(rc) == 2001), neither branch of the -if...else if- construct is taken and Stata just moves on to the next iteration of the loop. And if something else goes wrong that we did not anticipate, Stata halts with an error message to allow you to investigate the problem and fix it before trying again.


                  • #10
                    Thank you very much for the help Mr. Schechter. I now entered the code in the following way:

                    Code:
                    gen int year = yofd(Date)
                    gen beta = .  
                    
                    levelsof ticker, local(firms)
                    foreach f of local firms {    
                         levelsof year if ticker == `"`f'"', local(years)    
                         foreach y of local years {        
                             capture noisily regress company_stock_return sp_500_return if year == `y' /// & ticker == `"`f'"'        
                    if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS            
                             replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'         }        
                    else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS            
                             display in red `"Unexpected error with ticker = `f' and year = `y'"'            
                             exit c(rc)         }     } }
                    I marked some sections red where I am not sure. When I entered the & as in #9, it was shown as an invalid command, which is why I moved it to the previous line. Furthermore, I believe there was a space missing in the code at #9 for the `"`"f' " ' section? Also, does it matter where the } (breaks) are set in the code? Since they seem to be a bit out of line.

                    The error message that it gives me now is simply:


                    2011/ invalid name
                    Unexpected error with ticker = AEGN and year = 2011
                    r(198);


                    I checked this section in the data and there is nothing wrong with it. AEGN is alphabetically the first ticker in the dataset, meaning that the regression already stops right at the beginning (The data for AEGN starts with year 2011).

                    I believe there is something wrong in the way I enter the formula? More specifically, the syntax?

                    Thank you in advance for any help/ suggestions.

                    Last edited by Konstantin Schmeisser; 17 Jun 2017, 11:35.


                    • #11
                      On the other hand, would it also be possible to simply remove the invalid observations from the file? The error that occurred in #8 was due to some invalid observations. After using the code in #9 I could identify these invalid observations. I guess it would be simplest to remove these and then use the code from #2 again?


                      • #12
                        Somehow there is a typo in the code in #9 that is not present in the do-file I copy/pasted from. I must have hit a stray key somewhere along the way. Anyhow, the problem is that where #9 says ==` "`f'"', there should be no space between ` and ". Moving that part of the command up to the previous line, after ///, simply nullifies it as code: anything that follows /// on a line is a comment and is not executed.

                        So here's how it should be:

                        Code:
                        gen int year = yofd(Date)
                        gen beta = .
                        
                        levelsof ticker, local(firms)
                        foreach f of local firms {
                            levelsof year if ticker == `"`f'"', local(years)
                            foreach y of local years {
                                capture noisily regress company_stock_return sp_500_return if year == `y' ///
                                    & ticker == `"`f'"'
                                if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
                                    replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
                                }
                                else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
                                    display in red `"Unexpected error with ticker = `f' and year = `y'"'
                                    exit c(rc)
                                }
                            }
                        }
                        Important: because I used the /// to extend the code to a new line, this code cannot be run from the command window. It must be copied to a do-file and run from there.

                        The alignment of the curly braces does not matter from the perspective of Stata execution, but it is important for human readability and maintainability of the code. The misalignment sometimes seems to happen when code is copy/pasted from the Forum editor to a Stata do-file. I don't know why, and it only happens intermittently. Anyway, it is best to fix it up, though it is not critical to the functioning of the code.

                        As you do not provide an example data set, I cannot test this code, so I cannot assure that it does not contain other typos, but I believe it is correct and should only return error messages when there is something actually wrong.
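
                        For readers who would rather avoid explicit loops, Stata's -statsby- prefix is a different route to one coefficient per group. This is an alternative technique, not the code used in this thread, and the sketch below only demonstrates it on the shipped auto data, where foreign stands in for the ticker and year grouping and price and weight stand in for ri and rm (all assumptions for illustration):

                        ```stata
                        * Sketch: one regression per group, with the slope collected by -statsby-.
                        * foreign, price and weight are placeholders (assumption).
                        sysuse auto, clear

                        * The clear option replaces the data in memory with one observation per group
                        statsby beta=_b[weight], by(foreign) clear: regress price weight
                        list

                        * Applied to this thread's variable names, the call would look something like:
                        * statsby beta=_b[sp_500_return], by(ticker year) clear: regress company_stock_return sp_500_return
                        ```

                        Because the clear option replaces the data in memory, save your data set first. The result holds one row per group rather than a beta repeated on every daily observation, so it would need to be merged back if the daily layout is wanted. Before relying on it, also check how firm-years with too few observations are reported.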

                        Added: Crossed with #11


                        • #13
                          It worked!!! Thank you so much! This is what the output looks like for one company:

                          Code:
                           
                          date      ticker  ri        rm        edatevar   year  beta
                          20111031  AEGN    -.050096  -.024738  31oct2011  2011  0,158551
                          20111101  AEGN    -.02975   -.027942  1-nov-11   2011  0,158551
                          20111104  AEGN    -.003151  -.00628   4-nov-11   2011  0,158551
                          20111109  AEGN    -.065137  -.036695  9-nov-11   2011  0,158551
                          20111114  AEGN    -.033715  -.00955   14-nov-11  2011  0,158551
                          20111116  AEGN    -.013436  -.016616  16-nov-11  2011  0,158551
                          20111117  AEGN    -.007782  -.0168    17-nov-11  2011  0,158551
                          20111118  AEGN    .017647   -.000395  18-nov-11  2011  0,158551
                          20111121  AEGN    -.024406  -.018648  21-nov-11  2011  0,158551
                          20111122  AEGN    -.056616  -.004141  22-nov-11  2011  0,158551
                          20111123  AEGN    -.045359  -.022095  23-nov-11  2011  0,158551
                          20111125  AEGN    -.027047  -.002686  25-nov-11  2011  0,158551
                          20111201  AEGN    -.00066   -.001909  1-dec-11   2011  0,158551
                          20111202  AEGN    .007261   -.000241  2-dec-11   2011  0,158551
                          20111208  AEGN    -.046846  -.021142  8-dec-11   2011  0,158551
                          20111212  AEGN    -.014502  -.014914  12-dec-11  2011  0,158551
                          where ri is the company return and rm is the market return. As we can see, the beta is the same on every row because all rows belong to the same year. According to the formula in #1, this should be the beta for the year 2011, based on the daily returns in that year, correct?

                          Again, thank you very much for all the support!

                          P.S. I had to convert the "date" variable into a Stata internal format date variable, which is now "edatevar".
                          Last edited by Konstantin Schmeisser; 17 Jun 2017, 13:20.


                          • #14
                            If what you show in #13 is the entire data for ticker AEGN in year 2011, then something has gone wrong. When I regress ri against rm in this data example, I get beta = 1.436238, not 0.158551. I note, however, that in the example, the dates span only the last two months of 2011, so I'm hoping that this is just a subset of the full AEGN 2011 data and that 0.158551 would be the correct regression coefficient were the full AEGN 2011 data shown.


                            • #15
                              Yes, this is just part of the AEGN 2011 data!
