Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rolling beta calculation

    Hi all,
    I am trying to code a rolling beta. The situation is as follows: The time period is 2014-2019 and I am trying to find the betas of each stock at the beginning of the month based don't the previous year of daily observations, resulting in 60 betas per company in the dataset. Do you have any idea how I do this regression? I have tried to code it in a variety of ways but am finding it difficult to run the regression only for the beginning of the month, and as my dataset is rather large (10 million observations) any tips to run the regression faster would also be appreciated. I have done all of the prep work: I have the year_month categorized, and have a variable for start of the month. I just do not know how to incorporate it as the if function is not compatible. I currently have used this but it doesn't seem to solve the issue. * Adjust the rolling command to run only on the first of each month

    rolling beta = _b[market_excess_return], window(252) saving(rolling_beta, replace) if first_of_month == 1: ///
    regress excess_return market_excess_return if date >= date[_n-251]

    Thanks in advance
    Lucas

  • #2
    -rolling- does not accept the kind of -if- restriction you are looking to impose. Ordinarily, you just do these things without the restriction and then discard the unwanted extra results. But with a data set this large, doing 30 x as many regressions as needed is clearly something one should avoid.

    Your problem can be solved by Robert Picard's -rangerun- command, available from SSC. To use it, you must also install -rangestat- by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Using -rangerun- can be tricky. Fortunately, the help file is very clearly written and includes many worked examples, both of rolling regressions, and of restricting application to a subset of the observations. So give it a try.

    I can't help you with the coding because you did not provide example data. If you need help getting -rangerun- to work for you, you will need to do that. An example containing a couple of years of one firm's data would do the trick. If your data is confidential, you can change the firm identifier to something uninformative, and for that matter, you can replace the values of excess return and market excess return with random numbers--the real data values are not important, but the data organization and metadata are critical to getting the code right.

    And, of course, if you do post back with a data example, be sure to use -dataex- to do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Added: On further thought, you don't need -rangerun- for this. It can be done within -rangestat- alone, and that is simpler. Again, the help file is good and can probably guide you through this. If further help is needed, example data is still necessary.
    Last edited by Clyde Schechter; 22 Apr 2024, 10:35.

    Comment


    • #3
      Thanks Clyde I appreciate the detailed response! I will give this approach a try

      Comment


      • #4
        I have attempted to use it but am still struggling as it is such a complicated variable. This is my data sample. It uses a 6 year period with all of the stocks listed on the main US stock exchanges. I understand the rangestat variable on a basic level. However, I just don't understand how to incorporate the start_month variable. The time variable only moves up a value if there were returns on that day, thus the value I need to use for the 251 returns on the yearly level. What would I need to add to this to make it so it is just for start_month, while not deleting the other returns as they are also integral to the regression?

        rangestat (reg) Excess_RET Excess_mkt_RET, interval(time-251 0)

        input double(PERMNO date) float(Excess_mkt_RET Excess_RET time start_month)
        10001 19725 -.008748586 .001243553 6074 1
        10001 19726 .0007099546 .017411172 6075 0
        10001 19729 -.003135541 .019558564 6076 0
        10001 19730 .006332759 .05155761 6077 0
        10001 19731 .0003070143 .02166347 6078 0
        10001 19732 .0005293581 .016739916 6079 0
        10001 19733 .004006827 .012073739 6080 0
        10001 19736 -.01135164 -.0027124 6081 0
        10001 19737 .010583707 .01420676 6082 0
        10001 19738 .005163093 .00646459 6083 0
        10001 19739 -.00002269996 .00321106 6084 0
        10001 19740 -.0032564006 .003200776 6085 0
        10001 19744 .0036301105 .012765075 6086 0
        10001 19745 .0016686004 .02205803 6087 0
        10001 19746 -.008295123 -.013361626 6088 0
        10001 19747 -.02105768 -.003126391 6089 0
        10001 19750 -.006536097 -.017765243 6090 0
        10001 19751 .007705532 -.013831018 6091 0
        10001 19752 -.009815348 .02481013 6092 0
        10001 19753 .011700274 .0031573805 6093 0
        10001 19754 -.005639303 -.03777583 6094 0
        10001 19757 -.0229975 -6.944654e-07 6095 1
        10001 19758 .0080246115 .03380472 6096 0
        10001 19759 -.0017893172 -.01898859 6097 0
        10001 19760 .012455258 -.006453603 6098 0
        10001 19761 .012859626 -.0010844984 6099 0
        10001 19764 .0016319375 .02491727 6100 0
        10001 19765 .01085134 .02114031 6101 0
        10001 19766 .0012664236 .008797987 6102 0
        10001 19767 .007178565 .010308623 6103 0
        10001 19768 .0043539973 .0081627015 6104 0
        10001 19772 .003527657 -.018219208 6105 0
        10001 19773 -.006559116 .010308554 6106 0
        10001 19774 .006685635 -.00918487 6107 0
        10001 19775 -.0005293791 .006178129 6108 0
        10001 19778 .006201074 .018422706 6109 0
        10001 19779 -.0011082612 -.023116505 6110 0
        10001 19780 .00115337 -.0041163336 6111 0
        10001 19781 .005351238 .0041311164 6112 0
        10001 19782 .0017319535 -.03703849 6113 0
        10001 19785 -.00678113 .0042721806 6114 1
        10001 19786 .015687747 .0404243 6115 0
        10001 19787 .0005677071 .007156142 6116 0
        10001 19788 .002128364 -.015229804 6117 0
        10001 19789 -.0009599713 -.01031047 6118 0
        10001 19792 -.0013616973 .006248833 6119 0
        10001 19793 -.005459748 .0041398103 6120 0
        10001 19794 .001403298 -.00051631674 6121 0
        10001 19795 -.011199038 .0031074905 6122 0
        10001 19796 -.001235382 .0020648143 6123 0
        10001 19799 .008347486 .007215034 6124 0
        10001 19800 .007883556 -.0030720835 6125 0
        10001 19801 -.006669279 .0010354706 6126 0
        10001 19802 .00472713 .0020394402 6127 0
        10001 19803 -.0025536306 .018422289 6128 0
        10001 19806 -.005747594 -.010051514 6129 0
        10001 19807 .003733858 .011166297 6130 0
        10001 19808 -.007768589 .0040149493 6131 0
        10001 19809 -.0005683248 -.0010009952 6132 0
        10001 19810 .004702392 -1.1111666e-06 6133 0
        10001 19813 .009392286 .007006239 6134 0
        10001 19814 .008489793 -.005964812 6135 1
        10001 19815 .003030629 .010999688 6136 0
        10001 19816 -.00309231 -.0009894647 6137 0
        10001 19817 -.012248764 -.004951208 6138 0
        10001 19820 -.011281856 .00497431 6139 0
        10001 19821 .005672422 .0019794544 6140 0
        10001 19822 .011604296 .003951732 6141 0
        10001 19823 -.020438123 .0019677158 6142 0
        10001 19824 -.009513473 -.00638617 6143 0
        10001 19827 .006588235 .02036653 6144 0
        10001 19828 .00492007 -.002443608 6145 0
        10001 19829 .010820447 .009755163 6146 0
        10001 19830 .002680279 .016424363 6147 0
        10001 19834 .003640633 .04182429 6148 0
        10001 19835 .005749373 .003648937 6149 0
        10001 19836 -.002686111 .008181554 6150 0
        10001 19837 .000990329 -.0036071273 6151 0
        10001 19838 -.008985367 .02081402 6152 0
        10001 19841 .0012300673 -.015957529 6153 0
        10001 19842 .006003872 -.014414903 6154 0
        10001 19843 .003525668 -.003656651 6155 0
        10001 19844 .001149166 -.0009179898 6156 1
        10001 19845 .00005094535 -.0202026 6157 0
        10001 19848 .0013986813 .015932042 6158 0
        10001 19849 -.008296954 .014759647 6159 0
        10001 19850 .00465417 -.00909143 6160 0
        10001 19851 -.0024605016 -.005505372 6161 0
        10001 19852 .0016246212 .00553445 6162 0
        10001 19855 .011467802 -.0009179898 6163 0
        10001 19856 -.0005994352 -.012397278 6164 0
        10001 19857 -.0044547394 -.0018679425 6165 0
        10001 19858 -.008190239 -.02712871 6166 0
        10001 19859 .0035543854 -.012500476 6167 0
        10001 19862 .004358004 .002920548 6168 0
        10001 19863 -.006927523 .02427115 6169 0
        10001 19864 .00762782 -.003792021 6170 0
        10001 19865 .0035561996 -.004758364 6171 0
        10001 19866 .00477721 .009559432 6172 0
        10001 19870 .005805387 -.005682691 6173 0
        end
        format %td date

        Comment


        • #5
          Code:
          rangestat (reg) Excess_RET Excess_mkt_RET, interval(time -251 0) by(PERMNO)
          is perhaps closer to what you want.

          An extra condition if start_month == 1 wouldn't obviously help here as it would exclude most of the data you want to use.

          Comment


          • #6
            Yes indeed it is difficult as i do not have the computing power to regress the 10 million observations, and it would be difficult to only regress the first of the month as the if function cannot be used.

            Comment


            • #7
              It's not that the if qualifier (not a function) can't be used. It's that using it will give you the wrong answers if I understand what you want.

              You could always try to write your own loop but you'd need some experience to do that well.

              Otherwise you evidently need to use a computer fit for your problem.

              Comment


              • #8
                Code:
                gen low = cond(start_month, date-251, 1)
                gen high = cond(start_month, date, 0)
                
                rangestat (reg) Excess_RET Excess_mkt_RET, by(PERMNO) interval(date low high)
                does what you ask. And it will be very much faster than doing it for every observation.

                Note: In the earliest months of the data, there will not be a full year's worth of lagged observations for the regression. So you may want to -drop- the results where reg_nobs (created by -rangestat-) is unacceptably small for your purposes. Yes, you will have to do that post hoc. I cannot think of any way to build that restriction into -rangestat-.

                Sorry, I just noticed that you want all of this driven by the variable time, not date. In principle it's the same, just replacing date by time throughout.
                Code:
                gen low = cond(start_month, time-251, 1)
                gen high = cond(start_month, time, 0)
                
                rangestat (reg) Excess_RET Excess_mkt_RET, by(PERMNO) interval(time low high)
                Last edited by Clyde Schechter; 23 Apr 2024, 09:31.

                Comment


                • #9
                  Thanks I really appreciate it Clyde! This is exactly what I was looking for.

                  Comment

                  Working...
                  X