
  • How to predict values for multiple groups of observations using rolling linear regression

    I am new to Stata and am trying to run a 12-month rolling regression on a dataset of daily share returns. Each ticker has the same number of daily observations, and I'm using a while loop to run a regression over a 12-month window for each ticker on each day. This seems to work fine. However, I then need a predicted value for each day for each ticker. The issue is that, although I can use the by prefix for the regression, for instance:

    by ticker, sort : regress daily_return factor_1 factor_2 if inrange(mydate, `i'-`win'+1,`i')

    * where `win' is the window size (the number of business days equivalent to 12 months).

    predict does not support the by prefix, so running predict after the command above only uses the estimates for the last ticker group.

    Any ideas would be much appreciated!

    FYI, I have also tried the rolling prefix, but it does not seem to handle groups in which every observation carries the same ticker name.
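One way to make the original approach work is to loop over the groups yourself, so that predict runs straight after each group's regression. The sketch below is a hypothetical illustration, not the poster's code: it uses the Grunfeld panel (company, year) as a stand-in for the ticker/date data, fits invest on mvalue and kstock over the previous 12 years (year-12 to year-1), skips incomplete windows, and stores both the prediction (predict, xb) and its standard error (predict, stdp) for the target year. The names yhat, yhat_se, tmp_xb and tmp_se are made up for the example.

```stata
* Hypothetical sketch: explicit loop so -predict- runs right after each window fit
webuse grunfeld, clear
gen double yhat = .      // out-of-sample prediction for the target year
gen double yhat_se = .   // standard error of that prediction (predict, stdp)
levelsof company, local(firms)
foreach c of local firms {
    summarize year if company == `c', meanonly
    local first = r(min)
    local last  = r(max)
    forvalues y = `first'/`last' {
        * window = the previous 12 years for this company (year-12 to year-1)
        quietly count if company == `c' & inrange(year, `y' - 12, `y' - 1)
        if r(N) < 12 {
            continue   // skip incomplete windows
        }
        quietly regress invest mvalue kstock if company == `c' & inrange(year, `y' - 12, `y' - 1)
        * predict only for the target observation, then copy into the result variables
        quietly predict double tmp_xb if company == `c' & year == `y', xb
        quietly predict double tmp_se if company == `c' & year == `y', stdp
        quietly replace yhat    = tmp_xb if company == `c' & year == `y'
        quietly replace yhat_se = tmp_se if company == `c' & year == `y'
        drop tmp_xb tmp_se
    }
}
```

This runs one regression per group and window, so with many tickers and daily windows it is likely to be much slower than a dedicated command such as rangestat or asreg.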



  • #2
    Check out rangestat or asreg from SSC.



    • #3
      Thanks, Nick Cox! I had previously looked at asreg and have now also looked at rangestat on your advice. Both look useful for the rolling regression, and are probably more efficient than the approach I have working today. However, I am still struggling to get predicted values and related statistics (e.g. confidence intervals) for each predicted value. Any hints on how to proceed?



      • #4
        asreg and rangestat have distinct if overlapping goals.

        asreg is explicitly geared to regression, with functionality for the moving-window case, but naturally Attaullah Shah may say more if he wishes.

        rangestat has some regression functionality but is more broadly pitched and has the goal of being programmable too.

        Here is a silly example. In general I am queasy about fitting two-predictor multiple regressions to subsamples of 12 values. rangestat will fit a regression whenever one is feasible at all, but the philosophy is strongly that what to do with the results is the user's choice, and this user (as well as being second author of the command) would usually ignore incomplete windows.

        The twist that may not be obvious from the help is that you need to calculate predicted values, or indeed residuals, or whatever follows either, by yourself.

        Code:
        webuse grunfeld, clear
        
        rangestat (reg) invest mvalue kstock, int(year -12 -1) by(company)
        
        gen double predicted = b_cons + b_mvalue * mvalue + b_kstock * kstock
        
        list year reg_nobs-b_cons predicted in 1/20
        
            +-------------------------------------------------------------------------------------------+
             | year   reg_nobs      reg_r2   reg_adj~2    b_mvalue     b_kstock       b_cons   predicted |
             |-------------------------------------------------------------------------------------------|
          1. | 1935          .           .           .           .            .            .           . |
          2. | 1936          1           .           .           .            .            .           . |
          3. | 1937          2           .           .           .            .            .           . |
          4. | 1938          3           .           .           .            .            .           . |
          5. | 1939          4   .99991081   .99973242   .05436847   -.21134289    150.31552    341.8305 |
             |-------------------------------------------------------------------------------------------|
          6. | 1940          5   .99494036   .98988073   .05391537   -.23604919    153.03071   354.49891 |
          7. | 1941          6   .70620365   .51033942   .06200931   -.06825747    113.98605   378.78351 |
          8. | 1942          7   .57589074   .36383611   .06547084    .18151132     79.66855   347.18751 |
          9. | 1943          8   .50027518   .30038525   .05170874    .36373662    116.78996   422.46451 |
         10. | 1944          9   .50721217   .34294957   .05154518    .44281908    111.48473   426.48884 |
             |-------------------------------------------------------------------------------------------|
         11. | 1945         10   .46011588   .30586328   .05693085    .46800615    96.771625   496.38978 |
         12. | 1946         11   .52961712    .4120214   .06313648    .52030432    66.655822   585.34782 |
         13. | 1947         12   .66982548   .59645337   .06892708    .67033895    19.126425   772.66088 |
         14. | 1948         12   .60600737   .51845346   .09517907    .48042609   -65.152359   687.77198 |
         15. | 1949         12   .52785382   .42293245    .0968702    .33604109   -34.434041   666.80058 |
             |-------------------------------------------------------------------------------------------|
         16. | 1950         12   .64582415   .56711841   .12352259    .25933385   -107.94189   640.96748 |
         17. | 1951         12   .49005318   .37673167   .10144627    .23025164    -.2294035   768.13533 |
         18. | 1952         12   .77851078   .72929095    .0976133    .19019482    50.465961   803.27541 |
         19. | 1953         12   .90712666   .88648814   .12034609    .20330884   -38.916908   1073.5881 |
         20. | 1954         12   .94531055   .93315734   .16515702     .2363318   -235.08964   1214.8782 |
             +-------------------------------------------------------------------------------------------+
        Last edited by Nick Cox; 03 Feb 2023, 02:42.
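Continuing the Grunfeld example, here is a minimal sketch of the do-it-yourself steps mentioned in #4: blanking out predictions from incomplete windows (here, fewer than 12 past years) and computing residuals by hand. It assumes rangestat has been installed from SSC; the names predicted and residual are illustrative.

```stata
* Requires rangestat from SSC (ssc install rangestat)
webuse grunfeld, clear
rangestat (reg) invest mvalue kstock, interval(year -12 -1) by(company)
gen double predicted = b_cons + b_mvalue * mvalue + b_kstock * kstock
* ignore incomplete windows; missing reg_nobs counts as incomplete too
replace predicted = . if reg_nobs < 12 | missing(reg_nobs)
* residual (prediction error) computed by hand; rangestat does not create it
gen double residual = invest - predicted
```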



        • #5
          Thanks again, Nick Cox. Using rangestat seems to have done the trick!

          Is there any way to use rangestat to calculate the standard error of the predicted value, or does that also need to be calculated manually? Currently, I am using this equation:

          gen double se_predicted = sqrt(abs(se_mkt * b_mkt + se_smb * b_smb + se_hml * b_hml + se_umd * b_umd))

          where se_xxx and b_xxx are the rangestat results for the independent variables mkt, smb, hml and umd.

          Does this seem like the correct approach to you?





            • #7
              I don't think that is the right recipe. It is not dimensionally correct for a start, and there are other errors.
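For reference, these are the standard OLS results for a single window (textbook formulas, not taken from the rangestat or asreg documentation). Here n is the number of observations in the window, k the number of coefficients including the constant (k = 3 in the Grunfeld example), X the n × k matrix of predictors with a column of ones, e the residual vector, and x_0 the predictor values (with a leading 1) for the observation being predicted. Stata's predict computes se(ŷ_0) with its stdp option and se_f(ŷ_0) with stdf.

```latex
\begin{aligned}
\hat{y}_0 &= x_0^{\top} b, \qquad b = (X^{\top}X)^{-1} X^{\top} y \\
s^2 &= \frac{e^{\top} e}{n - k} \qquad (\text{RMSE} = s) \\
\widehat{V}(b) &= s^2 \, (X^{\top}X)^{-1} \\
x_0^{\top}\widehat{V}(b)\,x_0 &= \sum_{j=1}^{k}\sum_{l=1}^{k} x_{0j}\, x_{0l}\, \widehat{\operatorname{Cov}}(b_j, b_l) \\
\operatorname{se}(\hat{y}_0) &= \sqrt{x_0^{\top}\widehat{V}(b)\,x_0} \\
\operatorname{se}_f(\hat{y}_0) &= \sqrt{s^2 + x_0^{\top}\widehat{V}(b)\,x_0} \\
\text{95\% CI for the mean prediction:}\quad & \hat{y}_0 \pm t_{n-k,\,0.975}\,\operatorname{se}(\hat{y}_0)
\end{aligned}
```

Each term in the double sum is a product of predictor values and a coefficient variance or covariance, so it is in squared units of the outcome. The covariances between coefficients are needed as well, so the coefficient standard errors alone (the se_* variables) are not enough.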

              If you mean what Stata calls the RMSE, that is roughly the SD of the residuals; more precisely, there is a correction factor depending on the number of estimated parameters. rangestat doesn't provide it directly.
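One possible workaround, sketched under assumptions: for a regression with a constant, adjusted R² equals 1 − s²/sd_y², where s is the RMSE and sd_y is the standard deviation of the outcome over the same observations. A second rangestat call for the SD of the outcome over the same window therefore recovers the RMSE. This assumes the (reg) results include a variable named reg_adj_r2 (the full name behind reg_adj~2 in the listing in #4) and that no window contains missing values in any of the three variables, as in the Grunfeld data; otherwise the two calls could use different observations.

```stata
* Requires rangestat from SSC (ssc install rangestat)
webuse grunfeld, clear
rangestat (reg) invest mvalue kstock, interval(year -12 -1) by(company)
* SD of the outcome over the same window; creates invest_sd
* (same observations as the regression only because grunfeld has no missing values)
rangestat (sd) invest, interval(year -12 -1) by(company)
* adjusted R2 = 1 - s^2 / sd_y^2, hence s = sd_y * sqrt(1 - adjusted R2)
gen double rmse = invest_sd * sqrt(1 - reg_adj_r2)
```

The RMSE alone still does not give the standard error of a prediction, which also depends on the coefficient covariances.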



              • #8
                Thank you, Nick, for tagging me in the post. With asreg, you can report standard errors, robust standard errors, and Newey-West standard errors. You can find more information on what asreg can do at this link: https://fintechprofessor.com/2017/12...ions-in-stata/
                You can also get t- and p-values; see https://fintechprofessor.com/asdocx/...n-stata-asdoc/. Using the Grunfeld example, here is the asreg code.
                Code:
                webuse grunfeld
                bys company : asreg invest mvalue kstock, window(year 12) fit se
                list company year _Nobs _R2 _b_mvalue _b_kstock _b_cons _se_mvalue _se_kstock _se_cons _fitted _residuals
                
                     +-------------------------------------------------------------------------------------------------------------------------------------+
                     | company   year   _Nobs         _R2    _b_mvalue    _b_kstock      _b_cons   _se_mv~e   _se_ks~k   _se_cons     _fitted   _residuals |
                     |-------------------------------------------------------------------------------------------------------------------------------------|
                  1. |       1   1935       .           .            .            .            .          .          .          .           .            . |
                  2. |       1   1936       .           .            .            .            .          .          .          .           .            . |
                  3. |       1   1937       .           .            .            .            .          .          .          .           .            . |
                  4. |       1   1938       4   .99991081    .05436847   -.21134289    150.31552   .0004711   .0057604   1.939499   257.91024   -.21023193 |
                  5. |       1   1939       5   .99494036    .05391537   -.23604919    153.03071   .0024509   .0289021   9.172574    337.5661   -6.7661144 |
                     |-------------------------------------------------------------------------------------------------------------------------------------|
                  6. |       1   1940       6   .70620365    .06200931   -.06825747    113.98605   .0141898    .199579   60.06083   387.80812    73.391895 |
                  7. |       1   1941       7   .57589074    .06547084    .18151132     79.66855   .0252212   .2873074   107.6045   423.96115    88.038848 |
                  8. |       1   1942       8   .50027518    .05170874    .36373662    116.78996   .0289252   .2135686   127.4547   395.00509    52.994906 |
                  9. |       1   1943       9   .50721217    .05154518    .44281908    111.48473   .0301334   .1866926   136.0049   437.38193    62.218072 |
                 10. |       1   1944      10   .46011588    .05693085    .46800615    96.771625   .0324527   .1600647   140.5098   440.43891    107.06109 |
                     |-------------------------------------------------------------------------------------------------------------------------------------|
                 11. |       1   1945      11   .52961712    .06313648    .52030432    66.655822   .0317785   .1700296   140.5316   510.17387    51.026143 |
                 12. |       1   1946      12   .66982548    .06892708    .67033895    19.126425   .0322636    .196689   149.3191   626.54146    61.558511 |
                 13. |       1   1947      12   .60600737    .09517907    .48042609   -65.152359   .0336002   .1388177    152.509   636.34109   -67.441062 |
                 14. |       1   1948      12   .52785382     .0968702    .33604109   -34.434041   .0388995   .0957986   171.7739   590.81371   -61.613693 |
                 15. |       1   1949      12   .64582415    .12352259    .25933385   -107.94189   .0295627   .0691066   133.3756   613.66286   -58.562885 |
                     |-------------------------------------------------------------------------------------------------------------------------------------|
                 16. |       1   1950      12   .49005318    .10144627    .23025164    -.2294035   .0436315   .0711805   203.9463   633.80878    9.0912429 |
                 17. |       1   1951      12   .77851078     .0976133    .19019482    50.465961   .0207543   .0257267   76.47462   751.92934    3.9706865 |
                 18. |       1   1952      12   .90712666    .12034609    .20330884   -38.916908   .0199379   .0295009    82.7973   844.60882    46.591197 |
                 19. |       1   1953      12   .94531055    .16515702     .2363318   -235.08964   .0246159   .0352379   109.5764   1215.8035    88.596519 |
                 20. |       1   1954      12   .94617596    .18321764    .31158295    -372.6152   .0181538   .0492577   86.56063   1345.9081    140.79182 |
                     |-------------------------------------------------------------------------------------------------------------------------------------|
                Last edited by Attaullah Shah; 09 Feb 2023, 06:06.
                Regards
                --------------------------------------------------
                Attaullah Shah, PhD.
                Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                FinTechProfessor.com
                https://asdocx.com
                Check out my asdoc program, which sends outputs to MS Word.
                For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
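A note on comparing the two listings: with window(year 12), asreg's window at year t covers years t-11 to t, so _fitted is an in-sample fit, whereas the rangestat call in #4 uses years t-12 to t-1 and so predicts out of sample. The coefficients line up with a one-year shift (for company 1, asreg's 1946 coefficients equal rangestat's 1947 ones). A minimal sketch of the out-of-sample version built from asreg output, using the lag operator so that a gap in years gives a missing value; pred_oos is an illustrative name.

```stata
* Requires asreg from SSC (ssc install asreg)
webuse grunfeld, clear
xtset company year
bys company : asreg invest mvalue kstock, window(year 12) fit se
* previous year's coefficients (window t-12..t-1) applied to this year's predictors;
* L. returns missing when the previous year is absent
gen double pred_oos = L._b_cons + L._b_mvalue * mvalue + L._b_kstock * kstock
```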
