
  • #16
    Hi
    I was not able to adjust the loop and use rangestat.
    What I want to do is:
    1- Using a 5 year rolling regression:
    I want to run a regression using data for the first 5 years in the sample (for example 1990 to 1994), calculate the fitted and residual values, and keep the fitted value and residual for each observation from the regression in which it serves as the last observation. Then I move one year forward and drop one year (i.e. 1991 to 1995), calculate the fitted and residual values again, and so on.
    How can I use rangestat here?

    2- In separate code, I want to do the same but using a recursive regression (expanding windows, where I add one year each time without dropping a year).
    Can I also use rangestat here?

    I would be very thankful if someone could help with these issues and suggest efficient, time-saving code. The dependent variable is ret and the independent variables are x y z (the firm id is "permno" and the time id is "yr").

    Many many thanks

    Comment


    • #17
      Any advice with regard to my post #16?

      Comment


      • #18
        Here's an example of calculating fitted and residuals based on regressions over a 5 year window using rangestat.

        Code:
        * define a linear regression in Mata using quadcross() - help mata cross(), example 2
        mata:
        mata clear
        mata set matastrict on
        real rowvector myreg(real matrix Xall)
        {
            real colvector y, b, Xy
            real matrix X, XX
        
            y    = Xall[.,1]
            X     = Xall[.,2::cols(Xall)]
            
            XX = quadcross(X, X)
            Xy = quadcross(X, y)
            b  = invsym(XX) * Xy
        
            return(rows(X), b')
        }
        end
        
        * regressions over a rolling window of 5 periods
        webuse grunfeld, clear
        gen double constant = 1
        rangestat (myreg) mvalue invest kstock constant, by(company) interval(time -4 0) casewise
        rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)
        
        * calculate fitted value and residual for each observation
        gen double fitted = b_constant + b_invest*invest + b_kstock*kstock
        gen double resid = mvalue - fitted
        You can do random checks to verify that the code above does what you want. For example:
        Code:
        . local obs = 175
        
        . regress mvalue invest kstock if inrange(time, time[`obs']-4, time[`obs']) & company == company[`obs']
        
              Source |       SS           df       MS      Number of obs   =         5
        -------------+----------------------------------   F(2, 2)         =      3.05
               Model |  6349.91831         2  3174.95915   Prob > F        =    0.2471
            Residual |  2083.67131         2  1041.83565   R-squared       =    0.7529
        -------------+----------------------------------   Adj R-squared   =    0.5059
               Total |  8433.58962         4   2108.3974   Root MSE        =    32.277
        
        ------------------------------------------------------------------------------
              mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              invest |  -4.234408    4.38622    -0.97   0.436    -23.10679    14.63797
              kstock |   -2.72894   1.597838    -1.71   0.230    -9.603882    4.146002
               _cons |   1437.467   734.8455     1.96   0.190    -1724.318    4599.252
        ------------------------------------------------------------------------------
        
        . predict resid_check, resid
        
        . list company year resid* in `obs'
        
             +--------------------------------------+
             | company   year      resid   resid_~k |
             |--------------------------------------|
        175. |       9   1949   -13.0714   -13.0714 |
             +--------------------------------------+
        Since this is a small dataset, you can actually calculate the residuals for every observation using a loop:
        Code:
        local nobs = _N
        gen double resid_loop = .
        qui forvalues i = 1/`nobs' {
            cap regress mvalue invest kstock if inrange(time, time[`i']-4, time[`i']) & company == company[`i']
            if _rc == 0 {
                predict double resid_temp, resid
                replace resid_loop = resid_temp if _n == `i'
                drop resid_temp
            }
        }
        
        * compare the rangestat residuals with the residuals from the loop
        gen diff  = abs(resid - resid_loop)
        sum diff
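
        To adapt the rolling example above to the variable names given in #16, a minimal sketch might look like the following. It assumes that the myreg() Mata function above has already been defined, that the data have one observation per permno and yr, and that the regressions should be run firm by firm. The names b_x, b_y and b_z are just illustrative choices. The older one-at-a-time rename syntax is used so that the sketch does not depend on rename group.
        Code:
        * assumes myreg() from the Mata block above is already defined
        * assumes variables ret x y z permno yr, as described in #16
        gen double constant = 1
        rangestat (myreg) ret x y z constant, by(permno) interval(yr -4 0) casewise
        
        * myreg() returns the number of observations followed by one coefficient per regressor
        rename myreg1 nobs
        rename myreg2 b_x
        rename myreg3 b_y
        rename myreg4 b_z
        rename myreg5 b_constant
        
        * fitted value and residual for the last observation of each window
        gen double fitted = b_constant + b_x*x + b_y*y + b_z*z
        gen double resid = ret - fitted
        If the regression should instead pool all firms within each window, one option would be to leave out by(permno), since by() is optional in rangestat.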

        Comment


        • #19
          If I understand your description of a recursive regression, you want to include all previous observations in each regression. That's a simple tweak from the fixed 5 year window solution using rangestat. All you need is to change the desired interval so that it starts from the beginning of time:

          Code:
          * define a linear regression in Mata using quadcross() - help mata cross(), example 2
          mata:
          mata clear
          mata set matastrict on
          real rowvector myreg(real matrix Xall)
          {
              real colvector y, b, Xy
              real matrix X, XX
          
              y    = Xall[.,1]
              X     = Xall[.,2::cols(Xall)]
              
              XX = quadcross(X, X)
              Xy = quadcross(X, y)
              b  = invsym(XX) * Xy
          
              return(rows(X), b')
          }
          end
          
          * regressions over all periods up to the current observation
          webuse grunfeld, clear
          gen double constant = 1
          gen wayback = c(minfloat)
          rangestat (myreg) mvalue invest kstock constant, by(company) interval(time wayback 0) casewise
          rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)
          
          * calculate fitted value and residual for each observation
          gen double fitted = b_constant + b_invest*invest + b_kstock*kstock
          gen double resid = mvalue - fitted
          And you can check the results for this small dataset using:

          Code:
          * redo the complete regressions using a loop
          local nobs = _N
          gen double resid_loop = .
          qui forvalues i = 1/`nobs' {
              cap regress mvalue invest kstock if inrange(time, c(minfloat), time[`i']) & company == company[`i']
              if _rc == 0 {
                  predict double resid_temp, resid
                  replace resid_loop = resid_temp if _n == `i'
                  drop resid_temp
              }
          }
          
          * compare the rangestat residuals with the residuals from the loop
          gen diff  = abs(resid - resid_loop)
          sum diff

          Comment


          • #20
            Many thanks, Robert, for your efforts.
            I have not worked with Mata before. I looked at the Mata manual, and my understanding is that I have to create an ado-file and save the Mata function in it.
            But do I need to give the program a name that Stata will recognize later? Can anyone clarify what I should include in the ado-file, i.e. how to get Robert's program to run?

            Comment


            • #21
              Copy and paste Robert's code into a do-file editor and run it. There are no extra tricks to learn here.

              Comment


              • #22
                I tried this first using Robert's code and the same data set Robert uses.
                I got the following error message when the code reached this point:

                . rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)
                ( invalid name
                r(198);

                end of do-file

                r(198);


                Comment


                • #23
                  What version of Stata are you using?

                  You can substitute the following if your version does not support rename group:

                  Code:
                  rename myreg1 nobs
                  rename myreg2 b_invest
                  rename myreg3 b_kstock
                  rename myreg4 b_constant

                  Comment


                  • #24
                    Dear Robert/Statalist team
                    The code ran very fast (in seconds) and produced fitted and residual values after renaming as in your post #23 (I am using Stata 11).
                    However, looking at the code, I do not find any indication that it runs a rolling regression with a window of "5" years. In my original post, I asked for fitted values and residuals as follows (see post #16, copied and pasted here again for your convenience):

                    1- Using a 5 year rolling regression:
                    I want to run a regression using data for the first 5 years in the sample (for example 1990 to 1994), calculate the fitted and residual values, and keep the fitted value and residual for each observation from the regression in which it serves as the last observation. Then I move one year forward and drop one year (i.e. 1991 to 1995), calculate the fitted and residual values again, and so on.
                    How can I use rangestat here?

                    2- In separate code, I want to do the same but using a recursive regression (expanding windows, where I add one year each time without dropping a year).
                    Can I also use rangestat here?

                    Does your code correspond to requirement 1 (the rolling regression) or requirement 2 (the recursive regression)?
                    And whichever it is, how can I amend it to get the other one?

                    I am almost there now and would appreciate your help so much!

                    Comment


                    • #25
                      You should spend some time reading the help file for rangestat, in particular the description of the interval() option. In #18, the results for each observation are based on a separate regression that uses a 5 year window of observations from the same company. Since the current observation is included, that means setting the lower bound to 4 years prior to the year of the current observation. In #19, the lower bound is instead set so far back that each window starts at the company's first period, which gives the recursive regression.
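
                      One way to see which periods interval(time -4 0) picks up for each observation is to ask rangestat for the count, minimum and maximum of time over the same window. This is only a sketch for the grunfeld example, and the names n_window, t_first and t_last are made up for illustration:
                      Code:
                      webuse grunfeld, clear
                      * for each observation, summarize the window [time-4, time] within the same company
                      rangestat (count) n_window=mvalue (min) t_first=time (max) t_last=time, by(company) interval(time -4 0)
                      
                      * early periods have shorter windows because there are no earlier observations
                      list company time t_first t_last n_window if company == 9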

                      In the grunfeld dataset, this is observation 175:
                      Code:
                      . list in 175
                      
                           +--------------------------------------------------+
                           | company   year   invest   mvalue   kstock   time |
                           |--------------------------------------------------|
                      175. |       9   1949    32.54    276.9      370     15 |
                           +--------------------------------------------------+
                      Go back and look at #18 and the example for observation 175. Here's another way to show that the results for observation 175 are calculated using a 5 year window:

                      Code:
                      webuse grunfeld, clear
                      list in 175
                      keep if inrange(time, 11, 15) & company == 9
                      regress mvalue invest kstock
                      predict resid_check, resid
                      list in l
                      and the results:
                      Code:
                      . list in l
                      
                           +-------------------------------------------------------------+
                           | company   year   invest   mvalue   kstock   time   resid_~k |
                           |-------------------------------------------------------------|
                        5. |       9   1949    32.54    276.9      370     15   -13.0714 |
                           +-------------------------------------------------------------+

                      Comment


                      • #26
                        Hi Robert/all participants
                        I have run the recursive regression code in #19. It worked and produced fitted values and residuals in seconds, which is amazing.
                        However, when I check the results, I find fitted values and residuals starting from year 1993 (my sample starts in 1991). I would expect them to start in 1995. This is because we first need a five year window to estimate the fitted values and residuals for the observations in the last year of the 5 year regression. Then, when we add a year, we repeat the regression over 6 years and get the fitted values and residuals for the observations in the added year, and so on (without changing the previous ones, of course). If the code does that, one would expect no fitted values or residuals for 1991, 1992, 1993 and 1994. But I get them starting from 1993, which is puzzling.
                        The same problem arises when I run the code in #18.

                        I would be very thankful if this could be fixed or clarified, as the code appears to be far more efficient than the rolling command. Indeed, if it does its job, it will be very helpful to many finance researchers because of the time it saves.

                        Can this be fixed? Thanks

                        Comment


                        • #27
                          Any ideas on how to get this done?

                          Comment


                          • #28
                            Glad to hear that rangestat solved both your problems in seconds (instead of days if rolling is used)!

                            The results include the nobs variable. That tells you how many observations were used to calculate the regression results. If you are only interested in observations that are based on a sample of 5, then just replace the results with missing values when nobs < 5.
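
                            As a sketch, using the variable names from the grunfeld example in #18 and #19 (your own variable names will differ), that could look like this:
                            Code:
                            * keep only results from regressions based on at least 5 observations
                            foreach v of varlist b_invest b_kstock b_constant fitted resid {
                                replace `v' = . if nobs < 5
                            }
                            Note that nobs counts the observations actually used in each regression. With by(company) and one observation per company and period, 5 observations correspond to 5 periods, but because of the casewise option, observations with missing values on any of the variables are not counted.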

                            Comment


                            • #29
                              When you say observations that are based on a sample of 5, I guess you mean 5 years, right? But if nobs tells us how many "observations" are used, not how many "years", would simply replacing fitted values and residuals with missing values solve the issue?
                              I have done:
                              replace fitted=. if nobs<5
                              replace residual=. if nobs<5

                              I have also tried:
                              replace fitted=. if yr==1991|yr==1992|yr==1993|yr==1994
                              replace residual=. if yr==1991|yr==1992|yr==1993|yr==1994

                              I noted that the first version, using nobs<5, resulted in fitted values and residuals being missing for years 1991-1996 inclusive!

                              I am not sure which one is correct and feel a bit confused. Can you clarify what is happening here? Thanks

                              Comment


                              • #30
                                One more question:
                                My understanding is that your code in #19 first runs a cross-section regression for one year and estimates the residuals and fitted values,
                                then runs a cross-section time-series regression for 2 years (years 1 and 2) and estimates the residuals and fitted values for each observation in year 2,
                                then runs a cross-section time-series regression for 3 years (years 1, 2 and 3) and estimates the residuals and fitted values for each observation in year 3,
                                and so on.

                                Am I correct in my understanding here?

                                Comment
