rolling and recursive regressions while storing fitted and residual values

Mike Kraft

Join Date: Dec 2014

Posts: 328
#16

15 Jun 2016, 05:02

Hi
I was not able to adjust the loop and use rangestat.
What I want to do is:
1- Using a 5-year rolling regression:
I want to run a regression using data for the first 5 years in the sample (for example, 1990 to 1994), calculate the fitted and residual values, and keep the fitted value and residual for each observation from the regression in which it serves as the last observation. Then I move one year forward and drop one year (i.e., 1991 to 1995), calculate the fitted and residual values again, and so on.
How can I use rangestat here?

2- In a separate piece of code, I want to do the same using a recursive regression (expanding windows, where I add one year each time without dropping a year).
Can I also use rangestat here?

I would be very thankful if someone could help with these issues and suggest efficient, time-saving code! The dependent variable is ret and the independent variables are x, y and z (the firm id is "permno" and the time id is "yr").
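The two designs differ only in which years enter each regression. A minimal sketch, using a hypothetical sample of one observation per year from 1991 to 2000 (the variables n_rolling and n_recursive are illustrative only), counts the years in the rolling 5-year window and in the recursive (expanding) window that end at each year:

Code:

```stata
* hypothetical sample: one observation per year, 1991 to 2000
clear
set obs 10
gen yr = 1990 + _n

* number of years in the window that ends at each observation's year
gen n_rolling   = .
gen n_recursive = .
quietly forvalues i = 1/`=_N' {
    * rolling: the current year and the 4 years before it
    count if inrange(yr, yr[`i'] - 4, yr[`i'])
    replace n_rolling = r(N) in `i'
    * recursive: every year up to and including the current one
    count if yr <= yr[`i']
    replace n_recursive = r(N) in `i'
}
list yr n_rolling n_recursive
```

In both designs, the windows ending in 1991 to 1994 contain fewer than 5 years.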

Many many thanks
Mike Kraft

Join Date: Dec 2014

Posts: 328
#17

15 Jun 2016, 06:15

Any advice regarding my post #16?

Robert Picard

Join Date: Mar 2014
Posts: 1536

#18

15 Jun 2016, 08:58

Here's an example of calculating fitted and residuals based on regressions over a 5 year window using rangestat.

Code:

* define a linear regression in Mata using quadcross() - help mata cross(), example 2
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y    = Xall[.,1]
    X     = Xall[.,2::cols(Xall)]
    
    XX = quadcross(X, X)
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy

    return(rows(X), b')
}
end

* regressions over a rolling window of 5 periods
webuse grunfeld, clear
gen double constant = 1
rangestat (myreg) mvalue invest kstock constant, by(company) interval(time -4 0) casewise
rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)

* calculate fitted value and residual for each observation
gen double fitted = b_constant + b_invest*invest + b_kstock*kstock
gen double resid = mvalue - fitted

You can do random checks to verify that the code above does what you want. For example:

Code:

. local obs = 175

. regress mvalue invest kstock if inrange(time, time[`obs']-4, time[`obs']) & company == company[`obs']

      Source |       SS           df       MS      Number of obs   =         5
-------------+----------------------------------   F(2, 2)         =      3.05
       Model |  6349.91831         2  3174.95915   Prob > F        =    0.2471
    Residual |  2083.67131         2  1041.83565   R-squared       =    0.7529
-------------+----------------------------------   Adj R-squared   =    0.5059
       Total |  8433.58962         4   2108.3974   Root MSE        =    32.277

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |  -4.234408    4.38622    -0.97   0.436    -23.10679    14.63797
      kstock |   -2.72894   1.597838    -1.71   0.230    -9.603882    4.146002
       _cons |   1437.467   734.8455     1.96   0.190    -1724.318    4599.252
------------------------------------------------------------------------------

. predict resid_check, resid

. list company year resid* in `obs'

     +--------------------------------------+
     | company   year      resid   resid_~k |
     |--------------------------------------|
175. |       9   1949   -13.0714   -13.0714 |
     +--------------------------------------+

Since this is a small dataset, you can actually calculate the residuals for every observation using a loop:

Code:

local nobs = _N
gen double resid_loop = .
qui forvalues i = 1/`nobs' {
    cap regress mvalue invest kstock if inrange(time, time[`i']-4, time[`i']) & company == company[`i']
    if _rc == 0 {
        predict double resid_temp, resid
        replace resid_loop = resid_temp if _n == `i'
        drop resid_temp
    }
}

* compare the residuals from rangestat with those from the loop
gen diff  = abs(resid - resid_loop)
sum diff
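
As a usage sketch, the rangestat code at the top of this post could be adapted to the variables described in #16 along these lines. This assumes the myreg() Mata function above has been defined in the same session, that the data contain ret, x, y, z, permno and yr, and that a separate regression is wanted for each firm, as with by(company) above. With three regressors plus a constant, myreg() returns five values: the number of observations, then one coefficient per regressor in the order they are listed.

Code:

```stata
* assumes myreg() from the code above is already defined in Mata
* and that the data contain ret, x, y, z, permno and yr
gen double constant = 1

* one regression per firm over the current year and the 4 previous years
rangestat (myreg) ret x y z constant, by(permno) interval(yr -4 0) casewise

* myreg1 holds the number of observations, then one coefficient per regressor
rename myreg1 nobs
rename myreg2 b_x
rename myreg3 b_y
rename myreg4 b_z
rename myreg5 b_constant

* fitted value and residual for each observation
gen double fitted = b_constant + b_x*x + b_y*y + b_z*z
gen double resid  = ret - fitted
```

Separate rename commands are used so the sketch also runs in older versions such as the Stata 11 mentioned later in this thread.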


Robert Picard

Join Date: Mar 2014
Posts: 1536

#19

15 Jun 2016, 09:10

If I understand your description of a recursive regression, you want to include all previous observations in each regression. That's a simple tweak from the fixed 5 year window solution using rangestat. All you need is to change the desired interval so that it starts from the beginning of time:

Code:

* define a linear regression in Mata using quadcross() - help mata cross(), example 2
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y    = Xall[.,1]
    X     = Xall[.,2::cols(Xall)]
    
    XX = quadcross(X, X)
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy

    return(rows(X), b')
}
end

* regressions over all periods up to the current observation
webuse grunfeld, clear
gen double constant = 1
gen wayback = c(minfloat)
rangestat (myreg) mvalue invest kstock constant, by(company) interval(time wayback 0) casewise
rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)

* calculate fitted value and residual for each observation
gen double fitted = b_constant + b_invest*invest + b_kstock*kstock
gen double resid = mvalue - fitted

And you can check the results for this small dataset using:

Code:

* redo the complete regressions using a loop
local nobs = _N
gen double resid_loop = .
qui forvalues i = 1/`nobs' {
    cap regress mvalue invest kstock if inrange(time, c(minfloat), time[`i']) & company == company[`i']
    if _rc == 0 {
        predict double resid_temp, resid
        replace resid_loop = resid_temp if _n == `i'
        drop resid_temp
    }
}

* compare the residuals from rangestat with those from the loop
gen diff  = abs(resid - resid_loop)
sum diff


Mike Kraft

Join Date: Dec 2014

Posts: 328
#20

15 Jun 2016, 11:04

Many Thanks Robert for your efforts.
I have not worked with Mata before. I looked at the Mata manual, and my understanding is that I have to create an ado-file and save the Mata function in it.
But do I need to give the program a name so that Stata will recognize it later? Can anyone clarify what I should include in the ado-file, i.e. how to get Robert's program to run?
Nick Cox

Join Date: Mar 2014

Posts: 35405
#21

15 Jun 2016, 11:42

Copy and paste Robert's code into a do-file editor and run it. There are no extra tricks to learn here.
Mike Kraft

Join Date: Dec 2014

Posts: 328
#22

15 Jun 2016, 11:47

I tried this first using Robert's code and the same dataset he uses.
I got the following error message when the code reached this point:

. rename (myreg1 myreg2 myreg3 myreg4) (nobs b_invest b_kstock b_constant)
( invalid name
r(198);

end of do-file

r(198);
Robert Picard

Join Date: Mar 2014

Posts: 1536
#23

15 Jun 2016, 12:13

What version of Stata are you using?

You can substitute the following if your version does not support rename group:

Code:

rename myreg1 nobs
rename myreg2 b_invest
rename myreg3 b_kstock
rename myreg4 b_constant
Mike Kraft

Join Date: Dec 2014

Posts: 328
#24

15 Jun 2016, 12:40

Dear Robert/statalist team
The code worked very fast (in seconds) and produced fitted and residual values after renaming the variables as in your post #23 (I am using Stata 11).
However, looking at the code, I do not see any indication that it runs a rolling regression over a window of 5 years. In my original post, I asked for fitted values and residuals as follows (see post #16, copied and pasted here again for your convenience):

1- Using a 5-year rolling regression:
I want to run a regression using data for the first 5 years in the sample (for example, 1990 to 1994), calculate the fitted and residual values, and keep the fitted value and residual for each observation from the regression in which it serves as the last observation. Then I move one year forward and drop one year (i.e., 1991 to 1995), calculate the fitted and residual values again, and so on.
How can I use rangestat here?

2- In a separate piece of code, I want to do the same using a recursive regression (expanding windows, where I add one year each time without dropping a year).
Can I also use rangestat here?

Does your code correspond to requirement 1 (the rolling regression) or requirement 2 (the recursive regression) ?
and if so, how can I amend it to get the other one?

I am almost there now and would really appreciate your help!

Robert Picard

Join Date: Mar 2014
Posts: 1536

#25

15 Jun 2016, 13:13

You should spend some time reading the help file for rangestat, in particular the description of the interval() option. In #18, the results for each observation are based on separate regressions, each using a 5 year window of observations from the same company (in #19, the window instead extends back to the first period). Since the current observation is included, that means setting the lower bound to 4 years before the year of the current observation.

In the grunfeld dataset, this is observation 175:

Code:

. list in 175

     +--------------------------------------------------+
     | company   year   invest   mvalue   kstock   time |
     |--------------------------------------------------|
175. |       9   1949    32.54    276.9      370     15 |
     +--------------------------------------------------+

Go back and look at #18 and the example for observation 175. Here's another way to show that the results for observation 175 are calculated using a 5 year window:

Code:

webuse grunfeld, clear
list in 175
keep if inrange(time, 11, 15) & company == 9
regress mvalue invest kstock
predict resid_check, resid
list in l

and the results:

Code:

. list in l

     +-------------------------------------------------------------+
     | company   year   invest   mvalue   kstock   time   resid_~k |
     |-------------------------------------------------------------|
  5. |       9   1949    32.54    276.9      370     15   -13.0714 |
     +-------------------------------------------------------------+
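
To make the interval(time -4 0) window concrete, a small sketch computes, for each observation, the bounds of the window rangestat uses, stored in two illustrative variables low and high (they are not part of rangestat's output). The window for an observation runs from time - 4 to time, both inclusive.

Code:

```stata
webuse grunfeld, clear

* bounds implied by interval(time -4 0): 4 periods before the current one, through the current one
gen low  = time - 4
gen high = time

* observations 171 to 175 are company 9, times 11 to 15: the window used for observation 175
list company year time low high in 171/175
```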


Mike Kraft

Join Date: Dec 2014

Posts: 328
#26

16 Jun 2016, 09:59

Hi Robert/all participants
I have run the recursive regression code in #19. It worked and produced fitted values and residuals in seconds, which is amazing.
However, when I check the fitted values and residuals, I find that they start in 1993 (my sample starts in 1991). I would expect them to start in 1995, because we first need a 5-year window to estimate the fitted values and residuals for the observations in the last year of that window. Then, when we add a year, we repeat the regression over 6 years and obtain the fitted values and residuals for the observations in the added year, and so on (without changing the earlier ones, of course). If the code does that, one would expect no fitted values or residuals for 1991, 1992, 1993 and 1994. Instead, I get them starting from 1993, which is puzzling.
The same problem arises when I run the code in #18!

I would be very thankful if this can be fixed or clarified, as the code appears to be much more efficient than any rolling command. Indeed, if it does its job, it will be very helpful to many finance researchers because of the time it saves!

Can this be fixed? Thanks
Mike Kraft

Join Date: Dec 2014

Posts: 328
#27

16 Jun 2016, 11:52

Any ideas on how to get this done?
Robert Picard

Join Date: Mar 2014

Posts: 1536
#28

16 Jun 2016, 12:26

Glad to hear that rangestat solved both your problems in seconds (instead of days if rolling is used)!

The results include the nobs variable. That tells you how many observations were used to calculate the regression results. If you are only interested in observations that are based on a sample of 5, then just replace the results with missing values when nobs < 5.
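
A minimal sketch of that suggestion, assuming the variables created by the code in #18 or #19 (nobs, b_invest, b_kstock, b_constant, fitted and resid) are in memory:

Code:

```stata
* blank out results estimated from fewer than 5 observations;
* nobs itself is left unchanged so the window size stays visible
foreach v of varlist b_invest b_kstock b_constant fitted resid {
    replace `v' = . if nobs < 5
}
```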
Mike Kraft

Join Date: Dec 2014

Posts: 328
#29

16 Jun 2016, 13:07

When you say observations that are based on a sample of 5, I guess you mean 5 years, right? But if nobs tells us how many "observations" are used, not "years", would simply replacing the fitted values and residuals with missing values solve the issue?
I have done:
replace fitted=. if nobs<5
replace residual=. if nobs<5

I have also tried:
replace fitted=. if yr==1991|yr==1992|yr==1993|yr==1994
replace residual=. if yr==1991|yr==1992|yr==1993|yr==1994

I noticed that the first approach, using nobs<5, resulted in missing fitted values and residuals for years 1991 to 1996 inclusive!

I am not sure which one is correct and I feel a bit confused. Can you clarify what is happening here? Thanks
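
For comparison with the hard-coded years above, a hypothetical sketch of a year-based cutoff that takes the first year from the data. It assumes variables named yr, fitted and resid (as created in #18 and #19; substitute residual if that is your variable's name) and blanks the results for years whose 5-year window would reach back before the start of the sample. A rule based on nobs can give a different result, because nobs counts the observations actually used in each regression, which equals the number of years in the window only when every year in it contributes exactly one complete observation.

Code:

```stata
* first year in the sample (1991 in the data described in #26)
summarize yr, meanonly
local first = r(min)

* a 5-year window ending in year t is complete only if t - 4 >= first year
replace fitted = . if yr < `first' + 4
replace resid  = . if yr < `first' + 4
```

If each firm's own first year is the relevant starting point instead, `bysort permno (yr): gen first_yr = yr[1]` gives it per firm, and the condition becomes `yr < first_yr + 4`.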
Mike Kraft

Join Date: Dec 2014

Posts: 328
#30

16 Jun 2016, 13:30

One more question:
My understanding is that your code in #19 first runs one cross-section regression for one year and estimates the residuals and fitted values,
then runs a cross-section time-series regression for 2 years (years 1 and 2) and estimates the residuals and fitted values for each observation in year 2,
then runs a cross-section time-series regression for 3 years (years 1, 2 and 3) and estimates the residuals and fitted values for each observation in year 3,
and so on.

Am I correct in my understanding here?
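
One way to check that understanding is to compute the design described above by brute force and compare. This hypothetical sketch assumes ret, x, y, z and yr are in memory and pools all firms: for each year t it runs one regression on every firm-year with yr <= t and keeps the fitted values and residuals only for the observations in year t. Note that the rangestat call in #19 includes by(company), so each regression there uses only one company's observations. The names fitted_chk, resid_chk and xb_tmp are illustrative.

Code:

```stata
* pooled recursive regressions: all firm-years up to and including year t
gen double fitted_chk = .
gen double resid_chk  = .
quietly levelsof yr, local(years)
quietly foreach t of local years {
    capture regress ret x y z if yr <= `t'
    if _rc == 0 {
        * predictions only for the observations in year t
        predict double xb_tmp if yr == `t', xb
        replace fitted_chk = xb_tmp if yr == `t'
        replace resid_chk  = ret - xb_tmp if yr == `t'
        drop xb_tmp
    }
}
```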