trying to bootstrap residuals

Rich Goldstein

Join Date: Mar 2014

Posts: 4426
#16

06 Aug 2014, 18:49

Since the upgrade to 13.1 is free, why do you need something for 13.0? See -h update-.
Comment
Alexander Lauritzen

Join Date: Jul 2014

Posts: 10
#17

07 Aug 2014, 10:07

Thanks again!

If I add [`idx'] after `xb' when generating the new dependent variable, will it then bootstrap the linear prediction as well, as Fama & French do?

Code:

* the new dependent variable using resample residuals
gen double `y' = `xb'[`idx'] + `residual'[`idx']

I get some funds with 100% of the simulated alphas/t-stats above the actual. Isn't this extreme? Or just evidence of very bad skill?

Finally, is there a way to loop this? At the moment I run it individually for every fund (about 70 funds) and for 2 time periods, and needless to say, this is a very tedious task :P

Thanks a lot, Jeff! Really helpful!

//alex
Comment
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 667
#18

07 Aug 2014, 12:55

Changing the line

Code:

gen double `y' = `xb' + `residual'[`idx']

to

Code:

gen double `y' = `xb'[`idx'] + `residual'[`idx']

does in fact resample the linear predictions in the same way as
the residuals.

It seems odd to me that the resulting intercept estimates are 100% more
extreme than the observed intercept; they should be estimating zero by
construction.

The short answer to the looping question is: yes. It is just a matter of how
your data is structured. I would compose a program that performed the task
for a single fund and time period, then just call this program in a loop that changes
the fund and time period. simulate can save its results to a dataset, so
I would recommend composing the new data file names from the fund names
and time period.
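A minimal sketch of such a loop follows. It is an illustration under stated assumptions, not code posted in this thread: the toy panel and the variable names fund, period, exret, and mktrf are hypothetical, the number of replications is arbitrary, and the bs_resid program is the residual-only version from this thread. Each fund and period combination is reloaded from a saved copy of the panel, so whatever -simulate- leaves in memory is discarded, and the results file name is composed from the fund and period.

```stata
* Residual-bootstrap program as posted in this thread
capture program drop bs_resid
program bs_resid
    version 13.1
    syntax, RESidual(varname numeric) MATrix(name)
    local xvars : colna `matrix'
    local CONS _cons
    local xvars : list xvars - CONS
    tempvar xb idx y
    matrix score double `xb' = `matrix'
    gen long `idx' = ceil(_N*runiform())
    gen double `y' = `xb' + `residual'[`idx']
    regress `y' `xvars', vce(robust)
end

* Toy panel (hypothetical names): 3 funds x 2 periods x 60 months
clear
set seed 12345
set obs 360
gen int fund = ceil(_n/120)
gen byte period = 1 + (mod(_n - 1, 120) >= 60)
gen double mktrf = rnormal(0.5, 4)
gen double exret = 0.1 + 0.9*mktrf + rnormal(0, 2)
tempfile panel
save "`panel'"

* One simulation per fund and period, each saved to its own file
levelsof fund, local(funds)
foreach f of local funds {
    forvalues p = 1/2 {
        use if fund == `f' & period == `p' using "`panel'", clear
        regress exret mktrf, vce(robust)
        matrix b = e(b)
        * impose the null of a zero intercept
        local icons = colnumb(b, "_cons")
        matrix b[1, `icons'] = 0
        predict double resid, residuals
        simulate _b _se, reps(200) saving(sim_fund`f'_p`p', replace) : ///
            bs_resid, res(resid) mat(b)
    }
}
```

With string fund identifiers, the levels returned by -levelsof- would need compound quotes in the if condition and some cleaning before being used in a file name.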
Comment
Alexander Lauritzen

Join Date: Jul 2014

Posts: 10
#19

11 Aug 2014, 06:33

Here are the results from the bootstrap. The leftmost column reports the actual and average simulated alphas, as well as the percentage of simulated alphas above the actual. The rightmost column reports the same for the t-statistics of alpha.

Any ideas on the interpretation? It seems odd to me that the worst funds, ranked on their actual alpha, have the highest simulated alpha values?!

1 Photo
Comment
Alexander Lauritzen

Join Date: Jul 2014

Posts: 10
#20

15 Aug 2014, 04:54

Hi!

I have a question about the code. I added [`idx'] to `xb' as well, but what I really want is to use [`idx'] to resample only the factor returns and residuals, while keeping the coefficients constant, as Fama and French do. Now it seems to me that it resamples the residuals and the product (coefficients*factor returns). Is what I want possible?

Further, any ideas as to why the simulated alphas/t-stats are so high? They are higher for the worst funds than for the best.

Thanks a lot!
Comment
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 667
#21

15 Aug 2014, 09:58

Excerpted from my previous post, the following code computes the linear
prediction from the originally fitted model, generates a simple random sample
with replacement of the current set of observations, then generates a new
outcome from the resampled linear prediction and residuals.

Code:

tempvar xb idx y
matrix score double `xb' = `matrix'
gen long `idx' = ceil(_N*runiform())
gen double `y' = `xb'[`idx'] + `residual'[`idx']

This is the same as sampling from the x variables and residuals, then
using the original model coefficients to generate a new outcome. The
coefficients are not changing in the code I provided.
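The equivalence can be checked directly. The following is a small sketch on the auto data, not part of the original program; the variable names xb, idx, y1, and y2 are made up for the illustration. It builds the new outcome once from the resampled linear prediction and once from the resampled regressors combined with the fixed coefficients, then asserts that the two agree up to rounding.

```stata
sysuse auto, clear
set seed 12345

regress mpg turn trunk displacement
matrix b = e(b)
predict double resid, residuals

* linear prediction from the fitted coefficients (including _cons)
matrix score double xb = b

* one draw of observation numbers, with replacement
gen long idx = ceil(_N*runiform())

* route 1: resample the linear prediction and the residuals
gen double y1 = xb[idx] + resid[idx]

* route 2: resample each regressor, apply the unchanged coefficients
gen double y2 = _b[_cons] + _b[turn]*turn[idx] + _b[trunk]*trunk[idx] ///
    + _b[displacement]*displacement[idx] + resid[idx]

* the two routes give the same outcome, up to floating-point rounding
assert reldif(y1, y2) < 1e-12
```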

"any ideas as to why the simulated alphas/t-stats are so high?"

I have no ideas. It seems odd to me that the resulting intercept estimates
are 100% more extreme than the observed intercept; they should be estimating
zero by construction.
Comment
Steven Archambault

Join Date: Jun 2014

Posts: 62
#22

02 Sep 2014, 12:37

Hello. I have been reading through this thread the last couple of days. I think the program you wrote is doing what I am trying to do. I will try to explain it here.

I analyze y=cons+bX+e

Then, I predict y using the results, to get yhat.

yhat=cons+bX

I want to add randomly resampled residuals (e*) back to my yhat terms, to get yhat*

yhat*=yhat+e*

I then regress the bootstrapped Y*s using the original set of regressors.

I believe this is the goal of your program above, but not 100% sure.
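The steps listed above can be sketched for a single bootstrap replication as follows. This is only an illustration on the auto data with made-up variable names (yhat, e, idx, ystar); it leaves the intercept in place, as the steps describe, and does not repeat the draw or collect results the way -simulate- would.

```stata
sysuse auto, clear
set seed 12345

* fit y = cons + bX + e
regress mpg turn trunk

* yhat = cons + bX, and the residuals e
predict double yhat, xb
predict double e, residuals

* draw observation numbers with replacement to pick e*
gen long idx = ceil(_N*runiform())

* yhat* = yhat + e*
gen double ystar = yhat + e[idx]

* regress the bootstrapped outcome on the original regressors
regress ystar turn trunk
```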

Any assistance here would be great.

Thank you.

-Steve

PS I am sorry my user name is not my full name. I am in the process of changing that.
Comment
Catherine Bloom

Join Date: Sep 2015

Posts: 6
#23

08 Oct 2015, 06:24

Dear all,

I hope you are well. I am studying your Stata program for the Fama and French (2010) bootstrapping. Many thanks for your kind contribution on the programming, which helped me a lot.

Dear Jeff,

I am writing to ask whether you could please help me add a loop to your original Stata program.

According to the literature, we need to obtain OLS-estimated alphas, factor loadings and residuals for each fund. Then, construct a sample of pseudo excess returns by randomly resampling independent variables and residuals with replacement over the full cross section of fund returns simultaneously, and impose the null of zero intercept, thereby producing a common time ordering across all funds in each bootstrap. More detailed explanation was shown in page 1 of this thread.
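One replication of the procedure described above might look like the sketch below. It is my reading of that description, not code from this thread or from the paper. The toy data, the wide layout (one row per month, one return column per fund), and all variable names are assumptions, and every fund is assumed to have a return in every month. The same index variable is used for every fund, which gives the common time ordering, and the factor columns are resampled with that index too, so each pseudo return is regressed on the factor values of the month it was drawn from.

```stata
* Toy data (hypothetical): 120 months, 3 factors, 3 funds in wide layout
clear
set seed 12345
set obs 120
gen double mktrf = rnormal(0.5, 4)
gen double smb   = rnormal(0.2, 3)
gen double hml   = rnormal(0.3, 3)
forvalues i = 1/3 {
    gen double exret`i' = 0.1 + 0.9*mktrf + 0.2*smb - 0.1*hml + rnormal(0, 2)
}

* Step 1: per-fund OLS; keep residuals and the fitted value without alpha
forvalues i = 1/3 {
    regress exret`i' mktrf smb hml
    predict double res`i', residuals
    gen double fit0_`i' = _b[mktrf]*mktrf + _b[smb]*smb + _b[hml]*hml
}

* Step 2: one draw of months, shared by all funds and all factors
gen long idx = ceil(_N*runiform())
foreach v of varlist mktrf smb hml {
    gen double `v'_s = `v'[idx]
}

* Step 3: zero-alpha pseudo returns, re-estimated fund by fund
forvalues i = 1/3 {
    gen double y`i' = fit0_`i'[idx] + res`i'[idx]
    regress y`i' mktrf_s smb_s hml_s
}
```

Repeating steps 2 and 3 many times and storing each fund's intercept and its t-statistic would give the simulated cross section; funds with missing months would need extra handling that this sketch ignores.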

For your information, the methodology of Fama and French (2010) is also available from pages 6-8 in the following working paper http://www.pensions-institute.org/workingpapers/wp1404.pdf.

I am ashamed to say that I have very little knowledge of Stata programming.

Many thanks. All your help is highly appreciated.

Catherine
Comment
fan wang

Join Date: Apr 2017

Posts: 27
#24

19 Apr 2017, 12:04

Hi Jeff,

Is it possible to get a version of the code for Stata 11.2? I would greatly appreciate any help.
Comment

fan wang

Join Date: Apr 2017
Posts: 27

#25

19 Apr 2017, 14:09

Originally posted by Jeff Pitblado (StataCorp)

Our code is bootstrapping the residuals while leaving the other variables untouched.

After rereading through the thread I realized that our code is not zeroing out the intercept
before generating the Y variable from the bootstrapped residuals. Here is a modified version
of my code that does this

Code:

program bs_resid
version 13.1
syntax, RESidual(varname numeric) MATrix(name)

* get the varlist for -regress-
local xvars : colna `matrix'
local CONS _cons
local xvars : list xvars - CONS

* compute the linear prediction
tempvar xb idx y
matrix score double `xb' = `matrix'

* idx randomly selects the observations with replacement
gen long `idx' = ceil(_N*runiform())

* the new dependent variable using resample residuals
gen double `y' = `xb' + `residual'[`idx']

regress `y' `xvars', vce(robust)
end

set seed 12345
sysuse auto

regress mpg turn trunk displ, vce(robust)
matrix b = e(b)

* zero intercept
local icons = colnumb(b, "_cons")
matrix b[1,`icons'] = 0

predict double resid, residuals
histogram resid

simulate _b _se, reps(1000) : bs_resid, res(resid) mat(b)
sum

Hi Jeff.

I tried your example code with Stata 11 and got the following error message:

Code:

 program mysim_r
  1. version 11
  2. syntax name(name=bvector), res(varname)
  3. tempvar y rid
  4. local xvars : colnames 'bvector'
  5. local cons _cons
  6. local xvars: list xvars - cons
  7. matrix score double 'y' = 'bvector'
  8. gen long 'rid' = int(_N*runiform())+1
  9. replace 'y' = 'y'+'res'['rid']
 10. regress 'y' 'xvars'
 11. end

. set seed 54321

. mysim_r b, res(res)
varlist not allowed
r(101);

I am not sure how to solve the error. Could you please help me solve it?

Best Regards
Fan

Comment

Kerstin Frederike Hansen

Join Date: Jul 2017

Posts: 4
#26

21 Dec 2018, 06:06

Dear Jeff,

thank you for posting the above command. I do have a similar problem for which I wanted to use your code:

Based on observable characteristics, I am trying to simulate hypothetical program starts for non-participants in a program evaluation (i.e. I have a data set with approximately 20.000 treated and 20.000 non-treated observations. Those treated receive treatment in 6 different time periods. In order to evaluate ATE, I want to simulate hypothetical program starts for the non-treated before I do a propensity score matching.)
I follow a strategy proposed in a paper by Lechner et al. (2011, EER): "We regress the log time to participation within the unemployment spell of participants on a set of personal and regional characteristics that seem important for the timing of the program; then we use the estimated coefficients together with a draw from the residual distribution to predict a corresponding value for nonparticipants."

I ran your code on my data:

program bs_resid
version 13.1
syntax, RESidual(varname numeric) MATrix(name)

* get the varlist for -regress-
local xvars : colna `matrix'
local CONS: _cons
local xvars : list xvars - CONS

* compute the linear prediction
tempvar xb idx y
matrix score double `xb' = `matrix'

* idx randomly selects the observations with replacement
gen long `idx' = ceil(_N*runiform())

* the new dependent variable using resample residuals
gen double `y' = `xb' + `residual'[`idx']

regress `y' `xvars', vce(robust)
end

set seed 12345

reg treat $bs_cntr_ind $empl_hist , vce(robust) , if treat>0
matrix b = e(b)

* zero intercept
local icons = colnumb(b, "_cons")
matrix b[1,`icons'] = 0

predict double resid, residuals
histogram resid

simulate _b _se , reps(1000) : bs_resid, res(resid) mat(b)
sum

However, I get the following error:
. simulate _b _se , reps(1000) : bs_resid, res(resid) mat(b)
_cons not allowed
an error occurred when simulate executed bs_resid

Any chance you could tell me what the mistake is here?

Thank you in advance and happy holidays to everyone.

Kerstin
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#27

23 Dec 2018, 13:20

Kerstin, when this thread was started it referred to the asset pricing literature, where the asset pricing model says that the constant has to be 0.

In your application there is no such requirement, and in fact it seems wrong to me to impose the zero-constant constraint.

Look up the first program that Jeff provided. In that program he was not imposing the constraint that the constant is 0. As your error message is somehow related to the constant, reverting to his initial program, which is actually the correct one for your case, might resolve the problem.
Comment
Kerstin Frederike Hansen

Join Date: Jul 2017

Posts: 4
#28

31 Dec 2018, 06:47

Dear Joro,
thanks for your reply. I was aware of that and used the first code without the constrained constant; however, it still does not work. The error I get again and again is "an error occurred when simulate executed bs_resid".
Would it make sense to start a new post for the problem I have?
Thank you very much for your help.

Kerstin
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#29

31 Dec 2018, 07:23

Kerstin, I do not know what will maximise your chances of a useful response. On the one hand it is a different issue, so another thread might be a good idea; on the other hand you are working on a program that was first posted in this thread, so it makes perfect sense to me to post your question here.

Speaking to the issue, the error messages that -simulate- returns are not very useful, because "an error occurred when simulate executed PROGRAM_NAME" simply tells you that something went wrong. What went wrong, on the basis of this message, only god knows.

The way I troubleshoot simulations is as follows.

1. I first run the program on its own. That is, if Program_Name is to be run multiple times by -simulate-, I run it once directly after writing it and check that it goes through. When you do that, you will see exactly where in the program the error occurs.

2. Another very useful tool for troubleshooting programs and simulations is

set trace on

This reports, step by step, everything that Stata does and interprets, which makes it easier to see where the problem occurred.

In short, somewhere in your do-file, after you have defined bs_resid but before you execute it multiple times through -simulate-, add the following lines of code:

set trace on

bs_resid, res(resid) mat(b)

Then see what happens and whether it clarifies what goes wrong.
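A slightly fuller version of this check might look like the sketch below. It assumes that bs_resid has already been defined as in the posts above and uses the auto data only so that the residual variable and coefficient matrix exist; the names resid and b follow the earlier example. Limiting the trace depth keeps the output to the program's own lines, and -capture noisily- lets the do-file continue so that tracing can be switched off again and the return code shown.

```stata
* Assumes bs_resid is already defined (see the program earlier in the thread)
sysuse auto, clear
regress mpg turn trunk displacement, vce(robust)
matrix b = e(b)
predict double resid, residuals

* trace only the program's own lines, not the commands it calls
set tracedepth 1
set trace on

* run the program once, outside -simulate-, and keep its return code
capture noisily bs_resid, res(resid) mat(b)
local rc = _rc

set trace off
display "bs_resid return code: `rc'"
```

A return code of 0 means the single run went through; any other value points to the last traced line as the place to look.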
Comment
Kerstin Frederike Hansen

Join Date: Jul 2017

Posts: 4
#30

05 Jan 2019, 09:10

Dear Joro

thank you very much. This helped me a lot.

Best

Kerstin
Comment
