
  • trying to bootstrap residuals

    Hi!

    I am writing my master thesis on Norwegian mutual funds and am trying, like Kosowski et al. (2006) and Fama and French (2010), to bootstrap the residuals with resampling in order to make inference on the distribution of a and t(a) and whether mutual funds exhibit skill or just luck.

    I am not an advanced Stata user, but I have written a program that I believe does what I want; judging by the results, though, something is wrong.

    So, what I want to do is run a regression model, save the residuals, and keep the coefficients. Then I sample the residuals with replacement, create new returns of the form y = xB + uhat, and run new regressions on this return series, which has a zero alpha by construction. In earlier research this has shown that the worst-performing funds have outliers in the distribution, telling us that they underperform not only due to bad luck but also due to bad skill, and vice versa for the top performers. When I do this on my dataset, I get very dull results, with no outliers and as good as normally distributed a's and t(a)'s. Why might this be? Are my residuals normally distributed, so that the simulated alphas will be as well? Or is my code wrong?


    Hope some of you have some knowledge about this and can help me. Would be much appreciated :D

    This is the original regression model:

    r = a + b1 MKT + b2 SMB + b3 HML + e


    This is the program I run:

    use "C:\Users\Alexander\Dropbox\Mester\Regression results\torsdag_3_july.dta", clear
    quietly regress r_mutualfund1 MKT SMB HML, r
    predict uhat, resid
    keep uhat
    save residuals, replace
    program bootresiduals
    version 13.1
    drop _all
    use residuals
    bsample
    merge using "C:\Users\Alexander\Dropbox\Mester\Regression results\torsdag_3_july.dta"
    regress r_mutualfund1 MKT SMB HML, r
    predict xb
    gen ystar = xb + uhat
    reg ystar MKT SMB HML
    end

    and then run

    simulate _b _se, reps(10000): bootresiduals



    kind regards,

    alex

  • #2
    You'll need to provide us with more information.

    Your references are not complete, and you do not tell us what t(a) is except that later in your
    post it might be obvious that a is the intercept. An example with data that others have access to
    would also be helpful.

    Here I've tried to reproduce your process, but I chose not to use bsample or merge.
    I also cut out the extra call to regress by passing in the original regression coefficients
    and using matrix score to reproduce the linear prediction used to simulate the resampled
    depvar.

    Code:
    program bs_resid
            version 13.1
            syntax, RESidual(varname numeric) MATrix(name)
    
            * get the varlist for -regress-
            local xvars : colna `matrix'
            local CONS _cons
            local xvars : list xvars - CONS
    
            * compute the linear prediction
            tempvar xb idx y
            matrix score double `xb' = `matrix'
    
            * idx randomly selects the observations with replacement
            gen long `idx' = ceil(_N*runiform())
    
            * the new dependent variable using resample residuals
            gen double `y' = `xb' + `residual'[`idx']
    
            regress `y' `xvars', vce(robust)
    end     
    
    set seed 12345
    sysuse auto
    
    regress mpg turn trunk displ, vce(robust)
    matrix b = e(b)
    predict double resid, residuals
    histogram resid
    
    simulate _b _se, reps(1000) : bs_resid, res(resid) mat(b)
    sum
    As for

    When I do this on my dataset, I get very dull results, with no outliers and as good as normally distributed a's and t(a)'s. Why might this be? Are my residuals normally distributed, so that the simulated alphas will be as well? Or is my code wrong?
    If a histogram of the residuals from the original linear regression appears reasonably
    symmetric, I would expect to see what you are observing.
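For intuition about why roughly symmetric residuals produce such unremarkable simulated alphas, the mechanism can be sketched outside Stata. The following is a Python illustration with made-up data (not the poster's dataset, and not the Stata program above): fit OLS, zero out the intercept, resample the residuals with replacement, and re-estimate alpha on each simulated return series.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Made-up data standing in for fund returns and three factors (illustration only).
T = 240
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # const, MKT, SMB, HML
beta_true = np.array([0.0, 0.9, 0.2, -0.1])                 # true alpha is zero
y = X @ beta_true + rng.normal(scale=0.05, size=T)

# Step 1: OLS fit; keep coefficients and residuals.
bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ bhat

# Step 2: impose the zero-alpha null, then repeatedly resample the residuals
# with replacement and re-estimate alpha on the simulated returns.
b0 = bhat.copy()
b0[0] = 0.0  # zero out the intercept, per the Kosowski-style null
alphas = np.empty(2000)
for b in range(2000):
    ystar = X @ b0 + rng.choice(resid, size=T, replace=True)
    bstar, *_ = np.linalg.lstsq(X, ystar, rcond=None)
    alphas[b] = bstar[0]

# With symmetric residuals, the simulated alphas cluster symmetrically
# around zero, which is exactly the "dull" pattern being described.
print(round(float(alphas.mean()), 4))
```

The point of the sketch is only that symmetric residuals mechanically yield a symmetric, near-normal distribution of bootstrapped alphas for a single fund.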



    • #3
      Thank you for your answer, and sorry for not providing all the relevant information.

      a and t(a) refer to the alpha, i.e. the constant in the regression, and its t-statistic.

      The regression is of excess mutual fund returns on the Carhart (1997) four-factor model for stock market returns. The alpha is a measure of excess return above the risk-adjusted return implied by the model. The bootstrap is done in order to distinguish skill from luck: are alphas far to the right in the tail due to just luck, or do the managers possess the skill to deliver that alpha? In earlier research, a few of the best and a few of the worst funds have been shown to possess or lack skill, respectively. That is, some percentage of the bootstrapped alphas lies above/below the actual alpha, but in my dataset, when I compute the percent of alphas above/below the actual alpha, I get 50%/50% every time. Earlier research has shown that for the worst funds almost all bootstrapped alphas lie above the actual one, so that the bad performance is due to bad skill rather than bad luck, and vice versa for the best funds.
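The comparison described above, the share of bootstrapped (luck-only) alphas lying above or below the actual alpha, amounts to a bootstrap p-value. A minimal Python sketch with invented numbers (not results from any dataset; both the actual alpha and the bootstrap draws are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up numbers, purely to illustrate the comparison:
alpha_actual = 0.004                             # a fund's actual estimated alpha
alpha_boot = rng.normal(0.0, 0.003, size=10000)  # zero-alpha bootstrap draws

# Share of luck-only bootstrapped alphas exceeding the actual alpha.
# A small share suggests the alpha is unlikely to be luck alone; a share
# near 50% means the actual alpha sits in the middle of the luck
# distribution, which is the pattern described in the post.
p_luck = float(np.mean(alpha_boot > alpha_actual))
print(round(p_luck, 3))
```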

      I will try your program, if I can understand it. The problem is I have very little knowledge of programming in Stata.

      I do not have an example dataset, but I would gladly provide mine if that makes it easier for you to help me.

      Appreciate your help a lot!!!

      //alex



      • #4
        Here is the procedure described mathematically.



        [Attachments: 1.PNG and 2.PNG, screenshots of the mathematical procedure]



        • #5
          The procedure you give does not match the one you are trying to implement.

          I suspect that this description of the procedure is not correct, but I do not have access to your references. To wit, I wouldn't know where to look, given that you have yet to provide complete references.



          • #6
            Here are the references. The Sørensen (2009) article is the inspiration for my paper (I am trying to do the same as he did, on a different dataset), and Fama & French (2010) describe the method, first used by Kosowski et al. (2006), but with a few modifications.

            Fama, Eugene F., and Kenneth R. French. "Luck versus skill in the cross-section of mutual fund returns." The Journal of Finance 65.5 (2010): 1915-1947.

            Kosowski, Robert, et al. "Can mutual fund 'stars' really pick stocks? New evidence from a bootstrap analysis." The Journal of Finance 61.6 (2006): 2551-2595.

            Sørensen, Lars Qvigstad. "Mutual fund performance at the Oslo Stock Exchange." Available at SSRN 1488745 (2009).

            What else would you need in order to help me?




            Once again, the help is much appreciated!!
            Last edited by Alexander Lauritzen; 10 Jul 2014, 11:26.



            • #7
              "neztirual": I suspect that you are breaking some Copyright Law by uploading those 2 articles from the Journal of Finance. I certainly wouldn't recommend it. Far better is to provide the full reference, plus URL of the journal article (or preprint or working paper) and/or DOI. PS you might get more people willing to help you if you used your real name here, as strongly recommended in the Forum FAQ -- you can change "neztirual" by sending a message via the Contact Us link (bottom right hand side of screen).



              • #8
                Kosowski et al. (2006) describe it as:

                "To prepare for our bootstrap procedure, we use the Carhart model to compute
                ordinary least squares (OLS)-estimated alphas, factor loadings, and residuals
                using the time series of monthly net returns (minus the T-bill rate) for fund i, r_{i,t}:

                r_{i,t} = â_i + b̂_{1,i} MKT_t + b̂_{2,i} SMB_t + b̂_{3,i} HML_t + e_{i,t}   (1)

                For fund i, the coefficient estimates {â_i, b̂_{1,i}, b̂_{2,i}, b̂_{3,i}}, as well
                as the time series of estimated residuals {ê_{i,t}, t = T_{i0}, ..., T_{i1}} and
                the t-statistic of alpha, t(â_i), are saved, where T_{i0} and T_{i1} are the dates
                of the first and last monthly returns available for fund i, respectively.

                Using our baseline bootstrap, for each fund i, we draw a sample with replacement
                from the fund residuals that are saved in the first step above, creating a
                pseudo-time series of resampled residuals, {ê^b_{i,t}, t = s^b_{Ti0}, ..., s^b_{Ti1}},
                where b is an index for the bootstrap number (so b = 1 for bootstrap resample
                number 1), and where each of the time indices s^b_{Ti0}, ..., s^b_{Ti1} is drawn
                randomly from {T_{i0}, ..., T_{i1}} in such a way that reorders the original sample
                of T_{i1} − T_{i0} + 1 residuals for fund i. Conversely, the original chronological
                ordering of the factor returns is unaltered; we relax this restriction in a
                different version of our bootstrap below.

                Next, we construct a time series of pseudo-monthly excess returns for
                this fund, imposing the null hypothesis of zero true performance (â_i = 0, or,
                equivalently, t(â_i) = 0):

                r^b_{i,t} = b̂_{1,i} MKT_t + b̂_{2,i} SMB_t + b̂_{3,i} HML_t + ê^b_{i,t}   (2)

                for t = T_{i0}, ..., T_{i1} and t = s^b_{Ti0}, ..., s^b_{Ti1}. As equation (2)
                indicates, this sequence of artificial returns has a true alpha (and t-statistic
                of alpha) that is zero by construction. However, when we next regress the returns
                for a given bootstrap sample, b, on the Carhart factors, a positive estimated
                alpha (and t-statistic) may result, since that bootstrap may have drawn an
                abnormally high number of positive residuals, or, conversely, a negative alpha
                (and t-statistic) may result if an abnormally high number of negative residuals
                are drawn.

                Repeating the above steps across all funds i = 1, ..., N, we arrive at a draw
                from the cross section of bootstrapped alphas. Repeating this for all bootstrap
                iterations, b = 1, ..., 1,000, we then build the distribution of these
                cross-sectional draws of alphas, {â^b_i, i = 1, ..., N}, or their t-statistics,
                {t(â^b_i), i = 1, ..., N}, that result purely from sampling variation while
                imposing the null of a true alpha that is equal to zero. For example, the
                distribution of alphas (or t-statistics) for the top fund is constructed as the
                distribution of the maximum alpha (or, maximum t-statistic) generated across
                all bootstraps. As we note in Section I.A, this cross-sectional distribution
                can be nonnormal, even if individual fund alphas are normally distributed.
                If we find that our bootstrap iterations generate far fewer extreme positive
                values of â (or t(â)) compared to those observed in the actual data, then we
                conclude that sampling variation (luck) is not the sole source of high alphas,
                but rather that genuine stock-picking skills actually exist."

                - Kosowski, Robert, et al. "Can mutual fund 'stars' really pick stocks? New evidence from a bootstrap analysis." The Journal of Finance 61.6 (2006): 2551-2595.


                Fama and French do the same; the difference between the two approaches
                is that Kosowski et al. (2006) bootstrap the residuals from the individual fund
                returns independently, while Fama and French (2010) sample the fund residuals
                and factor returns jointly.




                This is the procedure I want to implement, but, as I said, I have little knowledge of programming in Stata (or any other language), so it is not surprising that the program I tried to implement does not do exactly this.
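The difference between the two resampling schemes can be sketched compactly outside Stata. This is an illustrative Python fragment with made-up arrays; `resid` and `F` are stand-ins for a panel of fund residuals and the factor returns, and none of the numbers come from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, K = 120, 5, 3
resid = rng.normal(size=(T, N))  # stand-in fund residuals (T months x N funds)
F = rng.normal(size=(T, K))      # stand-in factor returns (T months x K factors)

# Kosowski et al. (2006): each fund's residuals are resampled independently,
# while the factor returns keep their original chronological order.
resid_k = np.column_stack(
    [resid[rng.integers(0, T, size=T), j] for j in range(N)]
)
F_k = F  # unaltered

# Fama & French (2010): one common set of time indices is drawn and applied
# jointly to every fund's residuals and to the factor returns, preserving
# the cross-sectional dependence between funds and factors.
idx = rng.integers(0, T, size=T)
resid_ff = resid[idx, :]
F_ff = F[idx, :]

print(resid_k.shape, resid_ff.shape, F_ff.shape)
```

In the Kosowski scheme each fund gets its own draw of indices; in the Fama-French scheme a single `idx` is shared by all funds and the factors.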



                • #9
                  Alex, the Stata code you provide in your original post appears to implement what you quote from Kosowski.

                  The code I provided does too, albeit maybe a little more efficiently.

                  It seems to me that we are left with interpreting what they mean by "extreme" in

                  If we find that our bootstrap iterations generate far fewer extreme
                  positive values of â (or t(â)) compared to those observed in the actual
                  data, then we conclude that sampling variation (luck) is not the sole
                  source of high alphas, but rather that genuine stock-picking skills actually
                  exist
                  My naive impression is that we could use a standard 5% critical value from Student's t distribution as
                  a cutoff for determining "extreme" values of t(â).
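That suggestion can be made operational in a few lines. Here is a Python sketch with invented t-statistics (the arrays, sample sizes, and critical value are illustrative assumptions, not fund data): compare the share of "extreme" t(alpha) values in the actual cross-section with the share the luck-only bootstrap generates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs (made up for illustration): t(alpha) for the actual
# funds, and a reps x funds matrix of bootstrapped t(alpha) under the null.
t_actual = rng.standard_t(df=100, size=50)
t_boot = rng.standard_t(df=100, size=(1000, 50))

crit = 1.98  # approximate 5% two-sided Student-t critical value, ~100 d.f.

# Share of "extreme" t-statistics in the data versus in the bootstrap;
# a data share well above the bootstrap share would point beyond luck.
share_actual = float(np.mean(np.abs(t_actual) > crit))
share_boot = float(np.mean(np.abs(t_boot) > crit))
print(share_actual, share_boot)
```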

                  However, I have my doubts about this procedure. It seems to me that they are trying to establish that the
                  fitted models are not correctly specified, that there is an unobserved component that yields better results
                  for some funds and not others.



                  • #10
                    Thanks a lot, Jeff!

                    I highly appreciate the help.

                    The code you provided seems to work much better and gives results much closer to those found by Kosowski and others.

                    They say the difference is that Kosowski et al. bootstrap the residuals of each fund individually, while Fama & French sample the residuals and factor returns jointly. Which does the code you provided do, jointly or individually?

                    Thank you for your help. We've been stuck on this problem in the thesis for a while.



                    /alex



                    • #11
                      Our code bootstraps the residuals while leaving the other variables untouched.

                      After rereading the thread, I realized that our code was not zeroing out the intercept
                      before generating the Y variable from the bootstrapped residuals. Here is a modified
                      version of my code that does this:

                      Code:
                      program bs_resid
                              version 13.1
                              syntax, RESidual(varname numeric) MATrix(name)
                      
                              * get the varlist for -regress-
                              local xvars : colna `matrix'
                              local CONS _cons
                              local xvars : list xvars - CONS
                      
                              * compute the linear prediction
                              tempvar xb idx y
                              matrix score double `xb' = `matrix'
                      
                              * idx randomly selects the observations with replacement
                              gen long `idx' = ceil(_N*runiform())
                      
                              * the new dependent variable using resample residuals
                              gen double `y' = `xb' + `residual'[`idx']
                      
                              regress `y' `xvars', vce(robust)
                      end     
                      
                      set seed 12345
                      sysuse auto
                      
                      regress mpg turn trunk displ, vce(robust)
                      matrix b = e(b)
                      
                      * zero intercept
                      local icons = colnumb(b, "_cons")
                      matrix b[1,`icons'] = 0
                      
                      predict double resid, residuals
                      histogram resid
                      
                      simulate _b _se, reps(1000) : bs_resid, res(resid) mat(b)
                      sum



                      • #12
                        Hi Jeff!

                        I have a few questions about the code you provided. We had a chat with our thesis supervisor, and he pointed out a few things about the code.


                        First, where in the program bs_resid does it use the variables from the regression? Or is it using just the residuals and coefficients? I see the xb, but where does the program get this from? When running the program we only give it res(residual) and mat(b), that is, the residuals and the matrix of coefficients.


                        Second, the program uses coefficients from regress, but which coefficients are used the second time the program runs? There is a regress command inside the program as well; will it replace the coefficients in memory, or will the coefficients from the original regression be used in every simulation run?



                        Thanks,


                        Alex



                        • #13
                          The b matrix contains the regression coefficients from the original
                          call to regress. This matrix is used to get the list of regressors,
                          which is stored in the xvars macro, and produce/simulate a new
                          dependent variable with the resampled residuals.

                          Once created, the b matrix is not modified in bs_resid
                          or by simulate. The same regression coefficients are used in every
                          replication; only the residuals are bootstrapped to simulate the new
                          dependent variable.

                          simulate collects the regression coefficients and their estimated
                          standard errors from regress called within bs_resid.



                          • #14
                            Is there a possibility of getting a version of this code for Stata 13.0? I would highly appreciate any endeavours.



