
  • trying to bootstrap residuals

    Hi!

    I am writing my master thesis on Norwegian mutual funds and am trying, like Kosowski et al. (2006) and Fama and French (2010), to bootstrap the residuals with resampling in order to make inference on the distribution of a and t(a) and whether mutual funds exhibit skill or just luck.

    I am not an advanced Stata user, but I have written a program that I believe does what I want; judging by the results, though, something is wrong.

    So, what I want to do is run a regression model, save the residuals, and keep the coefficients. Then I sample the residuals with replacement, create new returns of the form y = xB + uhat, and run new regressions on this return series, which has a zero alpha by construction. In earlier research this has shown that the worst-performing funds have outliers in the distribution, telling us that they underperform not only due to bad luck but also due to bad skill, and vice versa for the top performers. When I do this on my dataset, I get very dull results, with no outliers and as good as normally distributed a's and t(a)'s. Why might this be? Are my residuals normally distributed, so that the simulated alphas will be as well? Or is my code wrong?


    Hope some of you have some knowledge about this and can help me. Would be much appreciated :D

    This is the original regression model:

    r = a + b1 MKT + b2 SMB + b3 HML + e


    This is the program I run:

    use "C:\Users\Alexander\Dropbox\Mester\Regression results\torsdag_3_july.dta", clear
    quietly regress r_mutualfund1 MKT SMB HML, r
    predict uhat, resid
    keep uhat
    save residuals, replace
    program bootresiduals
    version 13.1
    drop _all
    use residuals
    bsample
    merge using "C:\Users\Alexander\Dropbox\Mester\Regression results\torsdag_3_july.dta"
    regress r_mutualfund1 MKT SMB HML, r
    predict xb
    gen ystar = xb + uhat
    reg ystar MKT SMB HML
    end

    and then run

    simulate _b _se, reps(10000): bootresiduals



    kind regards,

    alex

  • #2
    You'll need to provide us with more information.

    Your references are not complete, and you do not tell us what t(a) is except that later in your
    post it might be obvious that a is the intercept. An example with data that others have access to
    would also be helpful.

    Here I've tried to reproduce your process, but I chose not to use bsample or merge.
    I also cut out the extra call to regress by passing in the original regression coefficients
    and using matrix score to reproduce the linear prediction used to simulate the resampled
    depvar.

    Code:
    program bs_resid
            version 13.1
            syntax, RESidual(varname numeric) MATrix(name)
    
            * get the varlist for -regress-
            local xvars : colna `matrix'
            local CONS _cons
            local xvars : list xvars - CONS
    
            * compute the linear prediction
            tempvar xb idx y
            matrix score double `xb' = `matrix'
    
            * idx randomly selects the observations with replacement
            gen long `idx' = ceil(_N*runiform())
    
            * the new dependent variable using resample residuals
            gen double `y' = `xb' + `residual'[`idx']
    
            regress `y' `xvars', vce(robust)
    end     
    
    set seed 12345
    sysuse auto
    
    regress mpg turn trunk displ, vce(robust)
    matrix b = e(b)
    predict double resid, residuals
    histogram resid
    
    simulate _b _se, reps(1000) : bs_resid, res(resid) mat(b)
    sum
    As for

    When I do this on my dataset, I get very dull results, with no outliers and as good as normally distributed a's and t(a)'s. Why might this be? Are my residuals normally distributed, so that the simulated alphas will be as well? Or is my code wrong?
    If a histogram of the residuals from the original linear regression appears reasonably
    symmetric, I would expect to see what you are observing.
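For intuition about why roughly symmetric residuals produce such unremarkable simulated alphas, the mechanism can be sketched outside Stata. The following is a Python illustration with made-up data (not the poster's dataset, and not the Stata program above): fit OLS, zero out the intercept, resample the residuals with replacement, and re-estimate alpha on each simulated return series.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Made-up data standing in for fund returns and three factors (illustration only).
T = 240
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # const, MKT, SMB, HML
beta_true = np.array([0.0, 0.9, 0.2, -0.1])                 # true alpha is zero
y = X @ beta_true + rng.normal(scale=0.05, size=T)

# Step 1: OLS fit; keep coefficients and residuals.
bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ bhat

# Step 2: impose the zero-alpha null, then repeatedly resample the residuals
# with replacement and re-estimate alpha on the simulated returns.
b0 = bhat.copy()
b0[0] = 0.0  # zero out the intercept, per the Kosowski-style null
alphas = np.empty(2000)
for b in range(2000):
    ystar = X @ b0 + rng.choice(resid, size=T, replace=True)
    bstar, *_ = np.linalg.lstsq(X, ystar, rcond=None)
    alphas[b] = bstar[0]

# With symmetric residuals, the simulated alphas cluster symmetrically
# around zero, which is exactly the "dull" pattern being described.
print(round(float(alphas.mean()), 4))
```

The point of the sketch is only that symmetric residuals mechanically yield a symmetric, near-normal distribution of bootstrapped alphas for a single fund.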



    • #3
      Thank you for your answer, and sorry for not providing all the relevant information.

      a and t(a) refer to the alpha, i.e. the constant in the regression, and its t-statistic.

      The regression is of excess mutual fund returns on the Carhart (1997) four-factor model for stock market returns. The alpha is a measure of excess return above the risk-adjusted return implied by the model. The bootstrap is done in order to distinguish skill from luck: are alphas far to the right in the tail due to just luck, or do the managers possess the skill to deliver that alpha? In earlier research, a few of the best and a few of the worst funds have been shown to possess or lack skill, respectively. That is, some percentage of the bootstrapped alphas lies above/below the actual alpha, but in my dataset, when I compute the percent of alphas above/below the actual alpha, I get 50%/50% every time. Earlier research has shown that for the worst funds almost all bootstrapped alphas lie above the actual one, so that the bad performance is due to bad skill rather than bad luck, and vice versa for the best funds.
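The comparison described above, the share of bootstrapped (luck-only) alphas lying above or below the actual alpha, amounts to a bootstrap p-value. A minimal Python sketch with invented numbers (not results from any dataset; both the actual alpha and the bootstrap draws are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up numbers, purely to illustrate the comparison:
alpha_actual = 0.004                             # a fund's actual estimated alpha
alpha_boot = rng.normal(0.0, 0.003, size=10000)  # zero-alpha bootstrap draws

# Share of luck-only bootstrapped alphas exceeding the actual alpha.
# A small share suggests the alpha is unlikely to be luck alone; a share
# near 50% means the actual alpha sits in the middle of the luck
# distribution, which is the pattern described in the post.
p_luck = float(np.mean(alpha_boot > alpha_actual))
print(round(p_luck, 3))
```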

      I will try your program, if I can understand it. The problem is I have very little knowledge of programming in Stata.

      I do not have an example dataset, but I would gladly provide mine if that makes it easier for you to help me.

      Appreciate your help a lot!!!

      //alex



      • #4
        Here is the procedure described mathematically.



        [Attachments: 1.PNG and 2.PNG, screenshots of the mathematical procedure]



        • #5
          The procedure you give does not match the one you are trying to implement.

          I suspect that this description of the procedure is not correct, but I do not have access to your references. To wit, I wouldn't know where to look, given that you have yet to provide complete references.



          • #6
            Here are the references. The Sørensen (2009) article is the inspiration for my paper (I am trying to do the same as he did, on a different dataset), and Fama & French (2010) describe the method, first used by Kosowski et al. (2006), but with a few modifications.

            Fama, Eugene F., and Kenneth R. French. "Luck versus skill in the cross-section of mutual fund returns." The Journal of Finance 65.5 (2010): 1915-1947.

            Kosowski, Robert, et al. "Can mutual fund 'stars' really pick stocks? New evidence from a bootstrap analysis." The Journal of Finance 61.6 (2006): 2551-2595.

            Sørensen, Lars Qvigstad. "Mutual fund performance at the Oslo Stock Exchange." Available at SSRN 1488745 (2009).

            What else would you need in order to help me?




            Once again, the help is much appreciated!!
            Last edited by Alexander Lauritzen; 10 Jul 2014, 11:26.



            • #7
              "neztirual": I suspect that you are breaking some Copyright Law by uploading those 2 articles from the Journal of Finance. I certainly wouldn't recommend it. Far better is to provide the full reference, plus URL of the journal article (or preprint or working paper) and/or DOI. PS you might get more people willing to help you if you used your real name here, as strongly recommended in the Forum FAQ -- you can change "neztirual" by sending a message via the Contact Us link (bottom right hand side of screen).



              • #8
                Kosowski et al. (2006) describe it as:

                "To prepare for our bootstrap procedure, we use the Carhart model to compute
                ordinary least squares (OLS)-estimated alphas, factor loadings, and residuals
                using the time series of monthly net returns (minus the T-bill rate) for fund i, r_{i,t}:

                r_{i,t} = â_i + b̂_{1,i} MKT_t + b̂_{2,i} SMB_t + b̂_{3,i} HML_t + e_{i,t}   (1)

                For fund i, the coefficient estimates {â_i, b̂_{1,i}, b̂_{2,i}, b̂_{3,i}}, as well
                as the time series of estimated residuals {ê_{i,t}, t = T_{i0}, ..., T_{i1}} and
                the t-statistic of alpha, t(â_i), are saved, where T_{i0} and T_{i1} are the dates
                of the first and last monthly returns available for fund i, respectively.

                Using our baseline bootstrap, for each fund i, we draw a sample with replacement
                from the fund residuals that are saved in the first step above, creating a
                pseudo-time series of resampled residuals, {ê^b_{i,t}, t = s^b_{Ti0}, ..., s^b_{Ti1}},
                where b is an index for the bootstrap number (so b = 1 for bootstrap resample
                number 1), and where each of the time indices s^b_{Ti0}, ..., s^b_{Ti1} is drawn
                randomly from {T_{i0}, ..., T_{i1}} in such a way that reorders the original sample
                of T_{i1} − T_{i0} + 1 residuals for fund i. Conversely, the original chronological
                ordering of the factor returns is unaltered; we relax this restriction in a
                different version of our bootstrap below.

                Next, we construct a time series of pseudo-monthly excess returns for
                this fund, imposing the null hypothesis of zero true performance (â_i = 0, or,
                equivalently, t(â_i) = 0):

                r^b_{i,t} = b̂_{1,i} MKT_t + b̂_{2,i} SMB_t + b̂_{3,i} HML_t + ê^b_{i,t}   (2)

                for t = T_{i0}, ..., T_{i1} and t = s^b_{Ti0}, ..., s^b_{Ti1}. As equation (2)
                indicates, this sequence of artificial returns has a true alpha (and t-statistic
                of alpha) that is zero by construction. However, when we next regress the returns
                for a given bootstrap sample, b, on the Carhart factors, a positive estimated
                alpha (and t-statistic) may result, since that bootstrap may have drawn an
                abnormally high number of positive residuals, or, conversely, a negative alpha
                (and t-statistic) may result if an abnormally high number of negative residuals
                are drawn.

                Repeating the above steps across all funds i = 1, ..., N, we arrive at a draw
                from the cross section of bootstrapped alphas. Repeating this for all bootstrap
                iterations, b = 1, ..., 1,000, we then build the distribution of these
                cross-sectional draws of alphas, {â^b_i, i = 1, ..., N}, or their t-statistics,
                {t(â^b_i), i = 1, ..., N}, that result purely from sampling variation while
                imposing the null of a true alpha that is equal to zero. For example, the
                distribution of alphas (or t-statistics) for the top fund is constructed as the
                distribution of the maximum alpha (or, maximum t-statistic) generated across
                all bootstraps. As we note in Section I.A, this cross-sectional distribution
                can be nonnormal, even if individual fund alphas are normally distributed.
                If we find that our bootstrap iterations generate far fewer extreme positive
                values of â (or t(â)) compared to those observed in the actual data, then we
                conclude that sampling variation (luck) is not the sole source of high alphas,
                but rather that genuine stock-picking skills actually exist."

                - Kosowski, Robert, et al. "Can mutual fund 'stars' really pick stocks? New evidence from a bootstrap analysis." The Journal of Finance 61.6 (2006): 2551-2595.


                Fama and French do the same; the difference between the two approaches
                is that Kosowski et al. (2006) bootstrap the residuals from the individual fund
                returns independently, while Fama and French (2010) sample the fund residuals
                and factor returns jointly.




                This is the procedure I want to implement, but, as I said, I have little knowledge of programming in Stata (or any other language), so it is not surprising that the program I tried to implement does not do exactly this.
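The difference between the two resampling schemes can be sketched compactly outside Stata. This is an illustrative Python fragment with made-up arrays; `resid` and `F` are stand-ins for a panel of fund residuals and the factor returns, and none of the numbers come from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, K = 120, 5, 3
resid = rng.normal(size=(T, N))  # stand-in fund residuals (T months x N funds)
F = rng.normal(size=(T, K))      # stand-in factor returns (T months x K factors)

# Kosowski et al. (2006): each fund's residuals are resampled independently,
# while the factor returns keep their original chronological order.
resid_k = np.column_stack(
    [resid[rng.integers(0, T, size=T), j] for j in range(N)]
)
F_k = F  # unaltered

# Fama & French (2010): one common set of time indices is drawn and applied
# jointly to every fund's residuals and to the factor returns, preserving
# the cross-sectional dependence between funds and factors.
idx = rng.integers(0, T, size=T)
resid_ff = resid[idx, :]
F_ff = F[idx, :]

print(resid_k.shape, resid_ff.shape, F_ff.shape)
```

In the Kosowski scheme each fund gets its own draw of indices; in the Fama-French scheme a single `idx` is shared by all funds and the factors.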



                • #9
                  Alex, the Stata code you provide in your original post appears to implement what you quote from Kosowski.

                  The code I provided does too, albeit maybe a little more efficiently.

                  It seems to me that we are left with interpreting what they mean by "extreme" in

                  If we find that our bootstrap iterations generate far fewer extreme
                  positive values of â (or t(â)) compared to those observed in the actual
                  data, then we conclude that sampling variation (luck) is not the sole
                  source of high alphas, but rather that genuine stock-picking skills actually
                  exist
                  My naive impression is that we could use a standard 5% critical value from Student's t distribution as
                  a cutoff for determining "extreme" values of t(â).
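That suggestion can be made operational in a few lines. Here is a Python sketch with invented t-statistics (the arrays, sample sizes, and critical value are illustrative assumptions, not fund data): compare the share of "extreme" t(alpha) values in the actual cross-section with the share the luck-only bootstrap generates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs (made up for illustration): t(alpha) for the actual
# funds, and a reps x funds matrix of bootstrapped t(alpha) under the null.
t_actual = rng.standard_t(df=100, size=50)
t_boot = rng.standard_t(df=100, size=(1000, 50))

crit = 1.98  # approximate 5% two-sided Student-t critical value, ~100 d.f.

# Share of "extreme" t-statistics in the data versus in the bootstrap;
# a data share well above the bootstrap share would point beyond luck.
share_actual = float(np.mean(np.abs(t_actual) > crit))
share_boot = float(np.mean(np.abs(t_boot) > crit))
print(share_actual, share_boot)
```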

                  However, I have my doubts about this procedure. It seems to me that they are trying to establish that the
                  fitted models are not correctly specified, that there is an unobserved component that yields better results
                  for some funds and not others.



                  • #10
                    Thanks a lot, Jeff!

                    I highly appreciate the help.

                    The code you provided seems to work much better and gives results much closer to those found by Kosowski and others.

                    They say the difference is that Kosowski et al. bootstrap the residuals of each fund individually, while Fama & French sample the residuals and factor returns jointly. Which does the code you provided do, jointly or individually?

                    Thank you for your help. We've been stuck on this problem in the thesis for a while.



                    /alex



                    • #11
                      Our code bootstraps the residuals while leaving the other variables untouched.

                      After rereading the thread, I realized that our code was not zeroing out the intercept
                      before generating the Y variable from the bootstrapped residuals. Here is a modified
                      version of my code that does this:

                      Code:
                      program bs_resid
                              version 13.1
                              syntax, RESidual(varname numeric) MATrix(name)
                      
                              * get the varlist for -regress-
                              local xvars : colna `matrix'
                              local CONS _cons
                              local xvars : list xvars - CONS
                      
                              * compute the linear prediction
                              tempvar xb idx y
                              matrix score double `xb' = `matrix'
                      
                              * idx randomly selects the observations with replacement
                              gen long `idx' = ceil(_N*runiform())
                      
                              * the new dependent variable using resample residuals
                              gen double `y' = `xb' + `residual'[`idx']
                      
                              regress `y' `xvars', vce(robust)
                      end     
                      
                      set seed 12345
                      sysuse auto
                      
                      regress mpg turn trunk displ, vce(robust)
                      matrix b = e(b)
                      
                      * zero intercept
                      local icons = colnumb(b, "_cons")
                      matrix b[1,`icons'] = 0
                      
                      predict double resid, residuals
                      histogram resid
                      
                      simulate _b _se, reps(1000) : bs_resid, res(resid) mat(b)
                      sum



                      • #12
                        Hi Jeff!

                        I have a few questions about the code you provided. We had a chat with our thesis supervisor, and he pointed out a few things about the code.


                        First, where in the program bs_resid does it use the variables from the regression? Or is it using just the residuals and coefficients? I see the xb, but where does the program get this from? When running the program we only give it res(residual) and mat(b), that is, the residuals and the matrix of coefficients.


                        Second, the program uses coefficients from regress, but which coefficients are used the second time the program runs? There is a regress command inside the program as well; will it replace the coefficients in memory, or will the coefficients from the original regression be used in every simulation run?



                        Thanks,


                        Alex



                        • #13
                          The b matrix contains the regression coefficients from the original
                          call to regress. This matrix is used to get the list of regressors,
                          which is stored in the xvars macro, and produce/simulate a new
                          dependent variable with the resampled residuals.

                          Once created, the b matrix is not modified in bs_resid
                          or by simulate. The same regression coefficients are used in every
                          replication; only the residuals are bootstrapped to simulate the new
                          dependent variable.

                          simulate collects the regression coefficients and their estimated
                          standard errors from regress called within bs_resid.



                          • #14
                            Is there a possibility of getting a version of this code for Stata 13.0? I would highly appreciate any endeavours.



