  • Necessary or not to adjust 'ereturn' into 'eexcess_return'?

    Code:
    gen alpha = .
    forvalues j = 1/`=_N' {        
         capture ereturn clear      
         capture regress return risk_premium if UniqueID == UniqueID[`j'] ///              
         & inrange(monthyear, monthyear[`j']-23, monthyear[`j'])        
         capture replace alpha = _b[_cons] in `j'
    }
    Is it necessary to adjust 'ereturn' to 'eexcess_return' if I want to run more regressions with the above code, using the dependent variable 'excess_return' instead of 'return'?

    Code:
      
    gen alpha = .
    forvalues j = 1/`=_N' {        
       capture ereturn clear    
       capture regress return risk_premium if UniqueID == UniqueID[`j']              
       capture replace alpha = _b[_cons] in `j'
    }
    Is it still necessary to use -capture ereturn clear- if I don't use a rolling window (like the -23 in the first code)?

    Code:
    gen alpha = .
    
    levelsof UniqueID1, local(fundnum)
    foreach id of local fundnum {
      capture quietly regress return risk_premium if UniqueID1 == `id'
      capture replace alpha = _b[_cons] if e(sample)
    }
    Does this third code generate the same results as the second code? Does it require -ereturn clear-?



    I'm trying to save the intercept/constant of a few hundred regressions.

    Last edited by Victoria Rogers; 01 Nov 2014, 12:42.

  • #2
    Code:
    gen alpha = .
    forvalues j = 1/`=_N' {
       capture ereturn clear
       capture regress return risk_premium if UniqueID1 == UniqueID1[`j']
       capture replace alpha = _b[_cons] in `j'
    }
    Never mind my first post; I solved it myself. However, one question from my first post remains, which I can't solve on my own.

    What's the difference between these two codes in terms of the results they create? The first code takes a lot more time than the second, but their results should be the same if I'm correct.

    Code:
    gen alpha = .
    levelsof UniqueID1, local(fundnum)
    foreach id of local fundnum {
        capture ereturn clear
        capture quietly regress return risk_premium if UniqueID1 == `id'
        capture replace alpha = _b[_cons] if e(sample)
    }
    Last edited by Victoria Rogers; 01 Nov 2014, 14:36.

    Comment


    • #3
      If you take out the -ereturn clear- part, the first code block looks like an earlier version of code I suggested. The -ereturn clear- part was added because there were some regressions in the loop that could not be carried out due to insufficient observations. When that happened, the ereturn results from the last successful regression were still active, with the result that the variable alpha would be inappropriately set to the constant term from that last regression.

      By clearing ereturn, the code assures that if the regression in the next line of code fails, there will not be any _b[_cons] lying around from an earlier regression to "contaminate" the alpha variable with.

      So, if you are quite confident that none of the regressions in your newer loop will fail (an audacious assumption, IMO), then by all means feel free to take out the -ereturn clear- command. But if you are wrong in that assumption, you will get spurious alphas following unsuccessful regressions, just as you did before. And there will be no warning that this is happening: you may or may not spot it when you review your results. Moreover, I don't know what you think you stand to gain by removing that statement: it cannot cause you any errors. It takes almost no time to execute: even in a loop that iterates millions of times, I am confident that the time it adds to your overall computations is measured in, at most, seconds.
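
      The stale-results hazard Clyde describes is not unique to Stata. As a loose Python analogue (a sketch with invented names, mimicking -regress- failing while old ereturn results stay active; this is not Stata code or any real API), here is the same bug and its fix:

```python
# Illustrative analogue of stale estimation results (not Stata code).
# fit() "fails" (returns None) when a group has too few observations,
# mimicking -regress- failing and leaving old ereturn results active.

def fit(values):
    """Pretend regression: needs >= 2 observations; 'intercept' = mean."""
    if len(values) < 2:
        return None  # the fit fails
    return {"_cons": sum(values) / len(values)}

groups = {"A": [1.0, 3.0], "B": [5.0], "C": [10.0, 20.0]}

# Buggy version: the last successful result lingers (stale ereturn).
alphas_buggy = {}
last_result = None
for gid, vals in groups.items():
    result = fit(vals) or last_result  # a failure silently reuses old results
    if result is not None:
        alphas_buggy[gid] = result["_cons"]
        last_result = result

# Safe version: nothing is carried over between fits (-ereturn clear-).
alphas_safe = {}
for gid, vals in groups.items():
    result = fit(vals)  # result is None when the fit fails
    if result is not None:
        alphas_safe[gid] = result["_cons"]

print(alphas_buggy)  # group B silently inherits group A's intercept
print(alphas_safe)   # group B is correctly left without an alpha
```

      The buggy loop assigns group A's intercept to group B without any warning, which is exactly the contamination that -capture ereturn clear- guards against.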

      As for your third block of code, no, it definitely does not produce the same results as the second block. In the second block of code, each run through the loop indexes a particular observation in the data, carries out a regression on a 2 year block of observations that chronologically precedes and includes that one, and stores the constant term of that regression in that particular indexed observation. In the third block of code, the constant term will be stored in every observation that was included in the regression. In particular, the third block of code will cause all of the earlier alpha values to be overwritten any time an observation appears in a new regression, so after the loop finally exits each observation's value of alpha will correspond to the constant term of the final regression in which it participated, not the one that was indexed by it.
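
      To make that contrast concrete, here is a small Python toy (invented data; each "regression" is just a trailing-window mean, standing in for the rolling OLS) showing how the two storage rules diverge:

```python
# Toy illustration (not Stata) of the two storage rules being contrasted.
# Observations for one fund, in chronological order; each "regression"
# is simply the mean over a 2-observation trailing window.

obs = [1.0, 3.0, 5.0, 7.0]

def window(j):
    """Indices in the trailing window ending at j (window length 2)."""
    return list(range(max(0, j - 1), j + 1))

# Rule 1 (second code block): store the estimate ONLY in observation j.
alpha_indexed = [None] * len(obs)
for j in range(len(obs)):
    sample = window(j)
    est = sum(obs[k] for k in sample) / len(sample)
    alpha_indexed[j] = est

# Rule 2 (third code block): store the estimate in EVERY sampled
# observation -- later regressions overwrite earlier alphas.
alpha_overwritten = [None] * len(obs)
for j in range(len(obs)):
    sample = window(j)
    est = sum(obs[k] for k in sample) / len(sample)
    for k in sample:
        alpha_overwritten[k] = est

print(alpha_indexed)      # each obs keeps its own window's estimate
print(alpha_overwritten)  # each obs holds the LAST window it appeared in
```

      After the loops finish, the two alpha variables agree only in the final observation; everywhere else, rule 2 has clobbered the earlier estimates.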

      Finally, I don't know what you mean by "Is it necessary or not to adjust 'ereturn' into 'eexcess_return' " Perhaps you do not understand what ereturn is. The Stata reserved word -ereturn- refers to the results stored by any of Stata's estimation commands. There are numerous commands that can be applied to ereturn. I suggest you check the manual [P] ereturn section, or at least -help ereturn-. The term has nothing to do with the fact that your model is about returns or excess returns in the financial sense.
      Last edited by Clyde Schechter; 01 Nov 2014, 14:02.

      Comment


      • #4
        #2 arrived while I was still writing #3.

        Looking more carefully at your post than I did before, I now notice that your loop no longer involves doing a different regression for each observation in the data set. Rather you are just trying to carry out a single regression for each fund. For that purpose, the loop over local fundnum is, indeed, far more efficient, and will produce the same results as the loop over observations.

        That said, you might still want to preserve the -capture ereturn clear- statement in the loop if it is possible for the regression to fail for some value of UniqueID. It will prevent a hard-to-spot error and it will not noticeably add to your computing time.

        My apologies for having misunderstood your post #1.

        Comment


        • #5
          Originally posted by Clyde Schechter in #3 above
          I noticed that I ran a similar code at the beginning of my .do-file without the -ereturn clear- and I was wondering if it could have generated biased results. The similar code is the third code of my first post. Your very useful code takes about 1.5 hours for 1,000 funds, and I ran both your code and the similar code several times.

          My apologies for being unclear. I did read the manual about -ereturn-, but I couldn't delete or edit my first post. I'm very confused that I keep getting an R-squared of about 0.0005, so I'm checking everything and hoping to find something simple to adjust.

          If I understood you correctly, I should use your code in case I need to run regressions based on a rolling time window (e.g. the one with monthyear -23) and I should use the other code when I run regressions which aren't based on a rolling time window.
          Last edited by Victoria Rogers; 01 Nov 2014, 15:00.

          Comment


          • #6

            Hi Victoria,

            The following Mata code might help you run the rolling regression within a window of 23 months much faster. Just rename the variable return in your data to ret and run the code. You will see that the variable alpha is added to the data.

            Code:
            egen g = group(UniqueID)
            gen alpha = .
            mata
            mata clear
            st_view(UniqueID=.,.,"UniqueID")
            st_view(monthyear=.,.,"monthyear")
            st_view(ret=.,.,"ret")
            st_view(risk_premium=.,.,"risk_premium")
            st_view(g=.,.,"g")
            st_view(alpha=.,.,"alpha")
            p = panelsetup(UniqueID,1)
            for ( i=1; i<=rows(p); i++ ) {
                for ( o=p[i,2]; o>=p[i,1]; o-- ) {
                    y = J(1,1,.)
                    X = J(1,2,.)
                    b = .
                    for    (  t=o; t>=p[i,1]; t-- ) {
                        if ( g[o,1] == g[t,1] & monthyear[o,1] - monthyear[t,1] <= 23 )  {    
                        y = y \ ret[t,1]
                        X = X \ (risk_premium[t,1],1)
                        }            
                    }
                    y=y[(2..rows(y)),.]
                    X=X[(2..rows(X)),.]        
                    if (rows(y)>=2)  {
                        b = invsym(X'X)*X'y    
                        alpha[o,1] = b[2,1]
                    }
                }
            }
            end
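
            For anyone who wants to sanity-check the rolling-window logic outside Stata, here is a rough Python/numpy analogue of the loop above (variable names mirror the Mata code; this is a sketch for verification, not a drop-in replacement):

```python
import numpy as np

def rolling_alphas(uid, monthyear, ret, risk_premium, window=23, min_obs=2):
    """For each observation j, regress ret on risk_premium over same-fund
    observations with monthyear in [monthyear[j] - window, monthyear[j]],
    and store that regression's intercept at position j (NaN otherwise)."""
    n = len(ret)
    alpha = np.full(n, np.nan)
    for j in range(n):
        mask = ((uid == uid[j])
                & (monthyear >= monthyear[j] - window)
                & (monthyear <= monthyear[j]))
        y = ret[mask]
        if len(y) < min_obs:
            continue
        X = np.column_stack([risk_premium[mask], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # stable LS solve
        alpha[j] = coef[1]  # second coefficient is the intercept
    return alpha

# Tiny check: one fund with exact relationship ret = 0.5 + 2 * risk_premium,
# so every window with enough observations should recover intercept 0.5.
uid = np.array([1, 1, 1, 1])
monthyear = np.array([600, 601, 602, 603])
rp = np.array([0.1, 0.2, 0.3, 0.4])
ret = 0.5 + 2 * rp
print(rolling_alphas(uid, monthyear, ret, rp))
```

            The first observation has too few points in its window, so its alpha stays missing, matching the rows(y)>=2 guard in the Mata code.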
            Abraham

            Comment


            • #7
              Thank you! I don't understand Mata but it sounds like a great code. I'll try it.

              Comment


              • #8
                If I were to use the dependent variable ret (return) plus 4 independent variables (var1-var4) instead of only risk_premium, would the code be the following:

                Code:
                egen g = group(UniqueID)
                gen alpha = .
                mata
                mata clear
                st_view(UniqueID=.,.,"UniqueID")
                st_view(monthyear=.,.,"monthyear")
                st_view(ret=.,.,"ret")
                st_view(var1=.,.,"var1")
                st_view(var2=.,.,"var2")
                st_view(var3=.,.,"var3")
                st_view(var4=.,.,"var4")
                st_view(g=.,.,"g")
                st_view(alpha=.,.,"alpha")
                p = panelsetup(UniqueID,1)
                for ( i=1; i<=rows(p); i++ ) {
                for ( o=p[i,5]; o>=p[i,1]; o-- ) {
                y = J(1,1,.)
                X = J(1,2,3,4,5.)
                b = .
                for ( t=o; t>=p[i,1]; t-- ) {
                if ( g[o,1] == g[t,1] & monthyear[o,1] - monthyear[t,1] <= 23 ) {
                y = y \ ret[t,1]
                X = X \ (var1[t,1],1)
                X = X \ (var2[t,1],2)
                X = X \ (var3[t,1],3)
                X = X \ (var4[t,1],4)
                }
                }
                y=y[(5..rows(y)),.]
                X=X[(5..rows(X)),.]
                if (rows(y)>=5) {
                b = invsym(X'X)*X'y
                alpha[o,1] = b[5,1]
                }
                }
                }
                end

                Comment


                • #9
                  My apologies for being unclear. I did read the manual about -ereturn-, but I couldn't delete or edit my first post. I'm very confused that I keep getting an R-squared of about 0.0005, so I'm checking everything and hoping to find something simple to adjust.
                  Since posts occur frequently on this forum about modeling abnormal returns (whatever that is--I'm an epidemiologist) and rolling regressions, I feel confident that you are far from the first person to do this. What kind of R-squared do others typically get in these things? If your results are way out of range, then you have reason to suspect that you are doing something wrong or your data is funky. I am confident that the code that I wrote for you to do the rolling regressions implements the process that you described. Whether that process is the appropriate one for modeling abnormal returns, I have no idea. Perhaps other Forum participants who work in this field can comment on that. And as for your data, only you can assess its quality (from its original source through any data management steps that you have applied to get your analytic data set).



                  If I understood you correctly, I should use your code in case I need to run regressions based on a rolling time window (e.g. the one with monthyear -23) and I should use the other code when I run regressions which aren't based on a rolling time window.
                  Your code clearly performs a single regression for each fund number, using all of the complete-data observations for that fund number, and stores the same alpha in each of those. My code performs a single regression for each observation in the data set, including only those observations for that observation's fund that fall within 2 years before that observation, and the alpha is stored only in that observation. So, it seems you have understood me correctly, though I cannot be entirely sure what "regressions which aren't based on a rolling time window" you might have in mind beyond those shown in your earlier posts today, and for which the response might be different.

                  With regard to Abraham's suggested code, he has provided you with Mata code that calculates regression analyses. His code is based on the textbook matrix algebra version of linear regression. In theory, it is correct. In practice, this approach is not typically used for actual computations because it can decompensate numerically if the data are poorly conditioned (in a technical sense that I won't elaborate here). Stata's -regress- routine does not rely on this formula and is robust to "pathological" data. To my knowledge, no commercial statistical package implements its linear regression routine relying on that formula. I would be hesitant to use Abraham's code: it may run faster than -regress-, but I am rarely in a hurry to get answers that I can't trust. Admittedly, most commonly encountered data will work just fine with Abraham's code--but not all. Perhaps if I could independently verify that the various matrices that would arise in my data under Abraham's code were "well behaved," I would feel differently--but that verification is probably a bigger undertaking than the entire loop of regressions run under the slowest code you have experimented with.

                  Comment


                  • #10

                    Victoria,

                    Regarding Clyde's comment, I compared the results of Stata's regress and Mata's matrix calculation using more than 100,000 randomly drawn observations, and the results were identical. Clyde's concern may go beyond what this test covers. However, you may compare results using a small subset of your data and decide for yourself which one to use.

                    Here is the rolling regression code modified to use 4 independent variables:

                    Code:
                    egen g = group(UniqueID)
                    gen alpha = .
                    mata
                    mata clear
                    st_view(UniqueID=.,.,"UniqueID")
                    st_view(monthyear=.,.,"monthyear")
                    st_view(ret=.,.,"ret")
                    st_view(var1=.,.,"var1")
                    st_view(var2=.,.,"var2")
                    st_view(var3=.,.,"var3")
                    st_view(var4=.,.,"var4")
                    st_view(g=.,.,"g")
                    st_view(alpha=.,.,"alpha")
                    p = panelsetup(UniqueID,1)
                    for ( i=1; i<=rows(p); i++ ) {
                        for ( o=p[i,2]; o>=p[i,1]; o-- ) {
                            y = J(1,1,.)
                            X = J(1,5,.)
                            b = .
                            for    ( t=o; t>=p[i,1]; t-- ) {
                                if ( g[o,1] == g[t,1] & monthyear[o,1] - monthyear[t,1] <= 23 )  {    
                                y = y \ ret[t,1]
                                X = X \ (var1[t,1],var2[t,1],var3[t,1],var4[t,1],1)
                                }            
                            }
                            y=y[(2..rows(y)),.]
                            X=X[(2..rows(X)),.]        
                            if (rows(y)>=6)  {     // here you can modify the minimum number of cases to include in the rolling regressions
                                b = invsym(X'X)*X'y    
                                alpha[o,1] = b[5,1]
                            }
                        }
                    }
                    end


                    The following code runs one regression per UniqueID and stores the intercept in every observation of that UniqueID.

                    Code:
                    sort UniqueID monthyear
                    gen alpha = .
                    mata
                    mata clear
                    st_view(UniqueID=.,.,"UniqueID")
                    st_view(monthyear=.,.,"monthyear")
                    st_view(ret=.,.,"ret")
                    st_view(risk_premium=.,.,"risk_premium")
                    st_view(alpha=.,.,"alpha")
                    p = panelsetup(UniqueID,1)
                    for ( i=1; i<=rows(p); i++ ) {
                        b = .
                        y = ret[p[i,1]..p[i,2],1]
                        X = risk_premium[p[i,1]..p[i,2],1], J(rows(y),1,1)
                        if (rows(y)>=2)  {
                            b = invsym(X'X)*X'y    
                            alpha[p[i,1]..p[i,2],1] = b[2,1]:*J(rows(y),1,1)
                        }
                    }
                    end
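
                    As a cross-check of what this per-UniqueID block computes, here is a small Python/numpy analogue (column names invented for illustration): one regression per group, with the intercept broadcast to every row of that group:

```python
import numpy as np

def groupwise_alpha(uid, ret, risk_premium, min_obs=2):
    """One OLS fit per uid; the intercept is stored in every row of that uid."""
    alpha = np.full(len(ret), np.nan)
    for g in np.unique(uid):
        rows = np.flatnonzero(uid == g)
        if len(rows) < min_obs:
            continue
        X = np.column_stack([risk_premium[rows], np.ones(len(rows))])
        coef, *_ = np.linalg.lstsq(X, ret[rows], rcond=None)
        alpha[rows] = coef[1]  # same intercept for the whole group
    return alpha

# Two funds with known intercepts 0.5 and -1.0, so the recovered
# alphas should be constant within each fund.
uid = np.array([1, 1, 1, 2, 2, 2])
rp = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3])
ret = np.where(uid == 1, 0.5 + 2 * rp, -1.0 + 3 * rp)
print(groupwise_alpha(uid, ret, rp))
```
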

                    Abraham

                    Comment


                    • #11

                      Victoria,

                      Please see Gould (2010); it gives advice on how to use Mata for statistical calculations (p. 129).

                      SJ-10-1 pr0050, Mata Matters: Stata in Mata.
                      W. Gould Q1/10 SJ 10(1):125--142

                      Abraham

                      Comment
