  • Regression: Variance of the error term

    Dear all,

    I have a dataset containing roughly 200 companies with daily stock data for 10 years.
    The variables are: date, companyid, Ri_Rft, B_Ret, SMB, HML
    I need to run reg Ri_Rft B_Ret SMB HML separately for every company in the sample, on a monthly basis.
    After this I need to save the Variance of the Error Term as a new variable.

    I have the following code set up:
    Code:
    gen resid=.
    levelsof id, local(groups)
    foreach a of local groups {
        quietly reg Ri_Rft B_Ret SMB HML if id==`a'
        tempvar d
        predict `d', stdp
        replace resid=`d' if id==`a'
    }
    However, I have two problems with this setup.
    First, I am not sure if the "predict, stdp" command achieves my goal of saving the variance of the error term.
    Will the new variable 'resid' contain the variance of the error term?
    Second, this code only works if I reduce my sample to roughly half the companies, or else it gives an error: no room to add more variables.
    Is this solved simply by using set maxvar and how does this work? Where should I place it in my code?

    Kind Regards,
    Bram van Vorstenbosch

  • #2
    -predict, stdp- will give you the standard error of prediction for each observation, not the variance of the error term. To get that, predict the residuals, and then use -summarize- to get its variance. As for losing room to add more variables, the problem is that you create a new tempvar each time you go through the loop, and then they just pile up and pile up.

    So try something like this:

    Code:
    gen error_variance = .
    levelsof id, local(groups)
    foreach a of local groups {
        quietly reg Ri_Rft B_Ret SMB HML if id == `a'
        predict resid if id == `a', resid
        quietly summ resid, detail
        replace error_variance = r(Var) if id == `a'
        drop resid
    }


    • #3
      If all you need is var(error), rather than all the individual errors, it would make sense to just post the RMSE to a new file. That wouldn't eat up all your memory.
      Doug Hemken
      SSCC, Univ. of Wisc.-Madison
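
      A minimal sketch of that idea with -postfile-, assuming the variable names from the first post and a numeric company identifier called id (the original code uses id, although the variable list mentions companyid). The names memhold, results and error_variance are illustrative only. Squaring e(rmse) gives the estimated variance of the error term with the degrees-of-freedom correction.

      ```stata
      * Sketch only: id is assumed to be a numeric company identifier.
      tempname memhold
      tempfile results

      * One row per company: the id and the root mean squared error.
      postfile `memhold' long id double rmse using `results'

      levelsof id, local(groups)
      foreach a of local groups {
          quietly regress Ri_Rft B_Ret SMB HML if id == `a'
          post `memhold' (`a') (e(rmse))
      }
      postclose `memhold'

      * Load the small results file; this replaces the data in memory.
      use `results', clear
      generate double error_variance = rmse^2
      ```

      Because nothing is added to the main dataset inside the loop, no extra variables accumulate. The tempfile disappears when the do-file ends, so save it under a permanent name if the results need to be kept.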

      Comment


      • #4
        No loops necessary:

        Code:
        // open example data
        sysuse nlsw88, clear
        
        // compute the standard deviation of the error for each occupation
        statsby rmse=e(rmse) , by(occupation) clear : reg wage ttl_exp grade i.race
        
        //admire the result
        list
        (For more on examples I sent to the Statalist see: http://www.maartenbuis.nl/example_faq )

        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------
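
        Applied to the variables from the first post, the same approach might look like the sketch below. It assumes that date is a Stata daily date and that id is the company identifier; month, n and error_variance are names made up for this example. Grouping on both id and month is one way to read "for every company in the sample monthly".

        ```stata
        * Sketch only: assumes date is a daily date (%td) and id identifies the company.
        generate month = mofd(date)
        format month %tm

        * One regression per company-month; keep the RMSE and the number of observations.
        statsby rmse=e(rmse) n=e(N), by(id month) clear: regress Ri_Rft B_Ret SMB HML

        * The variance of the error term is the square of the RMSE.
        generate double error_variance = rmse^2
        ```

        The clear option replaces the data in memory with one row per company-month, so the original dataset should be saved first. Groups with too few observations for the regression end up with missing values.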


        • #5
          Thank you for the responses.
          Code:
           
           levelsof id, local(groups)
          Using this gave me a "macro length exceeded" error. But the second approach did work.


          • #6
            Using this gave me a "macro length exceeded" error. But the second approach did work.
            Really? Even if you are using small Stata, a macro can hold 51,800 characters (-help limits-). If you are using IC, it's 264,392, and for SE/MP it's over 4,000,000. You said in your original post you had about 200 companies. Even allowing for some of the characters in the macro to be taken up by spaces or quotation marks, that should still leave well over 200 characters per firm name on average in small Stata, and you can't even begin to approach the limits of IC or SE/MP.

            I'm glad you were able to solve your problem without this anyway, but I'm mystified that you encountered this error message and wondering how it is possible.
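
            One way to look into this without building the macro at all is sketched below. It assumes id is the company identifier from the first post; first_obs is a throwaway name used only here. It shows the macro limit of the running Stata, counts the distinct values of id, and reports how id is stored.

            ```stata
            * Sketch only: assumes id is the company identifier.
            display "maximum macro length in this Stata: " c(macrolen)

            * Tag one observation per distinct value of id and count the tags.
            egen byte first_obs = tag(id)
            count if first_obs
            display "distinct values of id: " r(N)
            drop first_obs

            * Check the storage type of id (numeric or string, and how wide).
            describe id
            ```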
