  • Basic question about calculation of Residual Sum of Squares with -reg- command

    Hello everyone,

    I have a really basic question about how Stata calculates results for the -reg- command. I have read the manual and searched this forum, but I cannot find an answer, and I'm hoping someone can provide a quick clarification. When I use -reg- for a simple regression in Stata version 15.1, the reported Residual Sum of Squares (RSS) does not match the value I get when I calculate it manually. To show the issue, I'll use the built-in auto.dta dataset and run a simple regression with price as the outcome variable and mpg as the predictor variable. The reported RSS is 495615923.

    Code:
    sysuse auto
    reg price mpg
    But, when I calculate this manually using -predict-, I get a value of 495615916. I do this by using the -predict- command (with -xb- option) to generate predicted values of y (which I call yhat). I then generate a variable called residual that equals price (y) minus yhat. Then I generate residual_2, which is residual squared. Then, I use the -total- command to find the sum of all the squared residuals (and use -matrix results- and -display- with %11.0g formatting to display the value without scientific notation.)

    Code:
    reg price mpg
    predict yhat, xb
    gen residual = price - yhat
    gen residual_2 = residual^2
    
    total(residual_2)
    matrix results = e(b)
    di %11.0g results[1,1]
    I also calculate this using a third method, where I use -predict- with the option -residual-, and then use the same steps as the above manual calculation. With this method, I get another different value; this time it's 495615910.

    Code:
    reg price mpg
    predict resid, residuals
    gen resid_2 = resid^2
    
    total(resid_2)
    matrix results = e(b)
    di %11.0g results[1,1]
    Does anyone know why the two manual methods I use for calculating RSS do not match the value reported by the -reg- command? Which of the three values should I use when reporting results? Lastly, does anyone know the code to manually calculate an RSS value that matches the output of the -reg- command?

    Thanks for everyone's help with this (hopefully straightforward) clarification.

    Best,
    Thomas
    Last edited by Thomas Robert; 04 Apr 2023, 15:35.

  • #2
    Your hand calculations are failing because of precision. When you use -gen- or -predict-, Stata's default is to create the variables as floats. But a float does not have enough bits to represent 9 significant figures in full precision. So the low order digits reflect rounding to the nearest possible number that will fit inside a float. The way to overcome this is to override the default by specifying double precision calculation:

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . reg price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    .
    . predict double yhat, xb
    
    .
    . gen double residual = price - yhat
    
    . gen double residual_2 = residual^2
    
    .
    . total(residual_2)
    
    Total estimation                            Number of obs = 74
    
    --------------------------------------------------------------
                 |      Total   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
      residual_2 |   4.96e+08   1.11e+08      2.75e+08    7.17e+08
    --------------------------------------------------------------
    
    . matrix results = e(b)
    
    . di %11.0g results[1,1]
      495615923
    
    .
    . reg price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    . predict double resid, residuals
    
    . gen double resid_2 = resid^2
    
    .
    . total(resid_2)
    
    Total estimation                            Number of obs = 74
    
    --------------------------------------------------------------
                 |      Total   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
         resid_2 |   4.96e+08   1.11e+08      2.75e+08    7.17e+08
    --------------------------------------------------------------
    
    . matrix results = e(b)
    
    . di %11.0g results[1,1]
      495615923
    Added: Clarification. I misused language a little. Stata does all floating-point calculations in double precision. But by default it stores the results of calculations in (single precision) floats. By specifying -double- in the commands shown, the calculation is not changed, but by storing the results in double precision, the low order digits are preserved.
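
    As a cross-check, here is a minimal sketch that compares the RSS that -regress- leaves behind in e(rss) with a sum of squared residuals stored as doubles, and then uses the float() function to show what rounding to float precision does to a number of this size. The variable names resid_d and resid_d2 are made up for this illustration and are not from the posts above.

    ```stata
    sysuse auto, clear
    quietly regress price mpg

    * -regress- stores the residual sum of squares in e(rss)
    display %11.0g e(rss)

    * Store the residuals and their squares as doubles, then sum them
    * (resid_d and resid_d2 are hypothetical names chosen for this sketch)
    predict double resid_d, residuals
    generate double resid_d2 = resid_d^2
    quietly summarize resid_d2
    display %11.0g r(sum)

    * float() rounds its argument to float precision, which is what
    * happens to each value stored in a float variable
    display %11.0g float(e(rss))
    ```

    The first two displays should agree with the 495615923 in the regression table. A float carries a 24-bit significand (roughly 7 decimal digits), so near 4.96e+08 adjacent floats are 32 apart and the last display cannot reproduce the low-order digits. In the original hand calculation it was the per-observation values that were rounded when stored as floats, and those small errors added up in the total.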
    Last edited by Clyde Schechter; 04 Apr 2023, 15:45.


    • #3
      Hello Clyde,

      Thanks for the quick answer! This is exactly what I needed to solve the issue.

      Best,
      Thomas
