
  • Computation of coefficient standard errors using regress

    Today I was trying to generate standard-error estimates for regression coefficients by hand. (This was just to walk some students through the minutiae.) But I could not reproduce Stata's reported estimates. I tried on a smaller dataset and had the same problem. My code and the (smaller) dataset are below. The differences are small-ish, I know, but still. Did I miss something here?

    motherinches studentinches ideology male momsed
    62 62 6 0 3
    63 66 7 0 3
    65 75 6 1 3
    64 66 4 0 3
    68 69 3 1 3
    65 73 4 1 3
    62 64 3 0 1
    65 66 6 0 1
    62 65 3 0 1
    66 71 6 1 1
    67 69 3 1 1
    61 62 3 1 1
    64 62 2 0 4
    64 62 7 0 4
    64 73 3 1 4
    63 62 3 0 4
    68 73 6 1 4
    62 67 3 1 4
    66 68 1 1 4
    63 72 7 1 4
    62 70 4 1 4
    60 62 2 0 4
    66 74 7 1 4
    65 62 2 0 4
    62 65 2 0 4
    68 62 4 0 2
    66 68 5 1 2
    68 62 3 0 2
    61 62 5 0 2
    60 62 5 0 2


    Code and reported output
    use "C:\Users\las02013\Dropbox\classes\stats2\height_m _sample.dta", replace
    reg studentinches motherinches male momsed ideology



          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(4, 25)        =     18.05
           Model |   422.99631         4  105.749077   Prob > F        =    0.0000
        Residual |  146.470357        25  5.85881427   R-squared       =    0.7428
    -------------+----------------------------------   Adj R-squared   =    0.7016
           Total |  569.466667        29  19.6367816   Root MSE        =    2.4205

    ------------------------------------------------------------------------------
    studentinc~s | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    motherinches |   .2451402   .1982764     1.24   0.228    -.1632177    .6534982
            male |    6.28862   .9532934     6.60   0.000     4.325276    8.251965
          momsed |   .4846678   .3805885     1.27   0.215    -.2991689    1.268505
        ideology |   .6433369   .2544839     2.53   0.018     .1192176    1.167456
           _cons |   43.82337   12.70533     3.45   0.002     17.65625     69.9905
    ------------------------------------------------------------------------------



    predict yhat
    generate double error = studentinches - yhat

    generate double sq_er = error^2
    egen double sse = total(sq_er)
    generate double mse = sse/(e(N)-e(df_m)-1)

    * 1. get the mean of each X
    egen double mean_mother = mean(motherinches)
    foreach v of varlist ideology male momsed {
        egen double mean_`v' = mean(`v')
    }

    * 2. compute the sum of squared deviations of each X
    egen double ssd_mother = total((motherinches - mean_mother)^2)
    foreach v of varlist ideology male momsed {
        egen double ssd_`v' = total((`v' - mean_`v')^2)
    }

    * 3. generate the se for each X in the model
    generate double se_mother = (mse/ssd_mother)^.5
    foreach v of varlist ideology male momsed {
        generate double se_`v' = (mse/ssd_`v')^.5
    }

    list se* in 1

       +-------------------------------------------------+
       | se_mother   se_ideology     se_male   se_momsed |
       |-------------------------------------------------|
    1. | .18571657     .25212609   .88581157   .37588515 |
       +-------------------------------------------------+
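For readers following along outside Stata, the per-variable shortcut used above, se_j = sqrt(MSE / SSD_j), can be sketched in Python with numpy. This is made-up data, and every name below is hypothetical; it implements the shortcut as posted, not the full OLS formula.

```python
import numpy as np

# Made-up data (not the thread's dataset): y on two correlated predictors.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # deliberately correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# OLS fit with a constant; residual MSE with N - k - 1 degrees of freedom.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
mse = resid @ resid / (n - X.shape[1])

# The shortcut: se_j = sqrt(MSE / sum((x_j - xbar_j)^2)), one value per slope.
ssd = ((X[:, 1:] - X[:, 1:].mean(axis=0)) ** 2).sum(axis=0)
naive_se = np.sqrt(mse / ssd)
print(naive_se)
```

With a single predictor this shortcut matches the textbook OLS standard error; with several predictors it does not, as the reply below explains.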


  • #2
    Your method of calculation is not correct: it fails to account for correlations among the right-hand-side variables. If you apply your method to a regression with only a single X variable, you do get the correct result, as shown here just for motherinches:
    Code:
    . regress studentinches motherinches
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(1, 28)        =      5.09
           Model |  87.5259288         1  87.5259288   Prob > F        =    0.0321
        Residual |  481.940738        28  17.2121692   R-squared       =    0.1537
    -------------+----------------------------------   Adj R-squared   =    0.1235
           Total |  569.466667        29  19.6367816   Root MSE        =    4.1488
    
    ------------------------------------------------------------------------------
    studentinc~s | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    motherinches |   .7178179   .3183198     2.26   0.032     .0657692    1.369867
           _cons |   20.54513   20.40775     1.01   0.323    -21.25825    62.34852
    ------------------------------------------------------------------------------
    
    .
    . predict yhat
    (option xb assumed; fitted values)
    
    . generate double error= studentinches- yhat
    
    .
    . gen double sq_er= error^2
    
    . egen double sse=sum(sq_er)
    
    . gen double mse= sse/(e(N)-e(df_m)-1)
    
    .
    . egen double mean_mother=mean(motherinches)
    
    .
    . egen double ssd_mother=sum((motheri-mean_mother)^2)
    
    .
    . gen double se_mother= (mse/ssd_mother)^.5
    
    .
    . list se in 1
    
         +-----------+
         | se_mother |
         |-----------|
      1. | .31831985 |
         +-----------+
    When you have multiple X variables, it is more complicated. It involves inverting the X'X (cross-product) matrix, which is probably more than you want to walk your students through. (Or, at least, I hope, for your students' sake, it is.)
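A minimal numpy sketch of that matrix route, assuming nothing beyond textbook OLS: the coefficient covariance matrix is MSE times the inverse of X'X, and the standard errors are the square roots of its diagonal. The data and every name below are made up for illustration.

```python
import numpy as np

# Made-up data with deliberately correlated predictors.
rng = np.random.default_rng(1)
n, k = 40, 3
Z = rng.normal(size=(n, k))
Z[:, 1] += 0.8 * Z[:, 0]                 # second predictor tracks the first
y = 1.0 + Z @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])     # design matrix with a constant
beta = np.linalg.solve(X.T @ X, X.T @ y) # OLS coefficients
resid = y - X @ beta
mse = resid @ resid / (n - k - 1)        # sigma-hat squared

cov = mse * np.linalg.inv(X.T @ X)       # var-cov matrix of the betas
se = np.sqrt(np.diag(cov))               # correct standard errors
print(se)                                # constant first, then the slopes

# The per-variable shortcut understates these whenever predictors correlate.
ssd = ((Z - Z.mean(axis=0)) ** 2).sum(axis=0)
naive_se = np.sqrt(mse / ssd)
```

Equivalently, each correct slope SE is the shortcut value inflated by 1/sqrt(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors; this is the variance-inflation-factor view of the same algebra.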



    • #3
      Thanks. And no, we're not going into matrix algebra. (Though I guess I need a refresher.)
