
  • Computation of coefficient standard errors using regress

    Today I was trying to generate standard-error estimates for regression coefficients by hand. (This was just to walk some students through the minutiae.) But I could not reproduce Stata's reported estimates. I tried on a smaller dataset and had the same problem. My code and the (smaller) dataset are below. The differences are small-ish, I know, but still. Did I miss something here?

    motherinches studentinches ideology male momsed
    62 62 6 0 3
    63 66 7 0 3
    65 75 6 1 3
    64 66 4 0 3
    68 69 3 1 3
    65 73 4 1 3
    62 64 3 0 1
    65 66 6 0 1
    62 65 3 0 1
    66 71 6 1 1
    67 69 3 1 1
    61 62 3 1 1
    64 62 2 0 4
    64 62 7 0 4
    64 73 3 1 4
    63 62 3 0 4
    68 73 6 1 4
    62 67 3 1 4
    66 68 1 1 4
    63 72 7 1 4
    62 70 4 1 4
    60 62 2 0 4
    66 74 7 1 4
    65 62 2 0 4
    62 65 2 0 4
    68 62 4 0 2
    66 68 5 1 2
    68 62 3 0 2
    61 62 5 0 2
    60 62 5 0 2


    Code and reported output
    use "C:\Users\las02013\Dropbox\classes\stats2\height_m _sample.dta", replace
    reg studentinches motherinches male momsed ideology



          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(4, 25)        =     18.05
           Model |   422.99631         4  105.749077   Prob > F        =    0.0000
        Residual |  146.470357        25  5.85881427   R-squared       =    0.7428
    -------------+----------------------------------   Adj R-squared   =    0.7016
           Total |  569.466667        29  19.6367816   Root MSE        =    2.4205

    ------------------------------------------------------------------------------
    studentinc~s | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    motherinches |   .2451402   .1982764     1.24   0.228    -.1632177    .6534982
            male |    6.28862   .9532934     6.60   0.000     4.325276    8.251965
          momsed |   .4846678   .3805885     1.27   0.215    -.2991689    1.268505
        ideology |   .6433369   .2544839     2.53   0.018     .1192176    1.167456
           _cons |   43.82337   12.70533     3.45   0.002     17.65625     69.9905
    ------------------------------------------------------------------------------



    predict yhat
    generate double error = studentinches - yhat

    generate double sq_er = error^2
    egen double sse = total(sq_er)
    generate double mse = sse/(e(N)-e(df_m)-1)

    * 1. get the mean of each X
    egen double mean_mother = mean(motherinches)
    foreach v of varlist ideology male momsed {
        egen double mean_`v' = mean(`v')
    }

    * 2. compute the sum of squared deviations of each X
    egen double ssd_mother = total((motherinches - mean_mother)^2)
    foreach v of varlist ideology male momsed {
        egen double ssd_`v' = total((`v' - mean_`v')^2)
    }

    * 3. generate the se for each X in the model
    generate double se_mother = (mse/ssd_mother)^.5
    foreach v of varlist ideology male momsed {
        generate double se_`v' = (mse/ssd_`v')^.5
    }

    list se* in 1

       +-------------------------------------------------+
       | se_mother   se_ideology     se_male   se_momsed |
       |-------------------------------------------------|
    1. | .18571657     .25212609   .88581157   .37588515 |
       +-------------------------------------------------+
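For readers following along outside Stata, the per-variable shortcut used above, se_j = sqrt(MSE / SSD_j), can be sketched in Python with numpy. This is made-up data, and every name below is hypothetical; it implements the shortcut as posted, not the full OLS formula.

```python
import numpy as np

# Made-up data (not the thread's dataset): y on two correlated predictors.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # deliberately correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# OLS fit with a constant; residual MSE with N - k - 1 degrees of freedom.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
mse = resid @ resid / (n - X.shape[1])

# The shortcut: se_j = sqrt(MSE / sum((x_j - xbar_j)^2)), one value per slope.
ssd = ((X[:, 1:] - X[:, 1:].mean(axis=0)) ** 2).sum(axis=0)
naive_se = np.sqrt(mse / ssd)
print(naive_se)
```

With a single predictor this shortcut matches the textbook OLS standard error; with several predictors it does not, as the reply below explains.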


  • #2
    Your method of calculation is not correct: it fails to account for correlations among the right-hand-side variables. If you apply your method to a regression with only a single X variable, you do get the correct result, as shown here just for motherinches:
    Code:
    . regress studentinches motherinches
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(1, 28)        =      5.09
           Model |  87.5259288         1  87.5259288   Prob > F        =    0.0321
        Residual |  481.940738        28  17.2121692   R-squared       =    0.1537
    -------------+----------------------------------   Adj R-squared   =    0.1235
           Total |  569.466667        29  19.6367816   Root MSE        =    4.1488
    
    ------------------------------------------------------------------------------
    studentinc~s | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    motherinches |   .7178179   .3183198     2.26   0.032     .0657692    1.369867
           _cons |   20.54513   20.40775     1.01   0.323    -21.25825    62.34852
    ------------------------------------------------------------------------------
    
    .
    . predict yhat
    (option xb assumed; fitted values)
    
    . generate double error= studentinches- yhat
    
    .
    . gen double sq_er= error^2
    
    . egen double sse=sum(sq_er)
    
    . gen double mse= sse/(e(N)-e(df_m)-1)
    
    .
    . egen double mean_mother=mean(motherinches)
    
    .
    . egen double ssd_mother=sum((motheri-mean_mother)^2)
    
    .
    . gen double se_mother= (mse/ssd_mother)^.5
    
    .
    . list se in 1
    
         +-----------+
         | se_mother |
         |-----------|
      1. | .31831985 |
         +-----------+
    When you have multiple X variables, it is more complicated. It involves inverting the X'X (cross-product) matrix, which is probably more than you want to walk your students through. (Or, at least, I hope, for your students' sake, it is.)
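A minimal numpy sketch of that matrix route, assuming nothing beyond textbook OLS: the coefficient covariance matrix is MSE times the inverse of X'X, and the standard errors are the square roots of its diagonal. The data and every name below are made up for illustration.

```python
import numpy as np

# Made-up data with deliberately correlated predictors.
rng = np.random.default_rng(1)
n, k = 40, 3
Z = rng.normal(size=(n, k))
Z[:, 1] += 0.8 * Z[:, 0]                 # second predictor tracks the first
y = 1.0 + Z @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])     # design matrix with a constant
beta = np.linalg.solve(X.T @ X, X.T @ y) # OLS coefficients
resid = y - X @ beta
mse = resid @ resid / (n - k - 1)        # sigma-hat squared

cov = mse * np.linalg.inv(X.T @ X)       # var-cov matrix of the betas
se = np.sqrt(np.diag(cov))               # correct standard errors
print(se)                                # constant first, then the slopes

# The per-variable shortcut understates these whenever predictors correlate.
ssd = ((Z - Z.mean(axis=0)) ** 2).sum(axis=0)
naive_se = np.sqrt(mse / ssd)
```

Equivalently, each correct slope SE is the shortcut value inflated by 1/sqrt(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors; this is the variance-inflation-factor view of the same algebra.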



    • #3
      Thanks. And no, we're not going into matrix algebra. (Though I guess I need a refresher.)
