Why is sum_of_weights not stored in e() after -regress-?

Stephen Jenkins

Join Date: Apr 2014
Posts: 1435

18 Sep 2024, 08:49

One can run a weighted regression using -regress- and, although the sum of the weights is echoed to the output, it is not stored in e() -- see below. Does anyone know why?
[I work in a context where sometimes it is mandatory to report both the number of observations and the sum of the weights. Yes, I know I could -svyset- the data and use -svy: regress-, but I don't think I should have to do so for the descriptive exercises I am engaged in.]

Code:

. sysuse auto
(1978 automobile data)

. regress length mpg [w = weight]
(analytic weights assumed)
(sum of wgt is 223,440)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =    129.22
       Model |  21959.6658         1  21959.6658   Prob > F        =    0.0000
    Residual |  12235.8855        72  169.942854   R-squared       =    0.6422
-------------+----------------------------------   Adj R-squared   =    0.6372
       Total |  34195.5512        73  468.432209   Root MSE        =    13.036

------------------------------------------------------------------------------
      length | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -3.211719   .2825375   -11.37   0.000    -3.774947    -2.64849
       _cons |   257.8735   5.880868    43.85   0.000     246.1502    269.5968
------------------------------------------------------------------------------

. eret list

scalars:
                  e(N) =  74
               e(df_m) =  1
               e(df_r) =  72
                  e(F) =  129.2179416097085
                 e(r2) =  .6421790252697521
               e(rmse) =  13.03621317225194
                e(mss) =  21959.66576867037
                e(rss) =  12235.88547881244
               e(r2_a) =  .6372092895096098
                 e(ll) =  -293.9997918054931
               e(ll_0) =  -332.0255238705334
               e(rank) =  2

macros:
            e(cmdline) : "regress length mpg [w = weight]"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "length"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"
               e(wexp) : "= weight"
              e(wtype) : "aweight"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2
               e(beta) :  1 x 1

functions:
             e(sample)


Andrew Musau

Join Date: Oct 2014
Posts: 10190

18 Sep 2024, 10:08

I do not know why the statistic is not stored, but as a workaround, you can append to the regression:

Code:

sysuse auto, clear
regress length mpg [w = weight]
qui sum `=substr("`e(wexp)'", 3, .)' if e(sample)
di r(sum)

Since summarize is not an estimation command, it does not overwrite the results in e(). Therefore, you can retrieve the statistic from r(sum).

Res.:

Code:

. regress length mpg [w = weight]
(analytic weights assumed)
(sum of wgt is 223,440)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =    129.22
       Model |  21959.6658         1  21959.6658   Prob > F        =    0.0000
    Residual |  12235.8855        72  169.942854   R-squared       =    0.6422
-------------+----------------------------------   Adj R-squared   =    0.6372
       Total |  34195.5512        73  468.432209   Root MSE        =    13.036

------------------------------------------------------------------------------
      length | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -3.211719   .2825375   -11.37   0.000    -3.774947    -2.64849
       _cons |   257.8735   5.880868    43.85   0.000     246.1502    269.5968
------------------------------------------------------------------------------

. 
. qui sum `=substr("`e(wexp)'", 3, .)' if e(sample)

. 
. di r(sum)
223440
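
A minimal sketch of how the workaround above could be wrapped so that the sum ends up in e() itself. The program name add_sumw and the result name e(sum_w) are made up for illustration; neither is part of official Stata. The sketch relies on -ereturn scalar- adding to the current e() results when it is called from an eclass program, and it evaluates the weight expression directly, so it should also cope with weights given as an expression rather than a single variable name.

Code:

capture program drop add_sumw
program define add_sumw, eclass
    * hypothetical helper: adds e(sum_w) to the estimation results in memory
    if "`e(wexp)'" == "" {
        display as error "last estimation did not use weights"
        exit 198
    }
    tempvar w
    * e(wexp) already contains the equals sign, e.g. "= weight"
    quietly generate double `w' `e(wexp)' if e(sample)
    quietly summarize `w', meanonly
    ereturn scalar sum_w = r(sum)
end

sysuse auto, clear
regress length mpg [aw = weight]
add_sumw
display e(sum_w)

Called after the weighted -regress-, the displayed e(sum_w) should match the "sum of wgt" line echoed in the output.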


FernandoRios

Join Date: Apr 2014

Posts: 2469
#3

19 Sep 2024, 08:11

You could also create a simple clone of regress.ado. You would need something like this:

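Code:

if "`exp'" != "" {
    tempvar v
    qui: gen `v' = 1
    * a weighted -summarize- of a constant leaves the sum of the weights in r(sum_w)
    * (no restriction to the estimation sample is applied here)
    qui: sum `v' [w`exp']
    ereturn scalar sum_w = r(sum_w)
}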

(perhaps around line 135)

F

John DSouza

Join Date: Apr 2020

Posts: 7
#4

19 Sep 2024, 14:49

Hi Stephen,

I think Stata doesn't save the sum of the weights in e() because you are using aweights. Using "help weight" shows:

For most Stata commands, the recorded scale of aweights is irrelevant; Stata internally rescales them to sum to N, the number of observations in your data, when it uses them.

You can multiply aweights by any positive constant and it doesn't make any difference to the estimates of the coefficients or to the reported standard errors. So, the sum of the weights doesn't have any obvious interpretation and, in general, reporting their sum doesn't make sense. (There might be sensible reasons in any particular circumstance, but I think Stata is right not to save them automatically).
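
The scale invariance can be checked directly. A small sketch, assuming the auto data used in #1; the variable name w10 and the matrix names are arbitrary:

Code:

sysuse auto, clear
regress length mpg [aw = weight]
matrix b1 = e(b)
matrix V1 = e(V)

* same weights, multiplied by a positive constant
generate double w10 = 10 * weight
regress length mpg [aw = w10]
matrix b2 = e(b)
matrix V2 = e(V)

* relative differences between the two sets of estimates
display mreldif(b1, b2)
display mreldif(V1, V2)

Both displayed relative differences should be zero, up to rounding, even though the echoed sum of the weights differs by a factor of 10.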

(If you use fweights or iweights, Stata does save their sum in e(N). This is because multiplying fweights and iweights by a constant does make a difference to the reported standard errors. So how they are scaled is important.)

Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#5

20 Sep 2024, 01:58

Thanks, John. Good point. I was aware of it; however, I don't think it's the whole story. I didn't show my experiments using pweights, and one would expect the sum of the weights to be stored in that case. It isn't. And, as I said, the sum of the weights is echoed to the output, so I don't see why it can't simply be added to the list of scalars in e().
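
For comparison, the -svy- route mentioned in #1 does leave a weighted total behind. As far as I can tell, svy estimation commands store the estimated population size in e(N_pop), and for a design declared with nothing but pweights this is the sum of the weights over the estimation sample. A sketch, again with the auto data and treating weight as a pweight purely for illustration:

Code:

sysuse auto, clear

* declare a design with sampling weights only
svyset [pweight = weight]
svy: regress length mpg

* number of observations and estimated population size
display e(N)
display e(N_pop)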