  • Why is sum_of_weights not stored in e() after -regress-?

    One can run a weighted regression using -regress- and, although the sum of the weights is echoed to the output, it is not stored in e() (see below). Does anyone know why?
    [I work in a context where sometimes it is mandatory to report both the number of observations and the sum of the weights. Yes, I know I could -svyset- the data and use -svy: regress-, but I don't think I should have to do so for the descriptive exercises I am engaged in.]

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . regress length mpg [w = weight]
    (analytic weights assumed)
    (sum of wgt is 223,440)
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =    129.22
           Model |  21959.6658         1  21959.6658   Prob > F        =    0.0000
        Residual |  12235.8855        72  169.942854   R-squared       =    0.6422
    -------------+----------------------------------   Adj R-squared   =    0.6372
           Total |  34195.5512        73  468.432209   Root MSE        =    13.036
    
    ------------------------------------------------------------------------------
          length | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -3.211719   .2825375   -11.37   0.000    -3.774947    -2.64849
           _cons |   257.8735   5.880868    43.85   0.000     246.1502    269.5968
    ------------------------------------------------------------------------------
    
    . eret list
    
    scalars:
                      e(N) =  74
                   e(df_m) =  1
                   e(df_r) =  72
                      e(F) =  129.2179416097085
                     e(r2) =  .6421790252697521
                   e(rmse) =  13.03621317225194
                    e(mss) =  21959.66576867037
                    e(rss) =  12235.88547881244
                   e(r2_a) =  .6372092895096098
                     e(ll) =  -293.9997918054931
                   e(ll_0) =  -332.0255238705334
                   e(rank) =  2
    
    macros:
                e(cmdline) : "regress length mpg [w = weight]"
                  e(title) : "Linear regression"
              e(marginsok) : "XB default"
                    e(vce) : "ols"
                 e(depvar) : "length"
                    e(cmd) : "regress"
             e(properties) : "b V"
                e(predict) : "regres_p"
                  e(model) : "ols"
              e(estat_cmd) : "regress_estat"
                   e(wexp) : "= weight"
                  e(wtype) : "aweight"
    
    matrices:
                      e(b) :  1 x 2
                      e(V) :  2 x 2
                   e(beta) :  1 x 1
    
    functions:
                 e(sample)

  • #2
    I do not know why the statistic is not stored, but as a workaround, you can append to the regression:

    Code:
    sysuse auto, clear
    regress length mpg [w = weight]
    qui sum `=substr("`e(wexp)'", 3, .)' if e(sample)
    di r(sum)
    Since summarize is not an estimation command, it does not overwrite the results in e(). Therefore, you can retrieve the statistic from r(sum).

    Results:

    Code:
    . regress length mpg [w = weight]
    (analytic weights assumed)
    (sum of wgt is 223,440)
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =    129.22
           Model |  21959.6658         1  21959.6658   Prob > F        =    0.0000
        Residual |  12235.8855        72  169.942854   R-squared       =    0.6422
    -------------+----------------------------------   Adj R-squared   =    0.6372
           Total |  34195.5512        73  468.432209   Root MSE        =    13.036
    
    ------------------------------------------------------------------------------
          length | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -3.211719   .2825375   -11.37   0.000    -3.774947    -2.64849
           _cons |   257.8735   5.880868    43.85   0.000     246.1502    269.5968
    ------------------------------------------------------------------------------
    
    . 
    . qui sum `=substr("`e(wexp)'", 3, .)' if e(sample)
    
    . 
    . di r(sum)
    223440
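
    If the weight is an expression rather than a single variable name (for example [aw = weight/1000]), the substr() approach above would pass the expression to summarize as if it were a varlist. A possible variant, sketched here on the assumption that e(wexp) always begins with "=" as it does in the listing in #1, evaluates the expression into a temporary variable first:

    Code:
    sysuse auto, clear
    regress length mpg [aw = weight/1000]
    * e(wexp) holds "= weight/1000", so it can follow the new variable name directly
    tempvar w
    quietly generate double `w' `e(wexp)' if e(sample)
    * summarize is r-class, so e() from regress is left untouched
    quietly summarize `w', meanonly
    display r(sum)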


    • #3
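      You could also create a simple clone of regress.ado.

      You would need something like this:

      Code:
      * only when a weight expression was supplied
      if "`exp'" != "" {
          tempvar v
          quietly generate `v' = 1
          * `exp' already contains the leading "=", e.g. "= weight"
          quietly summarize `v' [w`exp']
          ereturn scalar sum_w = r(sum_w)
      }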

      (perhaps around line 135)
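
      An alternative to editing a copy of regress.ado might be a thin wrapper that calls -regress- and then adds the scalar. This is only a sketch: the program name regsumw and the scalar name e(sum_w) are made up for illustration, and it has not been checked against every option -regress- accepts.

      Code:
      program define regsumw, eclass
          version 16
          syntax anything [if] [in] [aweight fweight iweight pweight] [, *]
          * rebuild the weight clause only if one was given
          local wgt
          if "`weight'" != "" local wgt "[`weight'`exp']"
          regress `anything' `if' `in' `wgt', `options'
          if "`weight'" != "" {
              * `exp' is "= <expression>", so it can follow the variable name
              tempvar w
              quietly generate double `w' `exp' if e(sample)
              quietly summarize `w', meanonly
              ereturn scalar sum_w = r(sum)
          }
      end

      With the auto data, regsumw length mpg [aw = weight] followed by display e(sum_w) should reproduce the "sum of wgt" figure that -regress- echoes.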

      F


      • #4
        Hi Stephen,

        I think Stata doesn't save the sum of the weights in e() because you are using aweights. The output of "help weight" includes:

        "For most Stata commands, the recorded scale of aweights is irrelevant; Stata internally rescales them to sum to N, the number of observations in your data, when it uses them."
        You can multiply aweights by any positive constant and it doesn't make any difference to the estimates of the coefficients or to the reported standard errors. So, the sum of the weights doesn't have any obvious interpretation and, in general, reporting their sum doesn't make sense. (There might be sensible reasons in any particular circumstance, but I think Stata is right not to save them automatically).

        (If you use fweights or iweights, Stata does save their sum in e(N). This is because multiplying fweights and iweights by a constant does make a difference to the reported standard errors. So how they are scaled is important.)
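
        Both points can be checked with the auto data. In the sketch below, w10 is just an illustrative variable name: the two aweighted regressions should report the same coefficients and standard errors and differ only in the echoed sum of weights, while the fweighted one should report e(N) as the sum of rep78 rather than the number of rows used.

        Code:
        sysuse auto, clear
        regress length mpg [aw = weight]
        * rescale the aweights by a positive constant and refit
        generate double w10 = 10*weight
        regress length mpg [aw = w10]
        * fweights: compare e(N) with the number of rows in the estimation sample
        regress length mpg [fw = rep78]
        display e(N)
        count if e(sample)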


        • #5
          Thanks, John. Good point. I was aware of it; however, I don't think it's the whole story. I didn't show my experiments using pweights, where one would expect the sum of the weights to be stored. It isn't. And, as I said, the sum of the weights is echoed to the output, so I don't see why it can't simply be added to the list of scalars in e().
