Weighted tab and table results differ

Simon Brauer

Join Date: May 2017

Posts: 2
#1

19 May 2017, 15:01

I'm getting different point estimates when using tab and aweight instead of table and pweight. Everything I've seen (such as this discussion http://www.stata.com/statalist/archi.../msg00423.html) indicates that aweight and pweight produce the same point estimate but different variances.

I'm using the new 1972-2016 GSS data and a recoded White Baptist variable, but the difference is apparent in the standard race variable. I'm using wtssall. If you run

Code:

tab race [aweight=wtssall]

the white point estimate is 50,320.945. On the other hand, if you run

Code:

table race [pweight=wtssall]

the white point estimate is 50,321.7. Why is this occurring? And when is each appropriate?

Last edited by Simon Brauer; 19 May 2017, 15:04.
Tags: tab, table, weights
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

20 May 2017, 18:46

Welcome to Statalist, Simon.

The description of your problem is a little sparse, which may be why you have yet to receive any suggestions. You should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

In this case, both tab and table produce more than a single number as output, and it would have been useful to see the entire output from each. Is every estimate lower in tab than in table? If so, then the next question is, have you reviewed the output of help weight? In particular it says

For most Stata commands, the recorded scale of aweights is irrelevant; Stata internally rescales them to sum to N, the number of observations in your data, when it uses them.

which suggests to me that your tab results reflect the weights, rescaled to a smaller total. By the way, that same output describes the general use of each of the weights.

I'm also led to ask if you have any missing values for race? That perhaps could affect your results.

And pushing back yet farther, have you compared the unweighted results from tab and table: are they identical?
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#3

21 May 2017, 06:39

William gives good advice. In addition:

I'm getting different point estimates when using tab and aweight instead of table and pweight. Everything I've seen (such as this discussion http://www.stata.com/statalist/archi.../msg00423.html) indicates that aweight and pweight produce the same point estimate but different variances.

Point estimates are usually discussed in the context of estimation, such as a regression. For example, running

Code:

sysuse auto
reg price mpg [pw= weight]
reg price mpg [aw= weight]

will yield the same coefficients _b[mpg] and _b[_cons] but different standard errors. Adding the robust option to the aweight regression should reproduce the pweight standard errors.

Code:

reg price mpg [aw= weight], robust
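To check the claim about standard errors directly, the three fits can be run quietly and their standard errors compared side by side. This is only a minimal sketch on the auto data; the scalar names (se_pw, se_aw, se_aw_robust) are made up for illustration.

```stata
* Sketch: compare the standard error of mpg across weight types.
* Scalar names are illustrative, not part of the original posts.
sysuse auto, clear

* pweights: robust standard errors are used automatically
quietly regress price mpg [pw = weight]
scalar se_pw = _se[mpg]

* aweights: conventional standard errors
quietly regress price mpg [aw = weight]
scalar se_aw = _se[mpg]

* aweights plus robust: should reproduce the pweight standard error
quietly regress price mpg [aw = weight], robust
scalar se_aw_robust = _se[mpg]

display "pweight SE:         " se_pw
display "aweight SE:         " se_aw
display "aweight, robust SE: " se_aw_robust
```

If the first and third lines agree while the second differs, the weights affect only the variance estimate, not the coefficients.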

Running tab or table, on the other hand, just gives a summary of the data. The difference between

the white point estimate is 50,320.945.

and

the white point estimate is 50,321.7.

is most likely due to rounding. You do not show exactly what you type and what Stata outputs, so we do not know where these numbers come from. The following example, however, illustrates how rounding can affect comparisons between the two commands:

Code:

. webuse lbw
(Hosmer & Lemeshow data)

. tab race [aweight=age]

       race |      Freq.     Percent        Cum.
------------+-----------------------------------
      white | 100.352459       53.10       53.10
      black | 24.0983607       12.75       65.85
      other | 64.5491803       34.15      100.00
------------+-----------------------------------
      Total |        189      100.00

. table race [pweight=age]

----------------------
     race |      Freq.
----------+-----------
    white |      2,332
    black |        560
    other |      1,500
----------------------

*// Compare the percentage freq.(aweights) to Freq. (pweights): White
. di 0.531*(2332+ 560+1500 )
2332.152

*// Compare Freq. (pweights) to percentage freq.(aweights): White
. di 2332/ (2332+ 560+1500)
.53096539
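The two frequency columns in this example appear to differ only by a constant factor, N divided by the sum of the weights, which is what the help weight passage quoted in #2 describes. A minimal sketch of that arithmetic follows. It assumes that race == 1 is the "white" category in lbw and that no observation has a missing race or age; the scalar names are made up for illustration.

```stata
* Sketch: reproduce tab's aweighted frequency by rescaling the weights by hand.
* Assumes race == 1 is "white" and that race and age have no missing values.
webuse lbw, clear

* Sum of the raw weights and number of observations
quietly summarize age
scalar w_total = r(sum)
scalar n_obs   = r(N)

* Raw weighted count for whites (the figure table reports with pweights)
quietly summarize age if race == 1
scalar raw_white = r(sum)

* aweights are rescaled to sum to N before tab uses them
scalar scaled_white = raw_white * n_obs / w_total

display "raw weighted count:       " raw_white
display "rescaled (aweight) count: " scaled_white
```

The first number should correspond to the 2,332 from table and the second to the 100.352459 from tab, so the percentages are the same either way.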

Last edited by Andrew Musau; 21 May 2017, 06:50.

Simon Brauer

Join Date: May 2017
Posts: 2

07 Jun 2017, 07:17

Thank you William and Andrew, both for the specific details addressing the questions and broader suggestions on how to post thorough questions on Statalist.

Following Andrew's suggestion, it does seem to be the result of rounding differences and possibly the rescaling of weights when using aweight (output below). The weights sum to about 0.99 more than N.

Code:

tab race [aweight=wtssall]

    race of |
 respondent |      Freq.     Percent        Cum.
------------+-----------------------------------
      white | 50,320.945       80.56       80.56
      black | 8,431.5265       13.50       94.06
      other | 3,713.5289        5.94      100.00
------------+-----------------------------------
      Total |     62,466      100.00


table race [pweight=wtssall]

----------------------
race of   |
responden |
t         |      Freq.
----------+-----------
    white |   50,321.7
    black |   8,431.66
    other |   3,713.59
----------------------

di 0.8056 * (50321.7 + 8431.66 + 3713.59)
50323.375

di 50321.7 / (50321.7 + 8431.66 + 3713.59)
.80557319



sum wtssall

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     wtssall |     62,466    1.000016    .4619267   .3918251   8.739876

display r(sum)
62466.99
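As a rough check on the rescaling explanation, the pweighted count from table can be multiplied by N over the sum of the weights and compared with the aweighted count from tab. This sketch only reuses the figures printed above, so it inherits their rounding (the table value is shown to one decimal); the scalar names are made up for illustration.

```stata
* Sketch: rescale the table (pweight) count for whites by N / sum of weights.
* All inputs are copied from the output above and are therefore rounded.
scalar n_obs     = 62466
scalar w_total   = 62466.99
scalar tbl_white = 50321.7
scalar tab_white = 50320.945

scalar rescaled = tbl_white * n_obs / w_total

* Rescaled table count, then its gap to the tab count
display %12.3f rescaled
display %12.3f tab_white - rescaled
```

The rescaled value comes out at about 50,320.9, within roughly 0.04 of the tab figure. That gap is smaller than the rounding in the displayed table value, which would be consistent with the rescaling of aweights accounting for the difference.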
