Very different SEs with table vs mean when there are missing values and pweights

Katriel Friedman

Join Date: Nov 2014

Posts: 14
#1

19 Nov 2021, 23:24

Stata version: 17

I ran the new table command with the statistic(semean varlist) option. The resulting standard errors of the mean were implausibly small, so I compared them to the output of mean, which gave very different standard errors. My dataset has survey weights, so I am running these commands with pweights. I am examining some variables which have missing values for most observations, due to a skip pattern.

Minimal working example:

Code:

* create a dataset
set obs 5
gen foo = _n
set obs 20
gen wt = 100

* create a table and compare to *mean*
table [pw=wt], statistic(mean foo) statistic(semean foo)
mean foo [pw=wt]

Output:

Code:

. table [pw=wt], statistic(mean foo) statistic(semean foo)

--------------------------------------
Mean                       |         3
Standard error of the mean |  .1622214
--------------------------------------

. mean foo [pw=wt]

Mean estimation                              Number of obs = 5

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         foo |          3   .7071068      1.036757    4.963243
--------------------------------------------------------------

If I remove either all occurrences of [pw=wt] or the set obs 20 line, the difference between the two commands disappears.

I suspect the difference arises because table somehow takes the observations with missing values into account when calculating the SEs, whereas mean does not.

I looked in the help page for table and searched Statalist, but I could not find any documentation of this behavior. The PDF manual entry for table has a formal description of the calculation, and I suppose it is possible I am missing a subtle point in the notation that is different from the calculation in mean. I would be very grateful for an explanation. Is it possible it’s a bug?
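One way to probe the suspicion above is to restrict both commands explicitly to the observations where foo is not missing, so that the missing rows cannot enter either calculation. This is only a sketch built on the example dataset; the if !missing(foo) qualifier is my addition, and I make no claim about what an affected installation reports for it.

```stata
* recreate the example dataset
clear
set obs 5
gen foo = _n
set obs 20
gen wt = 100

* restrict both commands to observations where foo is not missing
* (the /// line continuation works in a do-file, not in the Command window)
table () (result) if !missing(foo) [pw=wt], ///
    statistic(count foo) statistic(mean foo) statistic(semean foo)
mean foo if !missing(foo) [pw=wt]
```

If the two standard errors agree under this restriction but differ without it, that would be consistent with the missing observations affecting table's calculation.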

Last edited by Katriel Friedman; 19 Nov 2021, 23:30.

William Lisowski

Join Date: Dec 2014
Posts: 10150

20 Nov 2021, 10:13

Thank you for the excellent reproducible example, which has been very helpful.

Below is an expansion of it to include demonstrations of the results when either weights or missing values are omitted.

I believe you have found an error in the calculation performed by the table command when weights and missing values are both present. In your example, with 5 values and 15 missing values, the standard error from table is too small by a factor of sqrt(19), which seemed suggestive. I've tried other numbers of missing observations, and the discrepancy is always the square root of a suggestive number: sqrt(4.5) with 5 missing values, sqrt(546) with 100 missing values. This is very peculiar, and I can't quite deduce how the error is being introduced.

I encourage you to submit this problem to Stata Technical Services as described at https://www.stata.com/support/tech-support/ for their consideration. Giving them the URL of this discussion will save you some effort in submitting it.

Added in edit: A better way to think about the discrepancy is that the standard error of the mean should be the standard deviation divided by the square root of the number of observations. In the weighted case with 15 missing values, it has been divided by sqrt(95) rather than sqrt(5); with 5 missing values it is sqrt(22.5). I still don't see the path to that number, but it is the clearest way of thinking about the discrepancy.
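The arithmetic in the edit above can be checked directly. The following is only a sketch: the scalar names sd_foo and n_obs are made up for illustration, and the values are copied from the output in this thread. The loop at the end records an arithmetic pattern that happens to reproduce the divisors quoted above (95 and 22.5) and the factors 19, 4.5, and 546, assuming 5 non-missing values throughout. It is an observation about the numbers only, not a statement about how table computes anything internally.

```stata
* values copied from the output in this thread (hypothetical scalar names)
scalar sd_foo = 1.581139
scalar n_obs  = 5

* sd/sqrt(n): close to the .7071068 reported by mean
display sd_foo/sqrt(n_obs)

* sd/sqrt(95): close to the .1622214 reported by table with 15 missing values
display sd_foo/sqrt(95)

* arithmetic pattern for N = 20, 10, and 105 total observations:
* divisor N*(N-1)/(n-1), and its ratio to n
foreach N of numlist 20 10 105 {
    display "N = " `N' ", divisor = " `N'*(`N'-1)/(n_obs-1) ///
        ", ratio = " `N'*(`N'-1)/(n_obs*(n_obs-1))
}
```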

Code:

* create a dataset
set obs 5
gen foo = _n
set obs 20
gen wt = 100
 
* demonstrate agreement when no observations are missing
table () (result) in 1/5 [pw=wt], ///
    statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
mean foo in 1/5 [pw=wt]

* demonstrate agreement when weighting is omitted
table () (result) in 1/5, ///
    statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
mean foo

* demonstrate difference of weighted calculation with missing values
table () (result) [pw=wt], ///
    statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
mean foo [pw=wt]

Code:

. * demonstrate agreement when no observations are missing
. table () (result) in 1/5 [pw=wt], ///
>     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)

-------------------------------------------------------------------------------------
Number of non-missing values   Mean   Standard deviation   Standard error of the mean
-------------------------------------------------------------------------------------
                           5      3             1.581139                     .7071068
-------------------------------------------------------------------------------------

. mean foo in 1/5 [pw=wt]

Mean estimation                              Number of obs = 5

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         foo |          3   .7071068      1.036757    4.963243
--------------------------------------------------------------

.
. * demonstrate agreement when weighting is omitted
. table () (result) in 1/5, ///
>     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)

-------------------------------------------------------------------------------------
Number of non-missing values   Mean   Standard deviation   Standard error of the mean
-------------------------------------------------------------------------------------
                           5      3             1.581139                     .7071068
-------------------------------------------------------------------------------------

. mean foo

Mean estimation                              Number of obs = 5

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         foo |          3   .7071068      1.036757    4.963243
--------------------------------------------------------------

.
. * demonstrate difference of weighted calculation with missing values
. table () (result) [pw=wt], ///
>     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)

-------------------------------------------------------------------------------------
Number of non-missing values   Mean   Standard deviation   Standard error of the mean
-------------------------------------------------------------------------------------
                           5      3             1.581139                     .1622214
-------------------------------------------------------------------------------------

. mean foo [pw=wt]

Mean estimation                              Number of obs = 5

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         foo |          3   .7071068      1.036757    4.963243
--------------------------------------------------------------

.

Last edited by William Lisowski; 20 Nov 2021, 10:27.

Katriel Friedman

Join Date: Nov 2014

Posts: 14
#3

20 Nov 2021, 13:00

Many thanks, William! I have written to Technical Support.

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#4

20 Nov 2021, 13:21

I will add, as a tangent, that I found and reported a related bug in -table- some weeks ago, which seemed to involve fweights and string variables. It does seem possible that there are deeper issues with how table accepts weights in general.

Katriel Friedman

Join Date: Nov 2014

Posts: 14
#5

21 Dec 2021, 19:10

For the record, it looks like this is fixed in the 16dec2021 update.
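For anyone checking whether their own installation includes that update, a minimal sketch using Stata's built-in update commands follows. It assumes internet access for update query, and the last line is commented out because it changes the installation.

```stata
* show the installed Stata version and revision date
about

* compare the installed executable and ado-files with what is available
update query

* install available updates (needs internet access and write permission)
* update all
```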