Stata version: 17
I ran the new table command with the statistic(semean varlist) option. The resulting standard errors of the mean were implausibly small, so I compared them with the output of mean, which gave very different standard errors. My dataset has survey weights, so I am running these commands with pweights. Some of the variables I am examining are missing for most observations because of a skip pattern.
Minimal working example (if I remove either all occurrences of [pw=wt] or the line set obs 20, the differences between the two commands disappear):
Code:
* create a dataset in which foo is nonmissing for only 5 of 20 observations
clear
set obs 5
gen foo = _n
set obs 20
gen wt = 100

* create a table and compare it with the output of mean
table [pw=wt], statistic(mean foo) statistic(semean foo)
mean foo [pw=wt]
Code:
. table [pw=wt], statistic(mean foo) statistic(semean foo)

--------------------------------------
Mean                       |         3
Standard error of the mean |  .1622214
--------------------------------------

. mean foo [pw=wt]

Mean estimation                              Number of obs = 5

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         foo |          3   .7071068      1.036757    4.963243
--------------------------------------------------------------
I suspect the difference arises because table somehow takes the observations with missing values into account when calculating the SEs, whereas mean does not.
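As a rough arithmetic check of that suspicion, assume the usual linearized variance for a pweighted mean without clusters or strata, $\widehat{V}(\bar x) = \frac{n}{n-1}\sum_i w_i^2 (x_i - \bar x)^2 / W^2$ with $W = \sum_i w_i$. Under that assumption, the SE reported by mean is reproduced with the 5 nonmissing observations, and the SE reported by table is reproduced exactly if $n$ and $W$ are instead taken over all 20 observations. This only checks that the numbers are consistent; it is not a statement of how table is actually implemented.

```latex
\begin{aligned}
\textstyle\sum_{i=1}^{5} w_i^2 (x_i - \bar{x})^2 &= 100^2\,(4+1+0+1+4) = 100\,000 \\
n = 5,\ W = 500:\quad \widehat{\mathrm{se}}(\bar x) &= \sqrt{\tfrac{5}{4}\times 100\,000}\,/\,500 \approx 0.7071068 \\
n = 20,\ W = 2000:\quad \widehat{\mathrm{se}}(\bar x) &= \sqrt{\tfrac{20}{19}\times 100\,000}\,/\,2000 \approx 0.1622214
\end{aligned}
```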
I looked in the help page for table and searched Statalist, but I could not find any documentation of this behavior. The PDF manual entry for table has a formal description of the calculation, and I suppose I may be missing a subtle point in the notation that distinguishes it from the calculation in mean. I would be very grateful for an explanation. Is it possible that this is a bug?