  • Very different SEs with table vs mean when there are missing values and pweights

    Stata version: 17

    I ran the new table command with the statistic(semean varlist) option. The resulting standard errors of the mean were implausibly small, so I compared them with the output of mean, which gave very different standard errors. My dataset has survey weights, so I am running these commands with pweights. Some of the variables I am examining have missing values for most observations because of a skip pattern.

    Minimal working example:
    Code:
    * create a dataset
    set obs 5
    gen foo = _n
    set obs 20
    gen wt = 100
    
    * create a table and compare to *mean*
    table [pw=wt], statistic(mean foo) statistic(semean foo)
    mean foo [pw=wt]
    Output:
    Code:
    . table [pw=wt], statistic(mean foo) statistic(semean foo) 
    
    --------------------------------------
    Mean                       |         3
    Standard error of the mean |  .1622214
    --------------------------------------
    
    . mean foo [pw=wt]
    
    Mean estimation                              Number of obs = 5
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
             foo |          3   .7071068      1.036757    4.963243
    --------------------------------------------------------------
    If I remove either all occurrences of [pw=wt] or the set obs 20 line, the differences between the two commands disappear.

    I suspect the difference arises because table somehow takes the observations with missing values into account when calculating the SEs, whereas mean does not.

    I looked in the help page for table and searched Statalist, but I could not find any documentation of this behavior. The PDF manual entry for table has a formal description of the calculation, and I suppose it is possible I am missing a subtle point in the notation that is different from the calculation in mean. I would be very grateful for an explanation. Is it possible it’s a bug?
    Last edited by Katriel Friedman; 19 Nov 2021, 23:30.

  • #2
    Thank you for the excellent reproducible example, which has been very helpful.

    Below is an expansion of it to include demonstrations of the results when either weights or missing values are omitted.

    I believe you have found an error in the calculation produced by the table command in the case with weights and missing values. I note that your example with 5 values and 15 missing values is off by a factor of sqrt(19), which seemed suggestive. I've fiddled with other numbers of missing observations, and the difference is always a factor equal to the square root of a suggestive number: sqrt(4.5) with 5 missing values, sqrt(546) with 100 missing values. Very peculiar; I can't quite deduce how the error is being introduced.

    I encourage you to submit this problem to Stata Technical Services as described at https://www.stata.com/support/tech-support/ for their consideration. Giving them the URL of this discussion will save you some effort in submitting it.

    Added in edit: A better way to think about the discrepancy is that the standard error of the mean should be the standard deviation divided by the square root of the number of observations. In the weighted case with 15 missing values, it has been divided by sqrt(95) rather than sqrt(5); with 5 missing values it is sqrt(22.5). I still don't see the path to that number, but it is the clearest way of thinking about the discrepancy.
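
    The arithmetic can be checked directly. This is only a minimal sketch using display: it assumes the sample standard deviation of the five nonmissing values 1 to 5, which is sqrt(2.5), and the divisors 5 and 95 discussed above. It illustrates the size of the discrepancy and says nothing about how table arrives at 95.

    Code:
    * sample standard deviation of 1..5 is sqrt(10/4) = sqrt(2.5), about 1.581139
    display sqrt(2.5)
    * dividing by sqrt(5) should match the SE reported by mean (.7071068)
    display sqrt(2.5)/sqrt(5)
    * dividing by sqrt(95) should match the SE reported by table (.1622214)
    display sqrt(2.5)/sqrt(95)
    * ratio of the two SEs: sqrt(95/5) = sqrt(19)
    display sqrt(95/5)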

    Code:
    * create a dataset
    set obs 5
    gen foo = _n
    set obs 20
    gen wt = 100
     
    * demonstrate agreement when no observations are missing
    table () (result) in 1/5 [pw=wt], ///
        statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    mean foo in 1/5 [pw=wt]
    
    * demonstrate agreement when weighting is omitted
    table () (result) in 1/5, ///
        statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    mean foo
    
    * demonstrate difference of weighted calculation with missing values
    table () (result) [pw=wt], ///
        statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    mean foo [pw=wt]
    Output:
    Code:
    . * demonstrate agreement when no observations are missing
    . table () (result) in 1/5 [pw=wt], ///
    >     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    
    -------------------------------------------------------------------------------------
    Number of non-missing values   Mean   Standard deviation   Standard error of the mean
    -------------------------------------------------------------------------------------
                               5      3             1.581139                     .7071068
    -------------------------------------------------------------------------------------
    
    . mean foo in 1/5 [pw=wt]
    
    Mean estimation                              Number of obs = 5
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
             foo |          3   .7071068      1.036757    4.963243
    --------------------------------------------------------------
    
    .
    . * demonstrate agreement when weighting is omitted
    . table () (result) in 1/5, ///
    >     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    
    -------------------------------------------------------------------------------------
    Number of non-missing values   Mean   Standard deviation   Standard error of the mean
    -------------------------------------------------------------------------------------
                               5      3             1.581139                     .7071068
    -------------------------------------------------------------------------------------
    
    . mean foo
    
    Mean estimation                              Number of obs = 5
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
             foo |          3   .7071068      1.036757    4.963243
    --------------------------------------------------------------
    
    .
    . * demonstrate difference of weighted calculation with missing values
    . table () (result) [pw=wt], ///
    >     statistic (count foo) statistic(mean foo) statistic(sd foo) statistic(semean foo)
    
    -------------------------------------------------------------------------------------
    Number of non-missing values   Mean   Standard deviation   Standard error of the mean
    -------------------------------------------------------------------------------------
                               5      3             1.581139                     .1622214
    -------------------------------------------------------------------------------------
    
    . mean foo [pw=wt]
    
    Mean estimation                              Number of obs = 5
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
             foo |          3   .7071068      1.036757    4.963243
    --------------------------------------------------------------
    
    .
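
    A possible interim workaround, sketched under the assumption that restricting the sample with if behaves like the in 1/5 restriction demonstrated above, is to exclude the missing values explicitly so that table sees only nonmissing observations. I have not confirmed this beyond the example dataset.

    Code:
    * assumed workaround: restrict table to observations where foo is nonmissing
    table () (result) if !missing(foo) [pw=wt], ///
        statistic(count foo) statistic(mean foo) statistic(semean foo)
    * mean drops missing values on its own, so no restriction is needed here
    mean foo [pw=wt]

    With several variables that have different missing-value patterns, the restriction would have to be applied one variable at a time.
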
    Last edited by William Lisowski; 20 Nov 2021, 10:27.


    • #3
      Many thanks, William! I have written to Technical Support.


      • #4
        As a tangent, I found and reported a related bug in -table- some weeks ago, which seemed to involve fweights and string variables. It does seem possible that there are deeper issues with how table handles weights in general.


        • #5
          For the record, it looks like this is fixed in the 16dec2021 update.
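
          To check whether an installation already includes that update, here is a minimal sketch using Stata's standard update commands. What they report depends on the installation, and update query needs internet access.

          Code:
          * show the installed version and its revision date
          about
          * compare the installation with the latest available update
          update query
          * install the latest updates if the installation is behind
          update all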
