mean vs. dtable

Ben Littenberg

Join Date: Apr 2014

Posts: 13
#1


03 Aug 2024, 14:07

Why do these two commands give slightly different output with the same source data?

. svy: mean transpor housecost internet paybll12m fdsrunout care rx
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =  52           Number of obs   =      27,974
Number of PSUs   = 662           Population size = 243,458,208
                                 Design df       =         610

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
    transpor |   .0674449    .002632       .062276    .0726138
   housecost |   .0743724   .0021366      .0701763    .0785685
    internet |   .0571033   .0017042      .0537564    .0604502
   paybll12m |   .1072457   .0023223      .1026851    .1118063
   fdsrunout |   .1236142   .0030575      .1176097    .1296186
        care |   .0829802   .0022991      .0784652    .0874952
          rx |   .0766329   .0019758      .0727527    .0805131
--------------------------------------------------------------

. dtable transpor housecost internet paybll12m fdsrunout care rx, svy

---------------------------
                 Summary
---------------------------
N               258,237,552
Transport     0.068 (0.251)
Housing       0.075 (0.263)
Internet      0.059 (0.235)
Medical bills 0.107 (0.309)
Food          0.124 (0.330)
Medical care  0.083 (0.277)
Medications   0.077 (0.267)
---------------------------

Thanks!

Ben
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#2

03 Aug 2024, 14:11

If you look at the reported N for each output, it looks like mean is using the subset of observations in which all of the variables are observed. Evidently there are some missing data. Try running your mean command separately for each variable and compare the output.
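One way to run that check is a short loop, sketched here on the assumption that the survey data from post #1 are in memory and already svyset. The variable names are the ones used in that post; nothing else about the dataset is assumed.

```stata
* Assumes the survey data from post #1 are loaded and svyset.
* Estimate each mean on its own, so that each estimate uses every
* observation with a non-missing value for that one variable.
foreach v of varlist transpor housecost internet paybll12m fdsrunout care rx {
    svy: mean `v'
}
```

If the number of observations changes from one variable to the next, the variables have different patterns of missing values, and the joint command will be working from a smaller sample than any of the single-variable runs.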

Ben Littenberg

Join Date: Apr 2014

Posts: 13
#3

03 Aug 2024, 14:26

Yes, they seem to be using different sample sizes, even when I use a single variable and eliminate the survey weighting:

Code:

. mean housecost

Mean estimation                         Number of obs = 28,066

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
   housecost |   .0708687   .0015317      .0678664    .0738709
--------------------------------------------------------------

. dtable housecost

---------------------
           Summary   
---------------------
N              29,522
Housing 0.071 (0.257)
---------------------

(mean gives the same values as summarize)

Code:

. sum housecost

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   housecost |     28,066    .0708687    .2566099          0          1


Erik Reinbergs

Join Date: Oct 2022

Posts: 33
#4

03 Aug 2024, 14:56

I believe dtable by default reports the size of the entire sample, whereas sum and mean report the number of observed (non-missing) entries for the specific variable.
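The two counts can be compared directly. The sketch below uses Stata's bundled auto dataset rather than the survey data from this thread, because rep78 there is known to have missing values; the same pattern should apply to housecost.

```stata
* Illustration on the bundled auto data, not the poster's survey data.
sysuse auto, clear

* Total number of observations in the dataset.
count

* Observations with a non-missing value of rep78.
count if !missing(rep78)

* The Obs column of summarize reports the non-missing count.
summarize rep78
```

In auto.dta the first count is 74 and the second is 69, so a table that prints the whole-sample N and a command that prints the non-missing count will disagree even when they use the same observations to compute the mean.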
Ben Littenberg

Join Date: Apr 2014

Posts: 13
#5

03 Aug 2024, 15:37

It's not just the reported sample size. In the first example, the means for housing and internet are (slightly) different. It looks like it is using the overall sample size, not the variable-specific sample size.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#6

03 Aug 2024, 16:11

No, the -mean- command, with or without -svy:-, like all Stata estimation commands, uses only those observations that have no missing values on any of the variables mentioned in the command. By contrast, -dtable- uses a separate sample for each variable, consisting of all and only those observations which have non-missing values for that variable.
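A minimal sketch of that difference, again on the bundled auto dataset rather than the poster's data. The variable name nmiss is made up for the illustration, and -dtable- requires Stata 18 or later.

```stata
* Illustration on the bundled auto data, not the poster's survey data.
sysuse auto, clear

* Listwise deletion: rep78 is missing for 5 of the 74 cars, so both
* means are estimated on the 69 complete cases.
mean price rep78
count if e(sample)

* On its own, price has no missing values, so all 74 cars are used.
mean price

* Per-variable samples: each row uses the non-missing values of that variable.
dtable price rep78

* Flag complete cases by hand (nmiss is a hypothetical helper variable)
* and restrict dtable to them.
egen byte nmiss = rowmiss(price rep78)
dtable price rep78 if nmiss == 0
```

With the restriction in the last line, the dtable mean for price should agree with the one from the joint -mean- command, because both are then computed on the same complete cases.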
2 likes
Ben Littenberg

Join Date: Apr 2014

Posts: 13
#7

04 Aug 2024, 06:06

Thanks, that helps!