  • mean vs. dtable

    Why do these two commands give slightly different output with the same source data?

    . svy: mean transpor housecost internet paybll12m fdsrunout care rx
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =  52            Number of obs   =      27,974
    Number of PSUs   = 662            Population size = 243,458,208
                                      Design df       =         610

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
        transpor |   .0674449    .002632       .062276    .0726138
       housecost |   .0743724   .0021366      .0701763    .0785685
        internet |   .0571033   .0017042      .0537564    .0604502
       paybll12m |   .1072457   .0023223      .1026851    .1118063
       fdsrunout |   .1236142   .0030575      .1176097    .1296186
            care |   .0829802   .0022991      .0784652    .0874952
              rx |   .0766329   .0019758      .0727527    .0805131
    --------------------------------------------------------------

    . dtable transpor housecost internet paybll12m fdsrunout care rx, svy

    ---------------------------
                     Summary
    ---------------------------
    N               258,237,552
    Transport     0.068 (0.251)
    Housing       0.075 (0.263)
    Internet      0.059 (0.235)
    Medical bills 0.107 (0.309)
    Food          0.124 (0.330)
    Medical care  0.083 (0.277)
    Medications   0.077 (0.267)
    ---------------------------


    Thanks!

    Ben

  • #2
    If you look at the reported N for each output, it looks like mean is using the subset of observations in which all variables have observed data. Evidently some of your variables have missing data. Try running your mean command separately for each variable and compare the output.
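    The suggestion above can be tried with a loop. This is a minimal sketch that uses Stata's shipped auto dataset as a stand-in for the data in the question (in auto, rep78 has 5 missing values and price has none); for the survey data you would substitute your own varlist and put svy: in front of each mean command.

```stata
* Stand-in data: auto has 74 observations; rep78 is missing for 5 of them
sysuse auto, clear

* One command, several variables: only complete cases (no missing
* value on any listed variable) enter the estimation sample
mean price rep78
display "joint N = " e(N)

* One command per variable: each mean uses all observations that are
* non-missing for that variable alone
foreach v of varlist price rep78 {
    quietly mean `v'
    display "`v': N = " e(N) ", mean = " _b[`v']
}
```

    If the per-variable means match the dtable column while the joint run does not, missing data is the explanation.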



    • #3
      Yes, they seem to be using different sample sizes, even when I use a single variable and eliminate the survey weighting:
      Code:
      . mean housecost
      
      Mean estimation                         Number of obs = 28,066
      
      --------------------------------------------------------------
                   |       Mean   Std. err.     [95% conf. interval]
      -------------+------------------------------------------------
         housecost |   .0708687   .0015317      .0678664    .0738709
      --------------------------------------------------------------
      
      . dtable housecost
      
      ---------------------
                 Summary   
      ---------------------
      N              29,522
      Housing 0.071 (0.257)
      ---------------------
      (mean gets the same values as summarize)
      Code:
      . sum housecost
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
         housecost |     28,066    .0708687    .2566099          0          1



      • #4
        I believe dtable by default reports the size of the entire sample, whereas sum and mean report the number of observed (non-missing) entries for the specific variable.
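        One way to check which count each command is reporting is to compare the total number of observations with the number of non-missing values per variable. A minimal sketch, again using the auto dataset as a stand-in for the poster's data:

```stata
* Stand-in data: 74 observations, rep78 missing for 5
sysuse auto, clear

* Missing-value overview; variables with no missing values are not listed
misstable summarize price rep78

count                      // every observation in the dataset
count if !missing(rep78)   // only observations with rep78 observed
```

        The first count corresponds to an overall sample size like the N row of dtable; the second corresponds to the Obs column of summarize for that variable.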



        • #5
          It's not just the reported sample size. In the first example, the means for housing and internet are (slightly) different. It looks like it is using the overall sample size, not the variable-specific sample size.



          • #6
            No, the -mean- command, with or without -svy:-, like all Stata estimation commands, uses only those observations that have no missing values on any of the variables mentioned in the command. By contrast, -dtable- uses a separate sample for each variable, consisting of all and only those observations which have non-missing values for that variable.
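            Given that explanation, one way to make the two commands agree is to restrict dtable to the estimation sample of mean. The sketch below uses the auto dataset as a stand-in; the marker names touse and touse2 are our own. For the survey case one would, under the same idea, run svy: mean first and then dtable ... if touse, svy. Note that for survey variance estimation Stata generally recommends subpop() over if; here the aim is only to check that the point estimates line up.

```stata
* Stand-in data: rep78 is missing for 5 of 74 observations
sysuse auto, clear

* mean uses complete cases; save its estimation sample as a 0/1 marker
mean price rep78
generate byte touse = e(sample)

* dtable on the same complete-case sample as mean above
dtable price rep78 if touse

* Equivalent marker built without running an estimator:
* mark sets touse2 = 1, markout sets it to 0 where any listed var is missing
mark touse2
markout touse2 price rep78
assert touse == touse2
```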



            • #7
              Thanks, that helps!
