  • Why do ci and mean yield different confidence intervals?

    The commands ci and mean both compute mean values, standard errors and confidence intervals. However, when a variable is grouped, the confidence intervals are different.

    In the first example all results (mean, standard error, confidence interval) are the same.
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . ci mpg
    
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             mpg |         74     21.2973    .6725511         19.9569    22.63769
    
    . mean mpg
    
    Mean estimation                   Number of obs   =         74
    
    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
             mpg |    21.2973   .6725511       19.9569    22.63769
    --------------------------------------------------------------
    In the second example the mean values and standard errors are the same but the confidence intervals are different. How can this be explained?
    Code:
    . bysort foreign: ci mpg
    
    -----------------------------------------------------------------------------------
    -> foreign = Domestic
    
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             mpg |         52    19.82692     .657777        18.50638    21.14747
    
    -----------------------------------------------------------------------------------
    -> foreign = Foreign
    
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             mpg |         22    24.77273     1.40951        21.84149    27.70396
    
    . mean mpg, over(foreign)
    
    Mean estimation                   Number of obs   =         74
    
         Domestic: foreign = Domestic
          Foreign: foreign = Foreign
    
    --------------------------------------------------------------
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    mpg          |
        Domestic |   19.82692    .657777      18.51598    21.13787
         Foreign |   24.77273    1.40951      21.96358    27.58188
    --------------------------------------------------------------

  • #2
    Hazarding a guess, I believe it relates to the number of observations used by each command (over(), I guess, splits the estimation), since n is part of the formula for the SE.

    Recently, when summarizing the same variables in wide and long format, I came across this and got quite intrigued.

    I would also like to have this issue clarified.

    Best,

    Marcos
    Last edited by Marcos Almeida; 11 Jun 2015, 08:56.

    • #3
      This is because

      Code:
      bys foreign : ci mpg
      is the same as

      Code:
      mean mpg if (foreign == 0)
      mean mpg if (foreign == 1)
      but not the same as

      Code:
      mean mpg , over(foreign)
      As Marcos correctly points out, the latter treats domestic and foreign cars as coming from independent samples, as explained in [R] mean, which in this example indeed leads to a different number of observations being used in the calculation. However, it is not the calculation of the standard errors that differs, as these are the same. What differs is the degrees of freedom used in calculating the upper and lower bounds of the CIs.
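
      In formula form, and only as a sketch in my own notation (n_g for the number of observations in a group, N for the total number of observations, \nu for the degrees of freedom; these symbols are assumptions of mine, not taken from [R] mean), both intervals use the same mean and standard error and differ only in the t quantile. The quantiles shown are approximate and are consistent with the output in #1:

      ```latex
      \bar{x}_g \pm t_{0.975,\,\nu}\,\widehat{\mathrm{SE}}_g,
      \qquad
      \nu =
      \begin{cases}
        n_g - 1 & \text{for \texttt{bysort foreign: ci mpg}} \\
        N - 1   & \text{for \texttt{mean mpg, over(foreign)}}
      \end{cases}

      % Domestic cars: n_g = 52, N = 74, mean = 19.82692, SE = 0.657777
      \begin{aligned}
        \nu = 51:&\quad 19.82692 \pm 2.0076 \times 0.657777 \approx [18.506,\ 21.147] \\
        \nu = 73:&\quad 19.82692 \pm 1.9930 \times 0.657777 \approx [18.516,\ 21.138]
      \end{aligned}
      ```

      The larger degrees of freedom give a smaller quantile, which is why the intervals from mean with over() are slightly narrower in both groups.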

      This is easily demonstrated. Let us replicate the CI reported by

      Code:
      mean mpg if (foreign == 0)
      Code:
      tempname z
      qui su mpg if (foreign == 0)
      sca `z' = invttail(r(N) - 1, 0.025) // <- r(N) == 52 (domestic cars only)
      di "ul is: " r(mean) + `z' * sqrt(r(Var)/r(N))
      di "ll is: " r(mean) - `z' * sqrt(r(Var)/r(N))
      di _n  r(mean) - `z' * sqrt(r(Var)/r(N)) " ; " r(mean) + `z' * sqrt(r(Var)/r(N))
      Now let us replicate the CI reported for the sub-population of domestic cars, reported by

      Code:
      mean mpg , over(foreign)
      Code:
      tempname z
      qui su mpg if (foreign == 0)
      sca `z' = invttail(c(N) - 1, 0.025) // <- c(N) == 74 (all cars in the dataset)
      di "ul is: " r(mean) + `z' * sqrt(r(Var)/r(N))
      di "ll is: " r(mean) - `z' * sqrt(r(Var)/r(N))
      di _n  r(mean) - `z' * sqrt(r(Var)/r(N)) " ; " r(mean) + `z' * sqrt(r(Var)/r(N))
      Note that while I calculate the standard errors the same way in both examples, I do change the degrees of freedom in calculating z in the latter.
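
      The same comparison can be made for the foreign cars in a single run. This is a minimal sketch along the lines of the code above; the scalar names se, t_by and t_over are made up for illustration, and it assumes the auto data are the only data in memory, so that c(N) is 74:

      ```stata
      sysuse auto, clear
      quietly summarize mpg if foreign == 1

      * same standard error for both intervals
      scalar se = sqrt(r(Var)/r(N))

      * critical values: group-size df (as ci with by) vs. full-sample df (as mean, over())
      scalar t_by   = invttail(r(N) - 1, 0.025)
      scalar t_over = invttail(c(N) - 1, 0.025)

      display "df = n_g - 1: " r(mean) - t_by*se   " ; " r(mean) + t_by*se
      display "df = N - 1:   " r(mean) - t_over*se " ; " r(mean) + t_over*se
      ```

      Up to display formatting, the first line should match the Foreign interval from ci in #1 and the second the Foreign interval from mean with over().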

      I hope this helps.

      Best
      Daniel
      Last edited by daniel klein; 11 Jun 2015, 09:19. Reason: Added display CI in one line


      • #4
        Lots of thanks, Daniel, for having shed light on this matter!

        Best,

        Marcos

        • #5
          Daniel, thank you for the good explanation.
