Why do ci and mean yield different confidence intervals?

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

11 Jun 2015, 07:49

The commands ci and mean both compute mean values, standard errors and confidence intervals. However, when a variable is grouped, the confidence intervals are different.

In the first example all results (mean, standard error, confidence interval) are the same.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. ci mpg

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         mpg |         74     21.2973    .6725511         19.9569    22.63769

. mean mpg

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         mpg |    21.2973   .6725511       19.9569    22.63769
--------------------------------------------------------------

In the second example the mean values and standard errors are the same but the confidence intervals are different. How can this be explained?

Code:

. bysort foreign: ci mpg

-----------------------------------------------------------------------------------
-> foreign = Domestic

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         mpg |         52    19.82692     .657777        18.50638    21.14747

-----------------------------------------------------------------------------------
-> foreign = Foreign

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         mpg |         22    24.77273     1.40951        21.84149    27.70396

. mean mpg, over(foreign)

Mean estimation                   Number of obs   =         74

     Domestic: foreign = Domestic
      Foreign: foreign = Foreign

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
mpg          |
    Domestic |   19.82692    .657777      18.51598    21.13787
     Foreign |   24.77273    1.40951      21.96358    27.58188
--------------------------------------------------------------

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

11 Jun 2015, 08:47

Hazarding a guess, I believe it relates to the number of observations used in each command ("over", I guess, splits the estimation), due to the n being part of the formula to calculate the SE.

Recently, when "summarizing" the same variables under wide and long format, I came across this and got quite intrigued.

I would also like to have this issue clarified.

Best,

Marcos

Last edited by Marcos Almeida; 11 Jun 2015, 08:56.

Best regards,

Marcos
daniel klein

Join Date: Mar 2014

Posts: 3824
#3

11 Jun 2015, 09:13

This is because

Code:

bys foreign : ci mpg

is the same as

Code:

mean mpg if (foreign == 0)
mean mpg if (foreign == 1)

but not the same as

Code:

mean mpg , over(foreign)

As Marcos correctly points out, the two approaches use different numbers of observations. With bysort, ci treats domestic and foreign cars as independent samples of 52 and 22 observations, whereas mean with over() treats them as subpopulations of a single sample of 74 observations, as explained in [R] mean. However, it is not the calculation of the standard errors that differs, as these are the same. It is the degrees of freedom used in calculating the upper and lower bounds of the CIs.
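To make the role of the degrees of freedom concrete, here is a minimal sketch that compares the two t critical values for the domestic cars. It assumes only the group size (52), the total sample size (74) and the standard error (.657777) shown in the output earlier in this thread. It is an illustration, not the internal code of either command.

```stata
* Sketch: t critical values under the two degrees-of-freedom choices.
* 52 = number of domestic cars, 74 = all cars in auto.dta.
display "critical value, df = 52 - 1: " invttail(52 - 1, 0.025)
display "critical value, df = 74 - 1: " invttail(74 - 1, 0.025)

* Half-width of the 95% CI for the domestic mean, using the reported
* standard error of .657777 with each critical value.
display "half-width, df = 51: " invttail(51, 0.025) * .657777
display "half-width, df = 73: " invttail(73, 0.025) * .657777
```

The larger degrees of freedom give the smaller critical value, which is why the intervals from mean with over() are slightly narrower.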

This is easily demonstrated. Let us replicate the CI reported by

Code:

mean mpg if (foreign == 0)

Code:

tempname z
qui su mpg if (foreign == 0)
sca `z' = invttail(r(N) - 1, 0.025) // <- r(N) :== 52 !
di "ul is: " r(mean) + `z' * sqrt(r(Var)/r(N))
di "ll is: " r(mean) - `z' * sqrt(r(Var)/r(N))
di _n r(mean) - `z' * sqrt(r(Var)/r(N)) " ; " r(mean) + `z' * sqrt(r(Var)/r(N))

Now let us replicate the CI for the subpopulation of domestic cars, as reported by

Code:

mean mpg , over(foreign)

Code:

tempname z
qui su mpg if (foreign == 0)
sca `z' = invttail(c(N) - 1, 0.025) // <- c(N) :== 74 !
di "ul is: " r(mean) + `z' * sqrt(r(Var)/r(N))
di "ll is: " r(mean) - `z' * sqrt(r(Var)/r(N))
di _n r(mean) - `z' * sqrt(r(Var)/r(N)) " ; " r(mean) + `z' * sqrt(r(Var)/r(N))

Note that while I calculate the standard errors the same way in both examples, I do change the degrees of freedom in calculating z in the latter.
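The same comparison can be run for both groups at once. The following is a rough sketch along the lines of the two replications, assuming auto.dta is the only dataset in memory so that c(N) is 74. The local macro names (se, t_by, t_over) are made up for this illustration.

```stata
sysuse auto, clear

foreach g in 0 1 {
    quietly summarize mpg if foreign == `g'
    local se     = sqrt(r(Var)/r(N))            // same standard error in both cases
    local t_by   = invttail(r(N) - 1, 0.025)    // df from the group size, as with bysort: ci
    local t_over = invttail(c(N) - 1, 0.025)    // df from the full sample, as with mean, over()
    display _n "foreign = `g'"
    display "  group df : " r(mean) - `t_by' * `se'   " ; " r(mean) + `t_by' * `se'
    display "  sample df: " r(mean) - `t_over' * `se' " ; " r(mean) + `t_over' * `se'
}
```

The displayed bounds can be checked against the intervals that ci and mean report in the first post.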

I hope this helps.

Best
Daniel

Last edited by daniel klein; 11 Jun 2015, 09:19. Reason: Added display CI in one line
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

11 Jun 2015, 15:02

Lots of thanks, Daniel, for having shed light on this matter!

Best,

Marcos

Best regards,

Marcos
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#5

12 Jun 2015, 12:54

Daniel, thank you for the good explanation.