95% confidence intervals around means

Joe Ward

Join Date: Jun 2015

Posts: 45
#1

05 Sep 2018, 08:19

Dear Stata

I would like to calculate a 95% confidence interval around the mean of Diff_SR within each of three groups (Diff1three = 0, 1, 2).

Initially I did this using:

mean Diff_SR, over(Diff1three)

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
Diff_SR      |
           0 |    .117838   .0842012     -.0499747    .2856507
           1 |   .0026495   .0722347      -.141314    .1466129
           2 |   .0601243    .056828     -.0531338    .1733823
--------------------------------------------------------------

I then wanted to graph this, so I exported the data to Tableau and realised that it calculated a different confidence interval.

I then went back and just looked at one group:

mean Diff_SR if Diff1three == 0

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     Diff_SR |    .117838   .0842012     -.0627555    .2984315
--------------------------------------------------------------

This differs from my first result and gives the same confidence interval as Tableau. Does anyone know why there is a discrepancy when using "over" in the first example, and which is better to use?

Best Wishes

Joe
eric_a_booth

Join Date: Apr 2014

Posts: 288
#2

05 Sep 2018, 12:33

Here's a detailed discussion on your question:

https://www.statalist.org/forums/for...ence-intervals

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#3

05 Sep 2018, 12:33

Hmm. I couldn't work with your example directly because you don't give the N's. But playing around with the auto data set, it appears that when you run -mean whatever, over(grouping_variable)-, the confidence intervals for all levels of the grouping variable are calculated using a t-statistic that has df = total sample size in all groups combined - 1, whereas when you do -mean whatever if grouping_variable == some_value- the confidence interval is calculated using a t-statistic with df = # of obs where grouping_variable == some_value - 1. The latter is clearly correct. The former seems incorrect to me, unless using the degrees of freedom for the entire combined sample is some kind of correction for multiple confidence intervals, but I do not recall seeing that before.
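
A minimal sketch of that comparison, for anyone who wants to reproduce it. The choice of price and foreign from the auto data set is an assumption made for illustration (the variables actually used above are not stated). The last two lines back out the t multipliers implied by the output posted in #1, computed as (upper limit - mean) / std. err.

```stata
* Sketch only: price and foreign are illustrative choices, not taken from the thread
sysuse auto, clear

* Confidence intervals for both groups in one call
mean price, over(foreign)

* Confidence interval for one group on its own
mean price if foreign == 0

* Critical t values under the two df conventions described above
* (auto has 74 observations, 52 of them with foreign == 0)
display invttail(74 - 1, 0.025)
display invttail(52 - 1, 0.025)

* Multipliers implied by the output in #1: (upper limit - mean) / std. err.
display (.2856507 - .117838) / .0842012
display (.2984315 - .117838) / .0842012
```

The two implied multipliers come out at roughly 1.993 for the over() run and 2.145 for the if run. The second is what invttail(14, 0.025) returns, which would be consistent with 15 observations in group 0, although the N's are not given in the post, so treat that as an inference rather than a fact.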

I'd be interested if someone from StataCorp would comment on this.

Crossed with Eric Booth's response in #2. The link there does indeed explain it in depth. No need for any further comment from StataCorp.

Last edited by Clyde Schechter; 05 Sep 2018, 12:37.
Joe Ward

Join Date: Jun 2015

Posts: 45
#4

13 Sep 2018, 02:18

Dear Clyde and Eric

Thanks to both. Really useful as always!

Best Wishes

Joe