  • 95% confidence intervals around means

    Dear Stata

    I would like to calculate a 95% confidence interval around the mean of Diff_SR within each of three groups (Diff1three = 0, 1, 2).

    Initially I did this using:

    mean Diff_SR, over (Diff1three)

    --------------------------------------------------------------
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    Diff_SR      |
               0 |    .117838    .0842012    -.0499747    .2856507
               1 |   .0026495    .0722347     -.141314    .1466129
               2 |   .0601243     .056828    -.0531338    .1733823
    --------------------------------------------------------------


    I then wanted to graph this, so I exported the data to Tableau and realised that it calculated a different confidence interval.

    I then went back and just looked at one group:

    mean Diff_SR if Diff1three == 0


    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
         Diff_SR |    .117838    .0842012    -.0627555    .2984315
    --------------------------------------------------------------

    This is different from my first result, and it gives the same confidence interval as Tableau. Does anyone know why there is a discrepancy when using the over() option in the first example, and which approach is better to use?

    Best Wishes

    Joe

  • #2
    Here's a detailed discussion on your question:

    https://www.statalist.org/forums/for...ence-intervals
    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX


    • #3
      Hmm. I couldn't work with your example directly because you don't give the Ns. But playing around with the auto data set, it appears that when you run -mean whatever, over(grouping_variable)-, the confidence intervals for all levels of the grouping variable are calculated from a t distribution with df = (total sample size in all groups combined) - 1. When you run -mean whatever if grouping_variable == some_value-, the confidence interval is calculated from a t distribution with df = (number of observations where grouping_variable == some_value) - 1. The latter is clearly correct. The former seems incorrect to me, unless using the degrees of freedom for the entire combined sample is some kind of correction for multiple confidence intervals, but I do not recall seeing that before.
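
      For anyone who wants to repeat that check, here is a minimal sketch. It assumes the auto data shipped with Stata as a stand-in for the real data; price and foreign are arbitrary choices of mine, not variables from the original question. It relies on -mean- leaving its degrees of freedom in e(df_r), and on invttail() returning the matching t multiplier for a two-sided 95% interval.

      ```stata
      * Assumption: the bundled auto data stand in for the poster's data
      sysuse auto, clear

      * All groups at once: every interval shares one df
      mean price, over(foreign)
      display "df with over():      " e(df_r)
      display "t multiplier:        " invttail(e(df_r), 0.025)

      * One group on its own: df comes from that group's observations only
      mean price if foreign == 0
      display "df with if:          " e(df_r)
      display "t multiplier:        " invttail(e(df_r), 0.025)
      ```

      The standard errors for the group should agree between the two runs; only the multiplier, and hence the width of the interval, should differ.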

      I'd be interested if someone from StataCorp would comment on this.

      Crossed with Eric Booth's response in #2. The link there does indeed explain it in depth. No need for any further comment from StataCorp.
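
      Even without the Ns, the t multiplier behind each posted interval can be backed out as (upper limit - mean) / standard error. The sketch below uses only the numbers from the output in #1. The df values of 73 and 14 in the last two lines are my own guesses, chosen because they reproduce those multipliers; they are not stated anywhere in the thread.

      ```stata
      * Multiplier implied by the over() output for group 0
      display (.2856507 - .117838) / .0842012     // about 1.993

      * Multiplier implied by the -if- output for group 0
      display (.2984315 - .117838) / .0842012     // about 2.145

      * Assumed df values (my inference) that give the same multipliers
      display invttail(73, 0.025)                 // about 1.993
      display invttail(14, 0.025)                 // about 2.145
      ```

      If those guesses are right, the full sample would have about 74 observations and group 0 about 15, which fits the pattern described above.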
      Last edited by Clyde Schechter; 05 Sep 2018, 12:37.


      • #4
        Dear Clyde and Eric

        Thanks to both. Really useful as always!

        Best Wishes

        Joe
