  • Quick question about -mean- command confidence intervals

    Hello everyone,

    I have a quick (hopefully easy) question. When I use the -mean- command in Stata, it produces confidence intervals that do not match my results when I calculate these manually. Below is an example. I will use the auto.dta file in Stata so my results can be replicated.

    Code:
    sysuse auto.dta
    I will use the variable "price" and, using the -mean- command, calculate a 99% confidence interval. The results are: N = 74, Mean = 6165.257, Std. Err. = 342.8719, Lower = 5258.405, Upper = 7072.108.

    Code:
     mean price, level(99)
    To calculate this manually, I save the results of the mean and the std. err. as macros that I call "mean" and "std_err":

    Code:
    matrix list e(b)
    mat b = e(b)
    global mean = b[1,1]
    dis $mean
    Stata calculates the standard error of the mean as the square root of the variance (p. 6, https://www.stata.com/manuals13/rmean.pdf). Thus, I use the variance that the -mean- command stores in the matrix e(V).

    Code:
    matrix list e(V)
    mat V = e(V)
    global variance = V[1,1]
    global std_err = sqrt($variance)
    dis $std_err
    Now that I have the mean and std. err. saved as precise values from the command, I use the confidence interval calculation: mean +/- (t-ratio * standard error). I looked up the t-ratio for 74 degrees of freedom, and found 2.644. I save the results in global macros called "lower_ci" and "upper_ci" for the lower and upper bounds.

    Code:
    global lower_ci = $mean - (2.644 * $std_err)
    dis $lower_ci
    global upper_ci = $mean + (2.644 * $std_err)
    dis $upper_ci
    As my manual results show, I get a lower CI of 5258.7034, but this is different from the lower CI reported using the -mean- command, which is 5258.405. Likewise, my manual result for the upper CI is 7071.8101, but the one reported using -mean- is 7072.108. The results are close, but not exact. Does anyone know why this is? Additionally, does anyone know how I can use my manual method to get exact results to match the -mean- command in Stata?

    Thanks!

  • #2
    Rounding error. The t-ratio you are using is 2.644, which is a rounded version of the value that Stata uses: 2.6448688. (In fact, it is incorrectly rounded: it should be 2.645.) If you use the unrounded value, you will get the same results that -mean- shows you. (You can find the t-ratio Stata uses in r(table)["crit", "price"].)
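
    As a minimal sketch of the manual calculation with an unrounded critical value: the block below asks Stata for the t quantile directly with invttail() instead of typing in a table value. The scalar names xbar, se_xbar, and tcrit are illustrative choices, not anything -mean- defines; they were picked so that they do not abbreviate any variable in auto.dta. It assumes -mean- leaves its degrees of freedom in e(df_r).

    ```stata
    sysuse auto, clear
    mean price, level(99)

    * Point estimate and standard error from the stored results
    matrix b = e(b)
    matrix V = e(V)
    scalar xbar    = b[1,1]
    scalar se_xbar = sqrt(V[1,1])

    * Upper-tail t quantile for a two-sided 99% interval,
    * using the degrees of freedom stored by -mean- (n - 1)
    scalar tcrit = invttail(e(df_r), 0.005)
    display %10.7f tcrit

    * Manual confidence limits, to compare with the -mean- output
    display "lower = " %9.3f (xbar - tcrit*se_xbar)
    display "upper = " %9.3f (xbar + tcrit*se_xbar)
    ```

    Scalars are used here rather than global macros so that the intermediate values are carried at full double precision.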

    • #3
      Hi Clyde,

      Thanks so much for the quick response! I had a feeling it was due to rounding.

      Another quick follow-up: when I look at r(table), it shows the degrees of freedom as 73, not 74 (the total number of observations). When finding a critical value, are we supposed to use n-1 for the degrees of freedom?

      Thanks!

      • #4
        Yes, you should use n-1 degrees of freedom. Whenever a variance is calculated on n observations and the mean is itself calculated from those n observations, you lose 1 df for calculating that mean, hence n-1. If the variance were calculated using a "known" mean exogenously obtained, then you would use n df.

        In the present situation, it makes little difference because the critical two-sided value for a 1% critical region is nearly the same for 73 or 74 df.
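
        To see how small the difference is, a quick check along these lines can be run at the command line. It is only a sketch: it uses Stata's invttail() function and the n-1 standard deviation that -summarize- returns in r(sd), and the scalar name se_check is an illustrative choice.

        ```stata
        * Two-sided 1% critical values for 73 and 74 degrees of freedom
        display %9.4f invttail(73, 0.005)
        display %9.4f invttail(74, 0.005)

        * The standard error reported by -mean- is the n-1 standard
        * deviation divided by sqrt(n)
        sysuse auto, clear
        quietly summarize price
        scalar se_check = r(sd) / sqrt(r(N))
        display %10.4f se_check
        ```

        The two critical values should come out at roughly 2.6449 and 2.6439, so the 2.644 looked up by hand is the 74-df value to three decimals. That small gap is enough to account for the discrepancy of about 0.3 in each confidence limit.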

        • #5
          Hi Clyde,

          That is really helpful, thank you! I really appreciate your quick responses.

          Thanks,
          Thomas
