
  • Standard errors and 95% Confidence Intervals for Proportions - Differences between different Stata versions

    Dear community:

    I am having doubts about how Stata calculates standard errors and 95% confidence intervals for proportions, especially because I get different results from different versions of Stata. Here is my code example:

    Code:
    clear 
    input x freq
    0 47
    1 53
    end
    expand freq
    drop freq
    
    proportion x
    When I run the code above in Stata 13.1, I get a Std. Err. of .0501614 and a 95% Conf. Interval that goes from .4305964 to .6270792

    When I run the same code in Stata 15.1, I get a Std. Err. of .0499099 and a 95% Conf. Interval that goes from .4310876 to .6266107

    I have two questions:

    1) Why are the standard errors different?

    2) I have tried to "manually" calculate the 95% confidence intervals with the formula: CI(lower) = prop-(1.96*std. err.) ; CI(upper) = prop+(1.96*std. err.), but in both cases I don't get the same results as those provided by Stata.

    Am I doing anything wrong? Any advice and/or clarification is greatly appreciated.

    Best regards,

    Paolo Moncagatta

  • #2
    What's the method in each case? It might have changed between versions, for example, from Wald (normal approximation) to exact or something. Currently, it's from the logit transformation. You might try using version control from the Release 15.1 installation
    Code:
    version 13.1: proportion x
    and see whether that fixes things up.

    If not, then try something along the lines of the following to investigate further.
    Code:
    version 15.1
    
    clear *
    
    input byte(x freq)
    0 47
    1 53
    end
    
    /* expand freq
    drop freq */
    
    proportion x [fweight=freq]
    
    proportion x [fweight=freq], citype(wald)
    
    exit

    • #3
      There are two courses of action in a situation like this:

      1. Either we do not worry about small differences and trust that StataCorp did the right thing. These are probably two asymptotically equivalent versions of the procedure. Note that the word Logit appears in the Version 15 output and does not appear in Version 11 (I do not have Version 13 installed, but the rest of the numbers you mention for Version 13 check out in Version 11).

      2. Or, if we do worry, we need to read the Methods and Formulas section of the manual and see for ourselves where the difference comes from. I cannot do this for you because I only have the Stata 15 manual.

      My guess is that, as Joseph said, some asymptotic procedure in Version 13 changed to another, logit-based asymptotic procedure in Stata 15.

      • #4
        In version 13, it seems that Stata took the approach (the manual is available online) that a proportion is just the mean of a 0/1 indicator. This meant using -mean- to compute the standard error, which divides the usual variance of a proportion by n-1 instead of n. This appears to have changed in version 14 (see -help whatsnew14-, update 29oct2015).

        Code:
        . di sqrt(0.53*(1-0.53)/(100-1))
        .05016136
        
        . di sqrt(0.53*(1-0.53)/(100))
        .04990992
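        Then the CIs are computed on the logit scale, using the plug-in variance estimate from above: the standard error is divided by p(1-p) to put it on the logit scale (delta method), and the limits are transformed back with invlogit().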

        Code:
        . di invlogit(logit(0.53) - invt(99, 0.975)*.0501614/(.53*.47)), invlogit(logit(0.53) + invt(99, 0.975)*.0501614/(.53*.47))
        .43059634 .62707928
        
        . di invlogit(logit(0.53) - invt(99, 0.975)*0.04990992/(.53*.47)), invlogit(logit(0.53) + invt(99, 0.975)*0.04990992/(.53*.47))
        .43108755 .62661072
        Thus we can reproduce results that you see from both versions of Stata. The formula you show is the normal (Wald-type) confidence interval which is another method to use.

        My speculation here is that the change in the computation of the standard error is to align it with the calculation from the binomial distribution.
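
        A minimal sketch that generalizes the two checks above, assuming only the sample proportion and the sample size are known. The scalar names p, n, se_old, se_new and tcrit are made up for illustration, and the sketch assumes no variable in memory shares those names:

        Code:
        * assumed inputs: sample proportion and number of observations
        scalar p = 0.53
        scalar n = 100
        
        * standard error with the n-1 divisor (matches the 13.1 output) and the n divisor (matches 15.1)
        scalar se_old = sqrt(p*(1-p)/(n-1))
        scalar se_new = sqrt(p*(1-p)/n)
        
        * t critical value on n-1 degrees of freedom
        scalar tcrit = invt(n-1, 0.975)
        
        * logit-transformed limits; se/(p*(1-p)) is the standard error on the logit scale
        di invlogit(logit(p) - tcrit*se_old/(p*(1-p))), invlogit(logit(p) + tcrit*se_old/(p*(1-p)))
        di invlogit(logit(p) - tcrit*se_new/(p*(1-p))), invlogit(logit(p) + tcrit*se_new/(p*(1-p)))
        Changing p and n lets you repeat the comparison for other data.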

        • #5
          Thank you so much Joseph, Joro and Leonardo.

          Leonardo, your explanations are very clear and have helped me understand why the standard errors are different, and how the confidence intervals are calculated in each case (using the logit-transformation). One doubt remains: when I "manually" calculate the normal (Wald-type) CIs with my formula, I get some numbers. When I run -proportion, citype(wald)- in Stata 15.1, I obtain different results for the CIs. Look:
          Code:
          clear 
          input x freq
          0 47
          1 53
          end
          expand freq
          drop freq
          
          // proportion using Wald CIs
          proportion x, citype(wald)
            
          // "manual" calculation of CIs (wald)
          di .53-(1.96*.04990992), .53+(1.96*.04990992)
          Any clue why this is so?

          Best regards, and thanks again.

          • #6
            Your calculation assumes a large-sample (normal-based) confidence interval, which is OK, but Stata bases the interval on the t distribution, so the critical value must come from the t distribution with n-1 degrees of freedom rather than 1.96. Of course, the t-based interval converges to the normal-based one as the sample size grows. This is explained in the PDF documentation (under Methods and Formulas), following the link from -help proportion-.

            Code:
            . di .53-invt(99, 0.975)*0.04990992, .53+invt(99, 0.975)*0.04990992
            .43096789 .62903211
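
            To see where the gap comes from, here is a small sketch that puts the two critical values and the two resulting intervals side by side. It assumes the same proportion (.53), standard error (.04990992) and 99 degrees of freedom used above:

            Code:
            * normal versus t critical value for a two-sided 95% interval
            di invnormal(0.975), invt(99, 0.975)
            
            * Wald interval with the normal critical value (the "manual" formula)
            di .53-invnormal(0.975)*0.04990992, .53+invnormal(0.975)*0.04990992
            
            * Wald interval with the t critical value
            di .53-invt(99, 0.975)*0.04990992, .53+invt(99, 0.975)*0.04990992
            The t critical value is slightly larger than the normal one, so the t-based interval is slightly wider.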

            • #7
              Code:
              .  di .53-(invt(99, 0.975)*sqrt(.53*.47/100)), .53+(invt(99, 0.975)*sqrt(.53*.47/100))
              .43096789 .62903211
              See page 2665 of https://www.stata.com/manuals/r.pdf

              • #8
                Thank you so much for your answers!
                Problem solved

                • #9
                  As an addendum, I flagged a bug in -ci proportion- to Stata Technical Services: some CI types do not respect frequency weights correctly. This does not (yet) appear to be fixed in Stata 16, but was presumably fixed for Stata 17.
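
                  One way to check your own installation is to compare weighted and expanded data. This is only a sketch: it assumes the Stata 14+ syntax of -ci proportions- and simply loops over the documented CI types, since the affected types are not listed here.

                  Code:
                  clear
                  input byte(x freq)
                  0 47
                  1 53
                  end
                  
                  * intervals computed with frequency weights
                  foreach type in exact wald wilson agresti jeffreys {
                      ci proportions x [fweight=freq], `type'
                  }
                  
                  * the same intervals after expanding to one row per observation
                  expand freq
                  foreach type in exact wald wilson agresti jeffreys {
                      ci proportions x, `type'
                  }
                  For each CI type, the two runs should report the same interval; a mismatch would point to the weighting problem.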
