Standard errors and 95% Confidence Intervals for Proportions - Differences between different Stata versions

Paolo Moncagatta

Join Date: May 2014

Posts: 9
#1

31 Mar 2021, 23:01

Dear community:

I am unsure how Stata calculates standard errors and 95% confidence intervals for proportions, especially because different versions of Stata give me different results. Here is my code example:

Code:

clear
input x freq
0 47
1 53
end
expand freq
drop freq
proportion x

When I run the code above in Stata 13.1, I get a Std. Err. of .0501614 and a 95% Conf. Interval that goes from .4305964 to .6270792

When I run the same code in Stata 15.1, I get a Std. Err. of .0499099 and a 95% Conf. Interval that goes from .4310876 to .6266107

I have two questions:

1) Why are the standard errors different?

2) I have tried to calculate the 95% confidence intervals "manually" with the formula CI(lower) = prop - (1.96 * std. err.); CI(upper) = prop + (1.96 * std. err.), but in neither case do I get the same results as those provided by Stata.

Am I doing anything wrong? Any advice and/or clarification is greatly appreciated.

Best regards,

Paolo Moncagatta
Joseph Coveney

Join Date: Apr 2014

Posts: 4356
#2

31 Mar 2021, 23:37

What's the method in each case? It might have changed between versions, for example, from Wald (normal approximation) to exact or something. Currently, the default interval comes from the logit transformation. You might try using version control from the Release 15.1 installation:

Code:

version 13.1: proportion x

and see whether that fixes things up.

If not, then try something along the lines of the following to investigate further.

Code:

version 15.1
clear *
input byte(x freq)
0 47
1 53
end
/*
expand freq
drop freq
*/
proportion x [fweight=freq]
proportion x [fweight=freq], citype(wald)
exit
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#3

01 Apr 2021, 03:20

There are two courses of action in a situation like this:

1. We either do not worry about small differences and trust that StataCorp did the right thing. These are probably two asymptotically equivalent versions of the procedure. Note that the word Logit appears in Version 15 and does not appear in Version 11 (I do not have Version 13 installed, but the rest of the numbers you mention for Version 13 match what I get in Version 11).

2. Or, if we do worry, we need to read the Methods and Formulas section of the manual and see for ourselves where the difference comes from. I cannot do this for you because I only have the Stata 15 manual.

My guess is that, as Joseph said, some asymptotic procedure in Version 13 changed to another, logit-based asymptotic procedure in Stata 15.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#4

01 Apr 2021, 12:18

In version 13, it seems that Stata took the approach (the manual is available online) that a proportion is just the mean of a 0/1 indicator. This meant using -mean- to compute the standard error, which divides by n-1 rather than by the n used in the usual variance formula for a proportion. This appears to have changed in version 14 (see -help whatsnew14-, update 29oct2015).

Code:

. di sqrt(0.53*(1-0.53)/(100-1))
.05016136
. di sqrt(0.53*(1-0.53)/(100))
.04990992
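
Written out as formulas, the two computations above look as follows. The subscripts 13 and 15 are just my labels for the two Stata versions, with \hat{p} = 0.53 and n = 100; this is a sketch of the arithmetic, not a quotation from the manual.

```latex
\widehat{SE}_{13}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}
  = \sqrt{\frac{0.53 \times 0.47}{99}} \approx 0.05016

\widehat{SE}_{15}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = \sqrt{\frac{0.53 \times 0.47}{100}} \approx 0.04991
```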

The CIs are then computed on the logit scale, plugging in the standard error estimate from above.

Code:

. di invlogit(logit(0.53) - invt(99, 0.975)*.0501614/(.53*.47)), invlogit(logit(0.53) + invt(99, 0.975)*.0501614/(.53*.47))
.43059634 .62707928
. di invlogit(logit(0.53) - invt(99, 0.975)*0.04990992/(.53*.47)), invlogit(logit(0.53) + invt(99, 0.975)*0.04990992/(.53*.47))
.43108755 .62661072
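
For reference, the interval computed above can be sketched as below. The division by \hat{p}(1-\hat{p}) is the delta-method standard error of logit(\hat{p}), since the derivative of logit(p) with respect to p is 1/(p(1-p)). The notation is mine rather than the manual's; t_{0.975, n-1} stands for invt(99, 0.975).

```latex
\operatorname{logit}(\hat{p}) = \ln\frac{\hat{p}}{1-\hat{p}}, \qquad
\widehat{SE}\bigl[\operatorname{logit}(\hat{p})\bigr] \approx \frac{\widehat{SE}(\hat{p})}{\hat{p}(1-\hat{p})}

CI_{95\%} = \operatorname{invlogit}\!\left( \operatorname{logit}(\hat{p})
  \pm t_{0.975,\,n-1}\,\frac{\widehat{SE}(\hat{p})}{\hat{p}(1-\hat{p})} \right)
```

Because the endpoints are back-transformed from the logit scale, the resulting interval is not symmetric around 0.53 and always stays inside (0, 1).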

Thus we can reproduce results that you see from both versions of Stata. The formula you show is the normal (Wald-type) confidence interval which is another method to use.

My speculation here is that the change in the computation of the standard error is to align it with the calculation from the binomial distribution.
Paolo Moncagatta

Join Date: May 2014

Posts: 9
#5

01 Apr 2021, 14:07

Thank you so much Joseph, Joro and Leonardo.

Leonardo, your explanations are very clear and have helped me understand why the standard errors are different, and how the confidence intervals are calculated in each case (using the logit-transformation). One doubt remains: when I "manually" calculate the normal (Wald-type) CIs with my formula, I get some numbers. When I run -proportion, citype(wald)- in Stata 15.1, I obtain different results for the CIs. Look:

Code:

clear
input x freq
0 47
1 53
end
expand freq
drop freq

// proportion using Wald CIs
proportion x, citype(wald)

// "manual" calculation of CIs (wald)
di .53-(1.96*.04990992), .53+(1.96*.04990992)

Any clue why this is so?

Best regards, and thanks again.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#6

01 Apr 2021, 14:26

Your calculation assumes a large-sample (normal) confidence interval, which is OK, but Stata reports a t statistic instead, which means the critical value must come from the t distribution with n-1 degrees of freedom. Of course, the t-based interval converges to the normal-based one as the sample size grows. This is explained in the PDF documentation (under Methods and Formulas) following the link from -help proportion-.

Code:

. di .53-invt(99, 0.975)*0.04990992, .53+invt(99, 0.975)*0.04990992
.43096789 .62903211
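
To make the comparison explicit (a sketch in my own notation): the only difference between the two Wald intervals is the critical value, roughly 1.9842 from the t distribution with 99 degrees of freedom versus 1.96 from the normal.

```latex
\hat{p} \pm t_{0.975,\,99}\,\widehat{SE}(\hat{p})
  = 0.53 \pm 1.9842 \times 0.04990992 \approx (0.43097,\ 0.62903)

\hat{p} \pm z_{0.975}\,\widehat{SE}(\hat{p})
  = 0.53 \pm 1.96 \times 0.04990992 \approx (0.43218,\ 0.62782)
```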
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#7

01 Apr 2021, 14:32

Code:

. di .53-(invt(99, 0.975)*sqrt(.53*.47/100)), .53+(invt(99, 0.975)*sqrt(.53*.47/100))
.43096789 .62903211

See page 2665 of https://www.stata.com/manuals/r.pdf
Paolo Moncagatta

Join Date: May 2014

Posts: 9
#8

01 Apr 2021, 15:53

Thank you so much for your answers!
Problem solved.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#9

22 Apr 2021, 14:23

As an addendum, I reported a bug in -ci proportion- to Stata Technical Services: some CI types do not respect frequency weights correctly. It does not (yet) appear to be fixed in Stata 16 but has presumably been fixed for Stata 17.