Difference between confidence intervals reported by command "proportion" and "ci"

Patrick Schober

Join Date: Mar 2016

Posts: 17
#1

02 Mar 2016, 06:27

Dear Statalists,

I'm trying to analyse data from a very simple survey, in which I have asked 65 individuals a multiple choice question to test their knowledge about a certain topic. The question had 4 given answers, of which only one was correct. The correct answer was chosen by 10 individuals. Although my sample was a convenience sample rather than a truly representative random sample of the population, I would like to report a 95% confidence interval around the point estimate.

Having familiarized myself with different ways to address this seemingly simple problem in STATA, I realized that STATA offers several possibilities (I work with version 13.1). I think that the svy: commands are not required here because I have neither sampling weights nor clustered samples, nor anything else that would make things more complicated than needed (am I right with this assumption?). Basically, I have a proportion (10/65), and I need to construct a confidence interval around that point estimate.

STATA offers the commands "proportion" and "ci". I am aware that "proportion" does not require binomial variables, whereas "ci" does. However, when I use a binomial dummy variable that codes whether the answer was correct or not, I should be able to use either command and get a valid result. Remarkably, however, the methods used to calculate the confidence interval differ substantially. "ci" lets the user select one of 5 well-known methods of calculating binomial confidence intervals, whereas "proportion" uses a logit transform by default and allows the use of bootstrap or jackknife techniques. While I have read literature about the pros and cons of different binomial approaches, I was wondering why the STATA command "proportion" uses a different approach, such that the results from "proportion" are not reproducible with "ci" and vice versa, and whether "proportion" or "ci" is more appropriate for my data.

Cheers,
Patrick
Dirk Enzmann

Join Date: Apr 2014

Posts: 523
#2

03 Mar 2016, 10:29

As you noticed, there are several ways of calculating confidence intervals of a proportion. Two articles will be helpful for choosing among different methods:
Brown, L.D., Cai, T.T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101-133.

Newcombe, R.G. (1998). Two-sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 17, 857-872.

Of these, the Wilson interval shows good properties and can be calculated using ci with the option binomial wilson (see example below).
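For reference, the Wilson (score) interval has a closed form. The notation is shorthand introduced here rather than anything from the posts: \hat p is the observed proportion, n the sample size, and z the standard normal quantile (about 1.96 for a 95% interval).

```latex
\[
\frac{\hat p + \dfrac{z^2}{2n} \;\pm\; z\sqrt{\dfrac{\hat p\,(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}
     {1 + \dfrac{z^2}{n}}
\]
```

With \hat p = 10/65 and n = 65 this works out to roughly 0.086 to 0.261, which should agree with the output of ci x, binomial wilson up to rounding.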

The default option of proportion is citype(logit), which calculates logit-transformed confidence limits that stay within the interval [0,1] (by the way, the Wilson interval also stays within the parameter space of a proportion). The formula is shown in the manual entry [R] proportion. Note that this confidence interval is not identical to the interval obtained when using the standard error of the logistic regression equation (see example below). Also note that option citype(normal) of proportion uses (as it should) the t-distribution, not the standard normal distribution (also shown in the example below).
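Written out, the logit-transformed limits that the hand calculation in the example reproduces look like this. The symbols are shorthand introduced here: \hat p is the estimated proportion, \widehat{se}(\hat p) its standard error as reported by proportion, and t the t quantile with n-1 degrees of freedom (matching e(df_r) from reg x in the example).

```latex
\[
\operatorname{logit}(\hat p) = \ln\frac{\hat p}{1-\hat p}, \qquad
L_{\pm} = \operatorname{logit}(\hat p) \pm t_{1-\alpha/2,\,n-1}\,
          \frac{\widehat{se}(\hat p)}{\hat p\,(1-\hat p)}, \qquad
\mathrm{CI} = \left[\frac{e^{L_-}}{1+e^{L_-}},\; \frac{e^{L_+}}{1+e^{L_+}}\right]
\]
```

The standard error on the logit scale comes from the delta method, which is why it differs from the standard error that logistic reports for the constant.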

The following example demonstrates different ways to calculate confidence intervals of a proportion and may be helpful to understand what proportion and ci are doing:

Code:

clear
input x freq
0 55
1 10
end
expand freq
drop freq

* Normal approximation (t-distribution):
proportion x, citype(normal)
reg x  // identical to -proportion x, citype(normal)-
sca df = e(df_r)
sca prop = _b[_cons]
sca se_prop = _se[_cons]

* Normal approximation (standard normal distribution):
sca cil1 = prop - abs(invnormal((100-c(level))/200))*se_prop
sca ciu1 = prop + abs(invnormal((100-c(level))/200))*se_prop
di "prop = " prop ", CI(lower) = " cil1 ", CI(upper) = " ciu1

sca logit = ln(prop/(1-prop))
sca logit_se = se_prop/(prop*(1-prop))
sca ORl1 = exp(logit - invttail(df,(100-c(level))/200)*logit_se)
sca ORu1 = exp(logit + invttail(df,(100-c(level))/200)*logit_se)
di "prop = " prop ", CI(lower) = " ORl1/(1+ORl1) ", CI(upper) = " ORu1/(1+ORu1)

* Logit transformation:
proportion x, citype(logit)
qui logistic x
sca df = e(N)-1
sca b = _b[_cons]
sca se_b = _se[_cons]

* Using standard normal distribution:
sca ORl2 = exp(b - abs(invnormal((100-c(level))/200))*se_b)
sca ORu2 = exp(b + abs(invnormal((100-c(level))/200))*se_b)
di "prop = " prop ", CI(lower) = " ORl2/(1+ORl2) ", CI(upper) = " ORu2/(1+ORu2)

* Using t-distribution (not identical to -proportion x, citype(logit)- !)
sca ORl3 = exp(b - invttail(df,(100-c(level))/200)*se_b)
sca ORu3 = exp(b + invttail(df,(100-c(level))/200)*se_b)
di "prop = " prop ", CI(lower) = " ORl3/(1+ORl3) ", CI(upper) = " ORu3/(1+ORu3)

* Some methods of ci:
ci x  // identical to -proportion x, citype(normal)-
ci x, binomial wald
ci x, binomial
ci x, binomial wilson

By the way: The correct spelling of Stata is Stata, not STATA, please read the FAQ: http://www.statalist.org/forums/help#spelling .

Last edited by Dirk Enzmann; 03 Mar 2016, 10:32.
Dirk Enzmann

Join Date: Apr 2014

Posts: 523
#3

03 Mar 2016, 21:04

I just realized that I created an example where not using version control bites!

The ci command of my example does not work when using version 14.1. To fix it, start the commands shown in the example in #2 with

Code:

version 13

Compare

Code:

version 13
clear
input x freq
0 55
1 10
end
expand freq

* Some methods of ci:
ci x  // identical to -proportion x, citype(normal)-
ci x, binomial wald
ci x, binomial
ci x, binomial wilson

to

Code:

version 14.1
clear
input x freq
0 55
1 10
end
expand freq

* Some methods of ci:
ci mean x  // identical to -proportion x, citype(normal)-
ci prop x, wald
ci prop x
ci prop x, wilson

All examples assume that you run them as .do files - copying them into the command window will fail due to the use of the // comment indicator.
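For the numbers in the original question (10 correct answers out of 65), the immediate command cii should give the same intervals without building a dataset. This is a minimal sketch, assuming it is run in Stata 14 or later: the first command uses the older syntax under version control, the second the newer syntax.

```stata
* Older syntax (Stata 13 and earlier), run here under version control:
version 13: cii 65 10, wilson

* Newer syntax (Stata 14 and later):
cii proportions 65 10, wilson
```

In Stata 13 itself only the older form is available, so the second command would have to be dropped there.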
Patrick Schober

Join Date: Mar 2016

Posts: 17
#4

04 Mar 2016, 06:47

Thank you very much, Dirk, for your detailed response and for providing the code!
bamboo Zhu

Join Date: Jul 2016

Posts: 18
#5

18 Jun 2017, 16:40

Thank you very much for the information. I have an additional question.
If I have 4 age groups and I want to calculate a 95% CI for the proportion in each of them, can I calculate Wilson CIs, or should I just use the "proportion" command?
Thanks.