  • Difference between confidence intervals reported by command "proportion" and "ci"

    Dear Statalists,

    I'm trying to analyse data from a very simple survey, in which I have asked 65 individuals a multiple choice question to test their knowledge about a certain topic. The question had 4 given answers, of which only one was correct. The correct answer was chosen by 10 individuals. Although my sample was a convenience sample rather than a truly representative random sample of the population, I would like to report a 95% confidence interval around the point estimate.

    Having familiarized myself with different ways to address this seemingly simple problem in STATA, I realized that STATA offers several possibilities (I work with version 13.1). I think that the svy: commands are not required here because I have neither sampling weights nor clustered samples nor anything else that would make things more complicated than needed (am I right with this assumption?). Basically, I have a proportion (10/65), and I need to construct a confidence interval around that point estimate.

    STATA offers the commands "proportion" and "ci". I am aware that "proportion" does not require binomial variables, whereas "ci" does. However, when I use a binary dummy variable that codes whether the answer was correct or not, I should be able to use either command and get a valid result. Remarkably, however, the methods used to calculate the confidence interval differ substantially. "ci" lets the user select one of 5 well-known methods of calculating binomial confidence intervals, whereas "proportion" uses a logit transform by default and also allows bootstrap or jackknife techniques. While I have read literature about the pros and cons of the different binomial approaches, I was wondering why "proportion" uses a different approach, such that its results are not reproducible with "ci" and vice versa, and whether "proportion" or "ci" is more appropriate for my data.

    Cheers,
    Patrick

  • #2
    As you noticed, there are several ways of calculating confidence intervals of a proportion. Two articles will be helpful for choosing among different methods: Of these, the Wilson interval shows good properties and can be calculated using ci with the option binomial wilson (see example below).
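
To show what the Wilson option computes, here is a minimal sketch of the Wilson score interval worked by hand for the 10 correct answers out of 65 from #1. The scalar names are made up for illustration, and the sketch assumes the variant without continuity correction; the result should agree with the Wilson interval reported by ci, but please check it against your own output.

```stata
* Wilson score interval by hand (no continuity correction)
* n, k, p, z, center and half are illustrative scalar names
scalar n = 65
scalar k = 10
scalar p = k/n
* two-sided critical value of the standard normal at the current level
scalar z = invnormal(1 - (100 - c(level))/200)
scalar center = (p + z^2/(2*n)) / (1 + z^2/n)
scalar half   = z*sqrt(p*(1 - p)/n + z^2/(4*n^2)) / (1 + z^2/n)
display "Wilson: CI(lower) = " center - half ", CI(upper) = " center + half
```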

    The default option of proportion is citype(logit), which calculates logit-transformed confidence limits that stay within the interval [0,1] (by the way: the Wilson interval also stays within the parameter space of a proportion). The formula is shown in the manual [R]. Note that this confidence interval is not identical to the interval obtained when using the standard error of the logistic regression equation (see example below). Also note that option citype(normal) of proportion uses (as it should) the t-distribution, not the standard normal distribution (also shown in the example below).
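
Written out, the logit-transformed limits take the following form, as far as I understand the Methods and formulas section of [R] proportion. Here \hat p is the estimated proportion, \hat s its standard error, and \nu the degrees of freedom (assumed to be n - 1 for data without a survey design).

```latex
\operatorname{logit}(\hat p) \;\pm\; t_{1-\alpha/2,\,\nu}\,\frac{\hat s}{\hat p\,(1-\hat p)},
\qquad
\text{with both limits transformed back by } f^{-1}(y) = \frac{e^{y}}{1+e^{y}}
```

The standard error on the logit scale is thus obtained from \hat s by the delta method rather than from a logistic regression, which is why the two intervals differ slightly.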

    The following example demonstrates different ways to calculate confidence intervals of a proportion and may be helpful to understand what proportion and ci are doing:

    Code:
    clear
    input x freq
    0 55
    1 10
    end
    expand freq
    drop freq
    
    * Normal approximation (t-distribution):
    proportion x, citype(normal)
    
    reg x  // identical to -proportion x, citype(normal)-
    sca df = e(df_r)
    sca prop = _b[_cons]
    sca se_prop = _se[_cons]
    
    * Normal approximation (standard normal distribution):
    sca cil1 = prop - abs(invnormal((100-c(level))/200))*se_prop
    sca ciu1 = prop + abs(invnormal((100-c(level))/200))*se_prop
    di "prop = " prop ", CI(lower) = " cil1 ", CI(upper) = " ciu1
    
    * Logit-transformed limits by hand (t-distribution), as in -proportion x, citype(logit)-:
    sca logit = ln(prop/(1-prop))
    sca logit_se = se_prop/(prop*(1-prop))
    sca ORl1 = exp(logit - invttail(df,(100-c(level))/200)*logit_se)
    sca ORu1 = exp(logit + invttail(df,(100-c(level))/200)*logit_se)
    di "prop = " prop ", CI(lower) = " ORl1/(1+ORl1) ", CI(upper) = " ORu1/(1+ORu1)
    
    * Logit transformation:
    proportion x, citype(logit)
    
    qui logistic x
    sca df = e(N)-1
    sca b = _b[_cons]
    sca se_b = _se[_cons]
    
    * Using standard normal distribution:
    sca ORl2 = exp(b - abs(invnormal((100-c(level))/200))*se_b)
    sca ORu2 = exp(b + abs(invnormal((100-c(level))/200))*se_b)
    di "prop = " prop ", CI(lower) = " ORl2/(1+ORl2) ", CI(upper) = " ORu2/(1+ORu2)
    
    * Using t-distribution (not identical to -proportion x, citype(logit)- !)
    sca ORl3 = exp(b - invttail(df,(100-c(level))/200)*se_b)
    sca ORu3 = exp(b + invttail(df,(100-c(level))/200)*se_b)
    di "prop = " prop ", CI(lower) = " ORl3/(1+ORl3) ", CI(upper) = " ORu3/(1+ORu3)
    
    * Some methods of ci:
    ci x  // identical to -proportion x, citype(normal)-
    ci x, binomial wald
    ci x, binomial
    ci x, binomial wilson
    By the way: The correct spelling of Stata is Stata, not STATA, please read the FAQ: http://www.statalist.org/forums/help#spelling .
    Last edited by Dirk Enzmann; 03 Mar 2016, 10:32.

    • #3
      I just realized that I created an example where not using version control bites!

      The ci command of my example does not work when using version 14.1. To fix it, start the commands shown in the example in #2 with
      Code:
      version 13
      Compare
      Code:
      version 13
      clear
      input x freq
      0 55
      1 10
      end
      expand freq
      
      * Some methods of ci:
      ci x // identical to -proportion x, citype(normal)-
      ci x, binomial wald
      ci x, binomial
      ci x, binomial wilson
      to
      Code:
      version 14.1
      clear
      input x freq
      0 55
      1 10
      end
      expand freq
      
      * Some methods of ci:
      ci mean x // identical to -proportion x, citype(normal)-
      ci prop x, wald
      ci prop x
      ci prop x, wilson
      All examples assume that you run them as .do files - copying them into the command window will fail due to the use of the // comment indicator.
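
When only the counts are available, as with the 10 of 65 in #1, the immediate form cii avoids building a dataset. This is a minimal sketch, assuming that cii changed its syntax between versions 13 and 14 in the same way as ci did. Comments use * on their own lines so that the block can also be pasted into the command window.

```stata
* Stata 14 or newer: the exact (Clopper-Pearson) interval is the default
cii proportions 65 10
cii proportions 65 10, wilson
cii proportions 65 10, wald

* Stata 13 syntax, run under version control
version 13: cii 65 10, wilson
```

The first three commands will not run in Stata 13; the last line should work in either version.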

      • #4
        Thank you very much, Dirk, for your detailed response and for providing the code!

        • #5
          Thank you very much for the information. I have an additional question.
          If I have 4 age groups and want to calculate a 95% CI for the proportion in each of them, can I calculate Wilson CIs, or should I just use the "proportion" command?
          Thanks.
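
One possible approach, sketched on simulated data: ci accepts the by prefix, so Wilson intervals can be requested per group, while proportion with over() gives the logit-transformed intervals per group. The variable names agegrp and x and the simulated values are hypothetical and only serve to make the example self-contained; the syntax shown assumes Stata 14 or newer.

```stata
* Simulated example data: 4 hypothetical age groups and a 0/1 outcome
clear
set seed 12345
set obs 200
generate agegrp = ceil(4*runiform())
generate x = runiform() < 0.2

* Wilson interval within each age group (Stata 14+ syntax)
bysort agegrp: ci proportions x, wilson

* Logit-transformed intervals for each age group
proportion x, over(agegrp)
```

Under version 13, the by-group line would instead read bysort agegrp: ci x, binomial wilson.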
