Determining 95% CI precision with ciwidth - am I doing this right?

Matt Price

Join Date: Aug 2023

Posts: 43
#1

Determining 95% CI precision with ciwidth - am I doing this right?

24 Jun 2024, 17:22

We are drafting a proposal to study latent tuberculosis (TB) infection, and can affort to enroll 200 participants at risk for TB. We are estimating our latent TB (i.e., not clinically evident) prevalence may range from 30-50%. I would like to graph how the width of our 95% CI changes as we go from a prevalence of 50%, down to 30%.

I started with

ciwidth onemean, probwidth(0.5) n(200)

But I'm not sure I'm coding this correctly, and don't fully understand the output. I was assuming probwidth(0.5) would show me estimates for a prevalence of 50%, but I'm fairly sure I'm wrong here.

I used the pull down menus to get to CI Interval for One Mean, but I don't think it's written for biostatisicians and epidemiologists; I didn't understand what the fields were that I needed to complete.

If I know the overall sample size (this is fixed at 200), have several estimates of disease prevalence we would like to visualize (e.g., 30%, 35%, 40%, 45%, 50%), and want the output to be the width of the 95% CI around the prevalence estimates... What do I do to get this information?

I can enter this data into Excel or similar to graph this, but can stata take these results and graph them for me?

Thank you in advance!

Last edited by Matt Price; 24 Jun 2024, 17:38.
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4352

25 Jun 2024, 00:13

Originally posted by Matt Price View Post

I would like to graph how the width of our 95% CI changes as we go from a prevalence of 50%, down to 30%.

You're not looking for something like this, are you?

Code:

version 18.0

clear *

tempname ci_width Results

forvalues prevalence = 30(5)50 {
    quietly cii proportion 200 `=floor(200 * `prevalence' / 100)', wilson
    scalar define `ci_width' = r(ub) - r(lb)
    display in smcl as text "Prevalence (%): `prevalence', CI Width " ///
        as result %06.4f `ci_width'
    matrix define `Results' = (nullmat(`Results') \ (`prevalence', `ci_width'))
}

quietly svmat double `Results', names(var)
graph twoway ///
    connected var2 var1, lcolor(black) msymbol(O) mcolor(black) ///
        mfcolor(white) ///
    scheme(s2color) ///
    ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
    xtitle(Prevalence (%))

exit

If so, then that's not what I think that the purpose of a command like ciwidth is all about. I believe that it treats the confidence interval as a random variate for realized datasets of your given mean, N and standard deviation (one is the default for the latter).

If I know the overall sample size (this is fixed at 200), have several estimates of disease prevalence we would like to visualize (e.g., 30%, 35%, 40%, 45%, 50%), and want the output to be the width of the 95% CI around the prevalence estimates... What do I do to get this information?

If I'm not mistaken (a big if, yes), then the analogous approach for binomial data to what ciwidth does for normally distributed data would be something like the following. There might be an analytical solution, but I instead use simulation with three thousand replicates (realized 95% confidence intervals) given your sample size and range of population prevalence.

Code:

version 18.0

clear *

// seedem
set seed 480419503

program define binciwidth
    version 18.0
    syntax , [Prevalence(integer 50) n(integer 200) reps(integer 3000)]

    drop _all
    quietly set obs `reps'

    tempname pvl count lb ub ci_width

    // Realized latent TB cases given prevalence (%) and sample size
    generate long `pvl' = rbinomial(`n', `prevalence' / 100)
    contract `pvl', freq(`count')

    // Realized 95% CIs
    quietly generate double `lb' = .
    quietly generate double `ub' = .

    forvalues row = 1/`=_N' {
        local realized_cases = `pvl'[`row']
        quietly cii proportions `n' `realized_cases', wilson
        quietly replace `lb' = r(lb) in `row'
        quietly replace `ub' = r(ub) in `row'
    }
    generate double `ci_width' = `ub' - `lb'

    // Distribution of CI widths
    quietly summarize `ci_width' [fweight=`count'], detail
end

frame create ForGraph int prevalence ///
    double(mean min max median centile5 centile95)

forvalues prevalence = 30(5)50 {
    binciwidth , p(`prevalence')
    frame post ForGraph (`prevalence') (r(mean)) (r(min)) (r(max)) ///
        (r(p50)) (r(p5)) (r(p95))
}

cwf ForGraph

set more on
graph twoway ///
    rspike centile95 centile5 prevalence, lcolor(black) || ///
    scatter median prevalence, msymbol(O) mcolor(black) mfcolor(white) ///
        msize(medium) ///
    scheme(s2color) title("Median, 5th, 95th percentiles", ring(0) ///
        position(10) size(small)) ///
    ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
    xtitle(Prevalence (%)) ///
    legend(off)
quietly graph export `c(pwd)'`c(dirsep)'CICentiles.png
more

graph twoway ///
    rspike max min prevalence, lcolor(black) || ///
    scatter mean prevalence, msymbol(O) mcolor(black) ///
        mfcolor(white) msize(medium) ///
    scheme(s2color) title("Mean, range", ring(0) ///
        position(10) size(small)) ///
    ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
    xtitle(Prevalence (%)) ///
    legend(off)
quietly graph export `c(pwd)'`c(dirsep)'CIMeanRange.png

exit

If you're wondering why the graphs (attached below) look the way they do, you can run the following to visualize how the shape of the 95% CI widths' distribution changes with increasing population prevalence.

Code:

version 18.0

clear *

// seedem
set seed 401603685

forvalues prevalence = 25(5)50 {
    drop _all
    quietly set obs 3000

    generate int cases = rbinomial(200, `prevalence' / 100)
    contract cases, freq(count)

    quietly generate double lb = .
    quietly generate double ub = .
    forvalues row = 1/`=_N' {
        quietly cii proportions 200 `=cases[`row']', wilson
        quietly replace lb = r(lb) in `row'
        quietly replace ub = r(ub) in `row'
    }
    generate double ci_width = ub - lb

    quietly histogram ci_width [fweight=count], scheme(s2color) ///
        title(Prevalence `prevalence'%, ring(0) position(10) size(medlarge)) ///
        ylabel( , angle(horizontal) nogrid) xtitle(95% CI Width) ///
        name(width`prevalence')
    local names `names' width`prevalence'
}

graph combine `names'

exit

Announcement

Determining 95% CI precision with ciwidth - am I doing this right?

Comment