Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Determining 95% CI precision with ciwidth - am I doing this right?

    We are drafting a proposal to study latent tuberculosis (TB) infection, and can affort to enroll 200 participants at risk for TB. We are estimating our latent TB (i.e., not clinically evident) prevalence may range from 30-50%. I would like to graph how the width of our 95% CI changes as we go from a prevalence of 50%, down to 30%.

    I started with

    ciwidth onemean, probwidth(0.5) n(200)

    But I'm not sure I'm coding this correctly, and don't fully understand the output. I was assuming probwidth(0.5) would show me estimates for a prevalence of 50%, but I'm fairly sure I'm wrong here.

    I used the pull down menus to get to CI Interval for One Mean, but I don't think it's written for biostatisicians and epidemiologists; I didn't understand what the fields were that I needed to complete.

    If I know the overall sample size (this is fixed at 200), have several estimates of disease prevalence we would like to visualize (e.g., 30%, 35%, 40%, 45%, 50%), and want the output to be the width of the 95% CI around the prevalence estimates... What do I do to get this information?

    I can enter this data into Excel or similar to graph this, but can stata take these results and graph them for me?

    Thank you in advance!
    Last edited by Matt Price; 24 Jun 2024, 17:38.

  • #2
    Originally posted by Matt Price View Post
    I would like to graph how the width of our 95% CI changes as we go from a prevalence of 50%, down to 30%.
    You're not looking for something like this, are you?
    Code:
    version 18.0
    
    clear *
    
    tempname ci_width Results
    
    forvalues prevalence = 30(5)50 {
        quietly cii proportion 200 `=floor(200 * `prevalence' / 100)', wilson
        scalar define `ci_width' = r(ub) - r(lb)
        display in smcl as text "Prevalence (%): `prevalence', CI Width " ///
            as result %06.4f `ci_width'
        matrix define `Results' = (nullmat(`Results') \ (`prevalence', `ci_width'))
    }
    
    quietly svmat double `Results', names(var)
    graph twoway ///
        connected var2 var1, lcolor(black) msymbol(O) mcolor(black) ///
            mfcolor(white) ///
        scheme(s2color) ///
        ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
        xtitle(Prevalence (%))
    
    exit
    If so, then that's not what I think that the purpose of a command like ciwidth is all about. I believe that it treats the confidence interval as a random variate for realized datasets of your given mean, N and standard deviation (one is the default for the latter).

    If I know the overall sample size (this is fixed at 200), have several estimates of disease prevalence we would like to visualize (e.g., 30%, 35%, 40%, 45%, 50%), and want the output to be the width of the 95% CI around the prevalence estimates... What do I do to get this information?
    If I'm not mistaken (a big if, yes), then the analogous approach for binomial data to what ciwidth does for normally distributed data would be something like the following. There might be an analytical solution, but I instead use simulation with three thousand replicates (realized 95% confidence intervals) given your sample size and range of population prevalence.
    Code:
    version 18.0
    
    clear *
    
    // seedem
    set seed 480419503
    
    program define binciwidth
        version 18.0
        syntax , [Prevalence(integer 50) n(integer 200) reps(integer 3000)]
    
        drop _all
        quietly set obs `reps'
    
        tempname pvl count lb ub ci_width
    
        // Realized latent TB cases given prevalence (%) and sample size
        generate long `pvl' = rbinomial(`n', `prevalence' / 100)
        contract `pvl', freq(`count')
    
        // Realized 95% CIs
        quietly generate double `lb' = .
        quietly generate double `ub' = .
    
        forvalues row = 1/`=_N' {
            local realized_cases = `pvl'[`row']
            quietly cii proportions `n' `realized_cases', wilson
            quietly replace `lb' = r(lb) in `row'
            quietly replace `ub' = r(ub) in `row'
        }
        generate double `ci_width' = `ub' - `lb'
    
        // Distribution of CI widths
        quietly summarize `ci_width' [fweight=`count'], detail
    end
    
    frame create ForGraph int prevalence ///
        double(mean min max median centile5 centile95)
    
    forvalues prevalence = 30(5)50 {
        binciwidth , p(`prevalence')
        frame post ForGraph (`prevalence') (r(mean)) (r(min)) (r(max)) ///
            (r(p50)) (r(p5)) (r(p95))
    }
    
    cwf ForGraph
    
    set more on
    graph twoway ///
        rspike centile95 centile5 prevalence, lcolor(black) || ///
        scatter median prevalence, msymbol(O) mcolor(black) mfcolor(white) ///
            msize(medium) ///
        scheme(s2color) title("Median, 5th, 95th percentiles", ring(0) ///
            position(10) size(small)) ///
        ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
        xtitle(Prevalence (%)) ///
        legend(off)
    quietly graph export `c(pwd)'`c(dirsep)'CICentiles.png
    more
    
    graph twoway ///
        rspike max min prevalence, lcolor(black) || ///
        scatter mean prevalence, msymbol(O) mcolor(black) ///
            mfcolor(white) msize(medium) ///
        scheme(s2color) title("Mean, range", ring(0) ///
            position(10) size(small)) ///
        ytitle(95% CI Width) ylabel( , format(%05.3f) angle(horizontal) nogrid) ///
        xtitle(Prevalence (%)) ///
        legend(off)
    quietly graph export `c(pwd)'`c(dirsep)'CIMeanRange.png
    
    exit
    If you're wondering why the graphs (attached below) look the way they do, you can run the following to visualize how the shape of the 95% CI widths' distribution changes with increasing population prevalence.
    Code:
    version 18.0
    
    clear *
    
    // seedem
    set seed 401603685
    
    forvalues prevalence = 25(5)50 {
        drop _all
        quietly set obs 3000
    
        generate int cases = rbinomial(200, `prevalence' / 100)
        contract cases, freq(count)
    
        quietly generate double lb = .
        quietly generate double ub = .
        forvalues row = 1/`=_N' {
            quietly cii proportions 200 `=cases[`row']', wilson
            quietly replace lb = r(lb) in `row'
            quietly replace ub = r(ub) in `row'
        }
        generate double ci_width = ub - lb
    
        quietly histogram ci_width [fweight=count], scheme(s2color) ///
            title(Prevalence `prevalence'%, ring(0) position(10) size(medlarge)) ///
            ylabel( , angle(horizontal) nogrid) xtitle(95% CI Width) ///
            name(width`prevalence')
        local names `names' width`prevalence'
    }
    
    graph combine `names'
    
    exit
    Click image for larger version

Name:	CICentiles.png
Views:	1
Size:	22.5 KB
ID:	1757121


    Click image for larger version

Name:	CIMeanRange.png
Views:	1
Size:	21.2 KB
ID:	1757122

    Comment

    Working...
    X