
  • Sample size for one proportion (prevalence in cross-sectional design)

    Hello everybody, and thank you in advance.

    I'm trying to calculate a sample size for a cross-sectional study:
    I want to estimate a prevalence (expected: 5%) with a 95% CI no wider than 2%.

    I see that "power oneproportion" doesn't let you set the precision of your estimate.
    Does anybody know how to get around this?

    Thanks again.
    Gianfranco

  • #2
    see
    Code:
    help ciwidth



    • #3
      Thank you, Rich Goldstein, but ciwidth looks appropriate for a normal approximation, not for a proportion, doesn't it?



      • #4
        There are a number of things that could be discussed here (including the necessity of guesses and approximations when doing sample size estimation), but I will limit myself to a little about the robustness of t-tests, on which there is a large literature. I thought that there was an article definitely about the use of t-tests for proportions, but I can't find it offhand. However, here are two other citations that may be of interest, along with their abstracts, though neither deals directly with binary data:

        J Dent Res 1992 Dec;71(12):1938-43.
        Robustness of the t test applied to data distorted from normality by floor effects
        L M Sullivan, R B D'Agostino
        PMID: 1452898 DOI: 10.1177/00220345920710121601

        Abstract

        In calculus, plaque, and gingivitis trials, measures are taken on subjects both
        prior to the use of an active treatment and after its use. When the trial is
        short-term, or when a cleaning of the mouth takes place after the baseline
        measurement, distributions of such measures (e.g., the Volpe-Manhold score or
        the Löe and Silness scale) are approximately normally distributed above zero
        but also can have a proportion of subjects who attain scores of zero. When the
        effects of an active treatment are compared with those of a control, the
        two-independent-sample t test can be applied to outcome scores or to
        differences between the baseline and outcome scores. Robustness of these t
        tests, in the presence of distributions "distorted" from normality as
        described, was investigated by computer simulation. In general, both t tests
        produced actual significance levels which were close to nominal significance
        levels, even in the presence of small samples and distributions in which as
        many as 50% of the subjects attained scores of zero.

        STATISTICS IN MEDICINE, VOL. 6, 79-90 (1987)
        ROBUSTNESS OF THE TWO INDEPENDENT SAMPLES t-TEST WHEN APPLIED TO ORDINAL SCALED
        DATA
        TIMOTHY HEEREN
        Boston University School of Public Health, 80 East Concord Street, Boston,
        Massachusetts 02118, U.S.A.
        RALPH D'AGOSTINO
        Boston University Department of Mathematics, 111 Cummington Street, Boston,
        Massachusetts 02215, U.S.A.

        SUMMARY

        One may encounter the application of the two independent samples t-test to
        ordinal scaled data (for example, data that assume only the values 0, 1, 2, 3)
        from small samples. This situation clearly violates the underlying normality
        assumption for the t-test and one cannot appeal to large sample theory for
        validity. In this paper we report the results of an investigation of the
        t-test's robustness when applied to data of this form for samples of
        sizes 5 to 20. Our approach consists of complete enumeration
        of the sampling distributions and comparison of actual levels of significance
        with the significance level expected if the data followed a normal
        distribution. We demonstrate under general conditions the robustness of the
        t-test in that the maximum actual level of significance is close to the
        declared level.



        • #5
          Originally posted by Gianfranco Di Gennaro
          I'm trying to calculate a sample size for a cross-sectional;
          I want to estimate a prevalence (expected: 5%) with a maximum width 95%CI of 2%.
          You can use simulation; maybe something like the following.

          . version 16.1

          . clear *

          . set seed `=strreverse("1591764")'

          . program define simem, rclass
                    version 16.1
                    syntax , n(integer) [Pi(real 0.05) Width(real 0.02)]

                    local success = rbinomial(`n', `pi')
                    cii proportions `n' `success', wilson
                    return scalar pos = abs(r(ub) - r(lb)) <= `width'
            end

          . forvalues n = 1900(50)2100 {
                    quietly simulate pos = r(pos), reps(3000) nodots: simem , n(`n')
                    summarize pos, meanonly
                    display in smcl as text "N = `n' Power = " as result %04.2f r(mean)
            }
          N = 1900 Power = 0.65
          N = 1950 Power = 0.74
          N = 2000 Power = 0.83
          N = 2050 Power = 0.90
          N = 2100 Power = 0.95
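          The "Power" column above is the simulated probability that the Wilson interval is no wider than 0.02. That probability is less than 1 because the width depends on the observed proportion. As a rough cross-check, and only a sketch, one could look at the width when the observed count happens to equal its expected value (5% of n). The three values of n are illustrative picks from the range simulated above.

          ```stata
          * Wilson CI width when the observed count equals its expected value.
          * Assumption: the prevalence is 0.05; round() keeps the count an integer.
          foreach n of numlist 1900 2000 2100 {
              local k = round(0.05 * `n')
              quietly cii proportions `n' `k', wilson
              display "n = `n'  successes = `k'  width = " %6.4f (r(ub) - r(lb))
          }
          ```

          A width below 0.02 at the expected count does not guarantee the target is met in a given sample, which is why the simulation reports a probability rather than a single yes or no.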



          Originally posted by Rich Goldstein
          I thought that there was an article definitely about use of t-tests for proportions but I can't offhand find it
          Yeah, I remember running across that, too, but I recall that it was some kind of obiter dictum in the introduction or discussion section of some paper, maybe Agresti and Coull or Agresti and Caffo? I don't have either handy at the moment and so cannot check to be sure.

          A. Agresti & B. A. Coull, Approximate is better than 'exact' for interval estimation of binomial proportions. _The American Statistician_ 52:119–26, 1998.

          A. Agresti & B. Caffo, Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. _The American Statistician_ 54:280–88, 2000.



          • #6
            Joseph Coveney, hi and thank you for the citations.

            Based on a very quick look, I think that the earlier one is relevant, but the actual claim is that the normal-theory Wald approximation gives CIs that are too narrow (while the one based on the "exact" binomial is too wide), and they provide a different approximation that they prefer.

            Best,
            Rich



            • #7
              I thought this was interesting, and I found that the Stata manual leaves it as an exercise for the reader to implement their own -ciwidth- methods. You can find it in the PDF documentation for -help ciwidth usermethod-, under "More examples: Compute probability of CI width for a oneproportion CI".

              Taking the code presented in the manual and the example given by Joseph, you obtain the same results but can conveniently use the -ciwidth- facility to get, say, a sample-size graph at the end.

              Code:
              clear *
              cls
              
              version 16.1
              set seed `=strreverse("1591764")'
              
              program myonepropsim, rclass
                version 16.1
                args n p level wilson
                clear
                set obs `n'
                generate byte y = rbinomial(1, `p')
                ci proportions y, level(`level') wilson
                return scalar w = r(ub)-r(lb)
              end
              
              program ciwidth_cmd_myonepropsim_init, sclass
                version 16.1
                sreturn clear
                sreturn local prss_argnames = "p"
                sreturn local prss_colnames = "p"
                sreturn local prss_subtitle = "Two-sided Wilson CI"
              end
              
              program ciwidth_cmd_myonepropsim, rclass
                version 16.1
                /* parse command arguments and options */
                syntax  anything(name=p), /// proportion estimate
                        n(integer) /// sample size
                        Width(real) /// target CI width
                        [ Level(cilevel) /// confidence level
                        reps(integer 100) wilson qui ]
                /* compute probability of CI width using simulation */
                display as txt _n "Computing Pr(width) for n=`n' and width=`width' ..."
                `qui' simulate w=r(w), reps(`reps'): myonepropsim `n' `p' `level'
                qui count if w <= `width'
                /* store results */
                return scalar Pr_width = r(N)/`reps'
                return scalar level = `level'
                return scalar N = `n'
                return scalar width = `width'
                return scalar p = `p'
              end
              
              ciwidth myonepropsim 0.05, n(1900(50)2100) reps(3000) width(0.02) qui table graph
              And results in:

              Code:
                +------------------------------------------+
                |   level       N Pr_width   width       p |
                |------------------------------------------|
                |      95   1,900    .6473     .02     .05 |
                |      95   1,950     .735     .02     .05 |
                |      95   2,000    .8383     .02     .05 |
                |      95   2,050    .9057     .02     .05 |
                |      95   2,100    .9577     .02     .05 |
                +------------------------------------------+
              [Image: sample_size_graph.jpg]



              • #8
                How about
                Code:
                local sd = sqrt(0.05*0.95)
                ciwidth onemean, width(0.02) sd(`sd') knownsd
                The commonly used formula for the sample size needed to estimate a prevalence is
                $n = z^2 \, p(1-p) / d^2$
                where d is the half-width of the CI. The ciwidth onemean command works the same way when we use the knownsd option and calculate sd as sqrt(p(1-p)).
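
                As a rough worked check of the formula, here is the arithmetic done directly. This is only a sketch: it assumes d is the half-width, so a total width of 0.02 gives d = 0.01, and the scalar names are illustrative.

                ```stata
                * Closed-form (Wald) sample size for estimating a proportion.
                * Assumption: d is the CI half-width, i.e. half of the 0.02 total width.
                scalar p0     = 0.05                 // expected prevalence
                scalar halfw  = 0.02 / 2             // half-width d
                scalar zcrit  = invnormal(0.975)     // two-sided 95% critical value
                scalar n_wald = ceil(zcrit^2 * p0 * (1 - p0) / halfw^2)
                display "Wald sample size: " n_wald
                ```

                With these inputs the closed-form value is 1825. It targets the Wald interval, so it need not agree with the simulation-based Wilson results earlier in the thread.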

