Sample size for one proportion (prevalence in cross-sectional design)

Gianfranco Di Gennaro

#1

28 Jan 2021, 06:19

Hello everybody, and thank you in advance.

I'm trying to calculate a sample size for a cross-sectional study.
I want to estimate a prevalence (expected: 5%) with a 95% CI whose total width is at most 2 percentage points.

I see that "power oneproportion" doesn't let you set the precision of your estimate.
Does anybody know how to get around this?

Thanks again.
Gianfranco
Tags: cross-sectional, oneproportion, Power, precision, samplesize
Rich Goldstein

#2

28 Jan 2021, 07:06

See:

Code:

help ciwidth
Gianfranco Di Gennaro

#3

28 Jan 2021, 07:08

Thank you, Rich Goldstein, but ciwidth looks appropriate for a normal approximation, not for a proportion, doesn't it?
Rich Goldstein

#4

28 Jan 2021, 09:22

There are a number of things that could be discussed here (including the necessity of guesses and approximations when doing sample-size estimation), but I will limit myself to a little about the robustness of t-tests, on which there is a large literature. I thought that there was an article definitely about use of t-tests for proportions but I can't offhand find it; however, here are two other citations that may be of interest, along with their abstracts, though neither deals directly with binary data:

L. M. Sullivan & R. B. D'Agostino, Robustness of the t test applied to data distorted from normality by floor effects. _J Dent Res_ 71(12):1938–43, Dec 1992. doi: 10.1177/00220345920710121601. PMID: 1452898.

Abstract

In calculus, plaque, and gingivitis trials, measures are taken on subjects both
prior to the use of an active treatment and after its use. When the trial is
short-term, or when a cleaning of the mouth takes place after the baseline
measurement, distributions of such measures (e.g., the Volpe-Manhold score or
the Löe and Silness scale) are approximately normally distributed above zero
but also can have a proportion of subjects who attain scores of zero. When the
effects of an active treatment are compared with those of a control, the
two-independent-sample t test can be applied to outcome scores or to
differences between the baseline and outcome scores. Robustness of these t
tests, in the presence of distributions "distorted" from normality as
described, was investigated by computer simulation. In general, both t tests
produced actual significance levels which were close to nominal significance
levels, even in the presence of small samples and distributions in which as
many as 50% of the subjects attained scores of zero.

T. Heeren & R. D'Agostino, Robustness of the two independent samples t-test when applied to ordinal scaled data. _Statistics in Medicine_ 6:79–90, 1987. (Timothy Heeren: Boston University School of Public Health, 80 East Concord Street, Boston, Massachusetts 02118, U.S.A.; Ralph D'Agostino: Boston University Department of Mathematics, 111 Cummington Street, Boston, Massachusetts 02215, U.S.A.)

SUMMARY

One may encounter the application of the two independent samples t-test to
ordinal scaled data (for example, data that assume only the values 0, 1, 2, 3)
from small samples. This situation clearly violates the underlying normality
assumption for the t-test and one cannot appeal to large sample theory for
validity. In this paper we report the results of an investigation of the
t-test's robustness when applied to data of this form for samples of
sizes 5 to 20. Our approach consists of complete enumeration
of the sampling distributions and comparison of actual levels of significance
with the significance level expected if the data followed a normal
distribution. We demonstrate under general conditions the robustness of the
t-test in that the maximum actual level of significance is close to the
declared level.
Joseph Coveney

#5

28 Jan 2021, 20:45

Originally posted by Gianfranco Di Gennaro

I'm trying to calculate a sample size for a cross-sectional;
I want to estimate a prevalence (expected: 5%) with a maximum width 95%CI of 2%.

You can use simulation; maybe something like the following.

Code:

version 16.1
clear *
set seed `=strreverse("1591764")'

// One replicate: draw the number of successes in a sample of size n,
// compute the Wilson CI, and flag whether its width meets the target
program define simem, rclass
    version 16.1
    syntax , n(integer) [Pi(real 0.05) Width(real 0.02)]

    local success = rbinomial(`n', `pi')
    cii proportions `n' `success', wilson
    return scalar pos = abs(r(ub) - r(lb)) <= `width'
end

// For each candidate N, the share of 3000 replicates whose CI width is <= 0.02
forvalues n = 1900(50)2100 {
    quietly simulate pos = r(pos), reps(3000) nodots: simem , n(`n')
    summarize pos, meanonly
    display in smcl as text "N = `n' Power = " as result %04.2f r(mean)
}

N = 1900 Power = 0.65
N = 1950 Power = 0.74
N = 2000 Power = 0.83
N = 2050 Power = 0.90
N = 2100 Power = 0.95
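As a cross-check on these simulated probabilities: because the Wilson interval depends on the data only through the number of successes x, the probability can also be computed exactly by summing the Binomial(n, p) probabilities of every x whose interval is no wider than the target (the same complete-enumeration idea Heeren and D'Agostino describe). This is a minimal sketch in Python rather than Stata, assuming that Stata's `wilson` option computes the standard Wilson score interval with z = invnormal(0.975); the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value, about 1.96 (assumed to match invnormal(0.975) in Stata)
Z95 = NormalDist().inv_cdf(0.975)


def wilson_width(x: int, n: int, z: float = Z95) -> float:
    """Total width (upper minus lower) of the Wilson score CI for x successes in n trials."""
    z2 = z * z
    half_width = z / (n + z2) * math.sqrt(x * (n - x) / n + z2 / 4)
    return 2 * half_width


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log1p(-p))
    return math.exp(log_pmf)


def prob_width_ok(n: int, p: float = 0.05, width: float = 0.02) -> float:
    """Exact P(Wilson CI width <= width): sum the pmf over every qualifying x."""
    return sum(binom_pmf(x, n, p) for x in range(n + 1)
               if wilson_width(x, n) <= width)


if __name__ == "__main__":
    # Same grid of candidate sample sizes as the simulation above
    for n in range(1900, 2101, 50):
        print(f"N = {n}  Pr(width <= 0.02) = {prob_width_ok(n):.4f}")
```

Because nothing is simulated, these probabilities carry no Monte Carlo error and should agree with the simulated values to within simulation noise. The shortcut only works because the interval depends on the data solely through x; for designs where that is not true, simulation as above remains the general tool.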

Originally posted by Rich Goldstein

I thought that there was an article definitely about use of t-tests for proportions but I can't offhand find it

Yeah, I remember running across that, too, but I recall that it was some kind of obiter dictum in the introduction or discussion section of some paper, maybe Agresti and Coull or Agresti and Caffo? I don't have either handy at the moment and so cannot check to be sure.

A. Agresti & B. A. Coull, Approximate is better than 'exact' for interval estimation of binomial proportions. _The American Statistician_ 52:119–26, 1998.

A. Agresti & B. Caffo, Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. _The American Statistician_ 54:280–88, 2000.
Rich Goldstein

#6

29 Jan 2021, 07:23

Joseph Coveney, hi, and thank you for the citations.

Based on a very quick look, I think that the earlier one is relevant, but the actual claim is that the normal-theory Wald approximation gives CIs that are too narrow (while the one based on the "exact" binomial is too wide), and they provide a different approximation that they like.
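For reference, these are the standard textbook forms of the intervals under discussion, for x successes in n trials with \(\hat p = x/n\) and z ≈ 1.96 for 95% confidence. The Wilson form is the one requested by the `wilson` option in the Stata code in this thread; the symbols are chosen here for illustration.

```latex
\begin{align*}
\text{Wald:} &\quad \hat p \pm z\sqrt{\frac{\hat p(1-\hat p)}{n}} \\
\text{Agresti--Coull:} &\quad \tilde n = n + z^2,\quad
  \tilde p = \frac{x + z^2/2}{\tilde n},\quad
  \tilde p \pm z\sqrt{\frac{\tilde p(1-\tilde p)}{\tilde n}} \\
\text{Wilson:} &\quad \frac{x + z^2/2}{n + z^2} \pm
  \frac{z}{n + z^2}\sqrt{\frac{x(n-x)}{n} + \frac{z^2}{4}}
\end{align*}
```

With z ≈ 2, z²/2 ≈ 2, so the Agresti–Coull adjustment amounts to adding roughly two successes and two failures before applying the Wald formula, which is the idea in the title of the Agresti & Caffo paper.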

Best,
Rich

Leonardo Guizzetti

29 Jan 2021, 10:48

I thought this was interesting, and I found that the Stata manual leaves it as an exercise for the reader to implement their own -ciwidth- methods. You can find it in the PDF documentation for -help ciwidth usermethod-, under "More examples: Compute probability of CI width for a oneproportion CI".

Combining the code presented in the manual with Joseph's example, you obtain the same results but can conveniently use the -ciwidth- facility to, say, get a sample-size graph at the end.

Code:

clear *
cls

version 16.1
set seed `=strreverse("1591764")'

program myonepropsim, rclass
  version 16.1
  args n p level wilson
  clear
  set obs `n'
  generate byte y = rbinomial(1, `p')
  ci proportions y, level(`level') wilson
  return scalar w = r(ub)-r(lb)
end

program ciwidth_cmd_myonepropsim_init, sclass
  version 16.1
  sreturn clear
  sreturn local prss_argnames = "p"
  sreturn local prss_colnames = "p"
  sreturn local prss_subtitle = "Two-sided Wilson CI"
end

program ciwidth_cmd_myonepropsim, rclass
  version 16.1
  /* parse command arguments and options */
  syntax  anything(name=p), /// proportion estimate
          n(integer) /// sample size
          Width(real) /// target CI width
          [ Level(cilevel) /// confidence level
          reps(integer 100) wilson qui ]
  /* compute probability of CI width using simulation */
  display as txt _n "Computing Pr(width) for n=`n' and width=`width' ..."
  `qui' simulate w=r(w), reps(`reps'): myonepropsim `n' `p' `level'
  qui count if w <= `width'
  /* store results */
  return scalar Pr_width = r(N)/`reps'
  return scalar level = `level'
  return scalar N = `n'
  return scalar width = `width'
  return scalar p = `p'
end

ciwidth myonepropsim 0.05, n(1900(50)2100) reps(3000) width(0.02) qui table graph

This results in:

Code:

  +------------------------------------------+
  |   level       N Pr_width   width       p |
  |------------------------------------------|
  |      95   1,900    .6473     .02     .05 |
  |      95   1,950     .735     .02     .05 |
  |      95   2,000    .8383     .02     .05 |
  |      95   2,050    .9057     .02     .05 |
  |      95   2,100    .9577     .02     .05 |
  +------------------------------------------+

[Attached image: sample_size_graph.jpg, the sample-size graph produced by the -graph- option]


Inaamul Haq

#8

08 Feb 2023, 19:15

How about

Code:

local sd = sqrt(0.05*0.95)
ciwidth onemean, width(0.02) sd(`sd') knownsd

The commonly used formula for the sample size needed to estimate a prevalence is

n = z²·p(1 − p)/d²

where d is the desired margin of error, i.e., half the total CI width.

The -ciwidth onemean- command works the same way when we use the -knownsd- option and calculate sd as sqrt(p(1 − p)).
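Plugging in the numbers from the original question (p = 0.05, total width 0.02, so d = 0.01, and z ≈ 1.96 for 95% confidence) gives the worked calculation below; the first line shows why a known-SD interval of width w = 2zσ/√n with σ = √(p(1 − p)) reduces to the same formula.

```latex
\begin{align*}
w = \frac{2z\sqrt{p(1-p)}}{\sqrt{n}}
  &\iff n = \frac{(2z)^2\,p(1-p)}{w^2} = \frac{z^2\,p(1-p)}{d^2},
  \qquad d = \frac{w}{2} = 0.01 \\
n &= \frac{1.96^2 \times 0.05 \times 0.95}{0.01^2}
   = \frac{3.8416 \times 0.0475}{0.0001}
   = \frac{0.182476}{0.0001} \approx 1824.8
   \;\Rightarrow\; n = 1825
\end{align*}
```

This formula treats the width as fixed at its expected value, so at roughly 1,825 the observed interval is only about as likely as not to come in under 0.02. The simulations earlier in the thread instead treat the width as random and report the probability of meeting the target, which reached 0.74 to 0.90 only at N = 1,950 to 2,050.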