How to Calculate Sample Size for 3 Independent Samples

David Martin

Join Date: Nov 2020

Posts: 6
#1

04 Nov 2020, 15:11

Hi All,

I wish to calculate the necessary sample size for the below experiment which will consist of 3 groups:
Control Group
Treatment Group 1
Treatment Group 2

Each group will be shown an advertisement and then asked to fill in a survey of how likely they are to change their behavior due to seeing that advertisement.

My understanding is that I have the option in Stata to follow the below path to calculate the sample size for 2 independent samples.
-Power and Sample Size Analysis
--Population Parameter
---Proportions
----Two Independent Samples

However, I believe I need to calculate the sample size for 3 independent samples.

Can anyone guide me on how exactly I can do that in Stata?

Thanks In Advance For All Assistance
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#2

04 Nov 2020, 16:29

First, you'd start with the hypothesis pair or set of hypothesis pairs that you're intending to test, maybe all pairwise or maybe each experimental treatment group versus control treatment group. You'd then power the set of tests to detect something like either (i) at least one difference in the set or (ii) the smallest difference in the set, whichever you're after (or maybe you're looking for a particular pattern, e.g., control treatment group < experimental treatment group 1 ≤ experimental treatment group 2?).
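To make the pairwise option concrete, here is a minimal sketch using -power twoproportions-, the command behind the menu path mentioned in #1. It assumes that each experimental treatment is compared only with control, that each comparison is tested at a Bonferroni-adjusted alpha of 0.025, and that the outcome proportions are 0.30, 0.45 and 0.60. None of these figures comes from this thread; they are placeholders.

```stata
* Two comparisons against control, each at alpha = 0.05/2 (Bonferroni).
* The proportions 0.30, 0.45 and 0.60 are hypothetical placeholders.
power twoproportions 0.30 0.45, alpha(0.025) power(0.9)   // control vs treatment 1
power twoproportions 0.30 0.60, alpha(0.025) power(0.9)   // control vs treatment 2
```

If the aim is to detect the smallest difference in the set, the larger of the two reported sample sizes is the one that would drive the design.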
David Martin

Join Date: Nov 2020

Posts: 6
#3

05 Nov 2020, 09:22

The hypothesis is:

We expect the effect on:
Treatment Group 2
To be greater than the effect on:
Treatment Group 1
To be greater than the effect on:
The Control Group

I now need to figure out how to calculate the necessary sample size in Stata. Any guidance would be greatly appreciated.

(Again, Thanks In Advance)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

05 Nov 2020, 10:00

You may take a look at the -power oneway- command.
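As a rough illustration of the syntax, assuming the outcome is analyzed as a continuous score in a one-way ANOVA: the group means (3.5, 4.0, 4.5) and the within-group variance (2.25) below are made-up placeholders, not figures from this thread.

```stata
* Total sample size for the omnibus F test of a one-way ANOVA with three groups.
* Means and error variance are hypothetical; alpha defaults to 0.05.
power oneway 3.5 4.0 4.5, varerror(2.25) power(0.9)
```

With the means listed explicitly, the number of groups is taken from the list, and equal group sizes are assumed unless group weights are given.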

Best regards,

Marcos
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#5

05 Nov 2020, 20:58

Originally posted by David Martin:

We expect the effect on:
Treatment Group 2
To be greater than the effect on:
Treatment Group 1
To be greater than the effect on:
The Control Group

. . . Any guidance would be greatly appreciated.

You're going to need more rigor, less ambiguity, in specifying the null and alternative hypothesis pair (or pairs).

You asked for guidance. I recommend that you fire up your favorite search engine and look for a formal protocol or so-called statistical analysis plan (SAP) for a government-regulated clinical or nonclinical study (perhaps something publicly available that was used or intended to be used for submission to the U.S. Food and Drug Administration, U.S. FDA) and see how null hypothesis significance testing (NHST) and the associated power analysis and sample size estimation are described there. You'll see Greek letters that refer to population parameters whose differences are inferred from the results of the NHST done on the data observed for each treatment group (experimental or control) using a statistical method that is specified explicitly.

What you've shown is more a list of informal expectations than something that is formally amenable to a statistical procedure. For example, you mention "the effect on" each treatment group. Does this mean that there is a before and an after for each treatment group, and that you're assessing whether the change in outcome differs between groups in repeated measurements after some kind of manipulation that is applied to all participants? If so, are you going to use change scores in before-and-after proportions, or look at a treatment × time interaction in a longitudinal analysis of, say, the log-odds?

Also, what exactly is your criterion, and how is it formulated? For example, is it something like a set of individual between-group comparisons?
H₀: π₂ < π₁ < π_C
H_A: π₂ ≥ π₁ ≥ π_C, with at least one inequality strict
Or is it more that the linear component of the set of orthogonal polynomial contrasts (say, via a postestimation command using Stata's factor variable notation in a regression model) is greater than zero, versus the null of less than or equal to zero?
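For readers unfamiliar with that last formulation, a minimal sketch of what it looks like in practice follows. The data are simulated, and the model (linear regression on a continuous score, with the sample size, means and standard deviation shown) is an assumption made only so that the example runs; it is not the analysis for this study.

```stata
clear
set seed 12345
set obs 300
generate byte trt = mod(_n, 3)       // 0 = control, 1 and 2 = experimental groups
* Hypothetical outcome whose mean rises by 0.5 per group
generate double y = 4 + 0.5 * trt + rnormal(0, 1.5)

regress y i.trt
* Orthogonal polynomial contrasts across the levels of trt;
* the "linear" row is the trend component referred to above
contrast p.trt, effects
```

The test reported for the linear row is two-sided. A one-sided version would use the sign of the estimated contrast together with half of the two-sided p-value.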
David Martin

Join Date: Nov 2020

Posts: 6
#6

06 Nov 2020, 10:45

Is it possible our answer lies under the following command/procedure:

power oneway, varmeans(X) ngroups(3) grweights(1 1 1)

We know we have 3 groups and we want them weighted equally.

We don't know the "varmeans" (Between-group variance) or the "Error (within-group) variance"

All we know at the moment is that we expect Treatment Group 2 to produce higher numbers than Treatment Group 1, which in turn should produce higher numbers than the Control Group (on a scale where 7 is "Very Likely to Change Behaviour" and 1 is "Very Unlikely to Change Behaviour").
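On the unknown varmeans: as I understand it, with equal group weights the between-group variance is the average squared deviation of the hypothesized group means from their grand mean, so it follows from whatever means one is prepared to assume. A sketch, with means of 3.5, 4.0 and 4.5 on the 1-7 scale and an error variance of 2.25 that are purely illustrative assumptions:

```stata
* Hypothetical group means on the 1-7 scale (illustration only)
local m0 = 3.5
local m1 = 4.0
local m2 = 4.5
local mbar = (`m0' + `m1' + `m2') / 3

* Between-group variance with equal weights: mean squared deviation from mbar
local varmeans = ((`m0' - `mbar')^2 + (`m1' - `mbar')^2 + (`m2' - `mbar')^2) / 3
display "varmeans = " %6.4f `varmeans'

* Error variance of 2.25 is likewise an assumed placeholder
power oneway, ngroups(3) varmeans(`varmeans') varerror(2.25) power(0.9)
```

The error variance would normally come from pilot data or from a comparable published study that used the same scale.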

Again, any guidance will be greatly appreciated. (Thanks In Advance) (Joseph, I am still pondering your answer)

Last edited by David Martin; 06 Nov 2020, 10:55.
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#7

07 Nov 2020, 02:52

You're probably going to want the

Code:

power oneway meanspec, options

syntax of that command. Again, it's going to give you the sample size for an omnibus test, which might not be what you want.

I've gone ahead and performed an abbreviated power analysis at two candidate sample sizes for ability to detect both a difference between the first experimental treatment group and control treatment group and a difference between the two experimental treatment groups. So, the hypothesis pair is
H₀: π_C = π₁ = π₂
H_A: π_C ≠ π₁ AND π₁ ≠ π₂
Because your outcome measure is an ordered-categorical score, I've elected to use ordinal logistic regression to test the hypothesis, and to power the study to detect an odds ratio (OR) of 2 for both control treatment versus first experimental treatment and first experimental treatment versus second experimental treatment. An OR of 2 was chosen as the minimum detectable difference on the basis of its rule-of-thumb use as what's considered an important effect size in the absence of subject matter considerations.

Code:

version 16.1

clear *

set seed `=strreverse("1580616")'

program define simem, rclass
    version 16.1
    syntax , [n(integer 250)]

    // Three equal groups: trt = 0 (control), 1 and 2 (experimental)
    drop _all
    set obs `=round(`n', 3)'
    generate byte trt = mod(_n, 3)

    // Six cutpoints for a seven-category score
    local cut_list
    forvalues cut = 1/6 {
        local cut_list `cut_list' `=logit(`cut'/7)'
    }
    // Linear predictor: 0 for control, ln(2) for group 1, ln(4) for group 2
    generate double xb = cond(trt > 0, ln(2 * trt), trt)
    // grologit is the user-written command attached to this post
    grologit xb, generate(sco) cuts(`cut_list')

    ologit sco i.trt

    tempname or1 or2
    scalar define `or1' = exp(_b[1.trt])
    scalar define `or2' = exp(_b[2.trt] - _b[1.trt])

    // p-value for group 1 versus control
    tempname grp1
    test 1.trt
    scalar define `grp1' = r(p)

    // p-value for group 2 versus group 1
    test 2.trt = 1.trt
    return scalar grp2 = r(p)
    return scalar grp1 = `grp1'
    return scalar or1 = `or1'
    return scalar or2 = `or2'
end

forvalues n = 450(100)550 {
    local N = round(`n', 3)
    display in smcl as text _newline(1) "N = " `N' " (n = `=`N' / 3')"
    quietly simulate or1 = r(or1) or2 = r(or2) grp1 = r(grp1) grp2 = r(grp2), ///
        reps(1000) nodots: simem , n(`n')
    summarize or1, meanonly
    display in smcl as text "OR group 1 v group 0 = " as result %04.2f r(mean)
    summarize or2, meanonly
    display in smcl as text "OR group 2 v group 1 = " as result %04.2f r(mean)
    // Power = share of replications in which both tests are significant
    generate byte pos = grp1 < 0.05 & grp2 < 0.05
    quietly replace pos = . if mi(grp1, grp2)
    summarize pos, meanonly
    display in smcl as text "Power = " as result %04.2f r(mean)
}

exit

Output:

N = 450 (n = 150)
OR group 1 v group 0 = 2.03
OR group 2 v group 1 = 2.05
Power = 0.85

N = 549 (n = 183)
OR group 1 v group 0 = 2.04
OR group 2 v group 1 = 2.06
Power = 0.92

The sample sizes needed to obtain 90% power are large, but the alternative hypothesis is pretty strict, demanding that not just one but two differences be detected. An alternative setup, such as use of a joint-test result from a Helmert contrast or a test of the linear component of a set of orthogonal polynomial contrasts, might give much smaller sample size requirements than the strict requirement that I've imposed above.
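For anyone who wants to try the data-generating step without the attached file, the following is a minimal sketch of one way to draw a seven-category score from a proportional-odds model using official Stata functions only. It is an assumption about what such a generator does, not a reproduction of grologit.ado, and it produces a single dataset rather than the full simulation.

```stata
clear
set seed 2020
set obs 549
generate byte trt = mod(_n, 3)                    // 0 = control, 1 and 2 = experimental
generate double xb = ln(2) * trt                  // odds ratio of 2 per step between groups
generate double ystar = xb + logit(runiform())    // latent score with standard logistic error

* Convert the latent score to categories 1-7 using cutpoints logit(k/7)
generate byte sco = 1
forvalues cut = 1/6 {
    quietly replace sco = sco + 1 if ystar > logit(`cut'/7)
}

ologit sco i.trt, or
test 1.trt           // group 1 versus control
test 2.trt = 1.trt   // group 2 versus group 1
```

Wrapping these steps in an rclass program and calling it through -simulate-, as in the code above, gives the proportion of replications in which both tests reject.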

Without saying so, you seem to want to use a linear model like the ANOVA that power oneway is for. With seven ordered categories, that might be justifiable and, provided that the means to be detected aren't set too close together, that approach is also likely to give smaller sample sizes than the one above.
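One way to explore this with -power oneway- is its contrast() option, which powers a test of a single contrast among the group means instead of the omnibus F test. The sketch below uses coefficients -1 0 1 for a linear trend across three equally spaced groups. The means and the error variance are my own placeholders, not estimates for this study.

```stata
* Sample size for a test of the linear trend contrast among three group means.
* Means (3.5, 4.0, 4.5) and error variance (2.25) are hypothetical.
power oneway 3.5 4.0 4.5, varerror(2.25) power(0.9) contrast(-1 0 1)
```

Running the same line without contrast() gives the omnibus requirement for comparison.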
Attached file: grologit.ado
David Martin

Join Date: Nov 2020

Posts: 6
#8

07 Nov 2020, 13:07

From a similar study, we believe the correct sample size should be approximately 500.

I have constructed the below command to give that answer:

power oneway, varerror(51.5) varmeans(1) ngroups(3) grweights(1 1 1)

Any advice on how I can justify the varerror and the varmeans figures? (The varerror figure appears to have to be 51.5 times the varmeans figure to produce a sample of 500.)
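A note on why only the ratio seems to matter: for the omnibus test, -power oneway- works with the effect size sqrt(varmeans/varerror), which corresponds to Cohen's f. A ratio of 51.5 therefore implies an f of about 0.14. The sketch below checks this; the rescaled figures in the last line (0.1 and 5.15) are arbitrary and chosen only to keep the same ratio.

```stata
* Effect size implied by varerror = 51.5 * varmeans
display "Cohen's f = " %5.3f sqrt(1/51.5)

* Same ratio on two different scales: the requested effect size is identical
power oneway, ngroups(3) varmeans(1)   varerror(51.5)
power oneway, ngroups(3) varmeans(0.1) varerror(5.15)
```

Justifying the figures then comes down to justifying that effect size, for example from the means and standard deviation reported in the similar study.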

Again, thanks in advance (Joseph, I am still pondering your answers)
David Martin

Join Date: Nov 2020

Posts: 6
#9

09 Nov 2020, 08:14

Originally posted by Joseph Coveney:

N = 450 (n = 150)
OR group 1 v group 0 = 2.03
OR group 2 v group 1 = 2.05
Power = 0.85

N = 549 (n = 183)
OR group 1 v group 0 = 2.04
OR group 2 v group 1 = 2.06
Power = 0.92

Hi Joseph, is it possible that you could direct me to your 450 and 549 conclusions through the drop-down menu option instead (e.g., by inputting figures into the power oneway screen or some other screen)? Again, thanks in advance.

Last edited by David Martin; 09 Nov 2020, 08:18.
David Martin

Join Date: Nov 2020

Posts: 6
#10

09 Nov 2020, 11:22

To rephrase my entire query, I believe the total sample size should be approximately 150 (3 groups of 50).

Can anyone give me a quick fix on how to arrive at this figure using the "sampsi" function (i.e., can I throw some rule-of-thumb figures into this function to arrive at 150)?
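As far as I know, -sampsi- covers only one-sample and two-sample comparisons and has been superseded by -power- since Stata 13, so it would not handle three groups directly. Instead of searching for inputs that return 150, it may be more informative to fix the total N at 150 and see what follows. A sketch, reusing the variance figures from #8 purely as placeholders:

```stata
* Power of the omnibus F test at a fixed total N of 150 (50 per group),
* using the variance figures from post #8 as placeholders
power oneway, ngroups(3) varmeans(1) varerror(51.5) n(150)

* Effect size detectable with 80% power at N = 150;
* varerror() is left at its default, which is an assumption of this sketch
power oneway, ngroups(3) n(150) power(0.8)
```

The first line reports the power that 150 participants would give for that effect size, and the second reports how large the effect would have to be for 150 to suffice.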

Thanks In Advance
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#11

09 Nov 2020, 19:04

Originally posted by David Martin:

Is it possible that you could direct me to your 450 and 549 conclusions through the drop down screen option instead. (eg inputting figures into the power oneway screen or some other screen).

I only rarely use Stata's menu, and then it's mostly for import because I'm too lazy to type in the full path of the file.

Even so, I'm not sure that what is available from the menu is flexible enough to accommodate the kind of alternative hypothesis that I imposed above.

Originally posted by David Martin:

Can anyone give me a quick fix on how to arrive at this figure using the "sampsi" function (ie can I throw some rule of thumb figures into this function to arrive at 150).

I don't know; maybe you can try the Guernsey McPearson Clinically Relevant Difference rule-of-thumb figure.