  • Crossover trial and sample size calculations

    Hi Listers,

    I need to calculate the sample size for a 2x2 crossover study, where the order of treatment received is randomised (1:1 allocation) so that half of the participants receive treatment A then B while the other half receive treatment B then A.

    We are comparing 2 different diets and expect that under diet #1 participants will put on 2 kg (compared to baseline) while they will lose 2 kg under diet #2 (there will be a washout period during which we expect them to return to their initial weight). We assume that the standard deviation of the difference is 3.

    I have used the -power pairedmeans- command, which suggests that 9 participants are needed to achieve >90% power with alpha = 0.05 (two-tailed):

    power pairedmeans -2 2, sddiff(3) power(.9)

    I get similar results using -xsampsi-

    xsampsi, alpha(0.05) beta(0.1) n(6(1)12) delta(4) stddev(3)

    However, earlier posts on the forum suggest that analyses for this type of study should rely not on a paired t-test but on mixed models, so I decided to run some simulations to estimate the needed sample size in a way that reflects the planned analysis.

    I am new to this, so I was hoping to get some input on whether this is the correct approach. Would comparing weight at the end of each experimental period while adjusting for baseline scores be more appropriate than using change from baseline at the end of each experimental period?


    Code:
    program letsample, rclass
        version 16.0
     
        syntax, n(integer)          ///  
              [ alpha(real 0.05)    ///  
                m1(real 1)          ///  
                m2(real 1)          ///  
                sd1(real 1)         ///  
                sd2(real 1)  ///
                ]
    
    clear
    set obs 1
    
    expand `n'
    
    * Create sequence variable: 0 for treatment A first vs. 1 for treatment B first
    local mid = round(`n'/2,1)
    local mid2 = `mid'+1
    di `mid'
    di `mid2'
    g seq= 0 in 1/`mid'
    replace seq= 1 in `mid2'/`n'
    
    g nid=_n
    
    * Create outcome scores under each treatment
    g scores1 = rnormal(`m1', `sd1')
    g scores2 = rnormal(`m2', `sd2')
    
    reshape long scores , i(nid) j(treat)
    
    mixed scores i.seq##i.treat  || nid:
    
    return scalar pos = r(table)["pvalue", "scores:2.treat"] < `alpha'
    
    end
    
    simulate reject = r(pos), reps(1000) seed(73450): ///
    letsample, n(12) m1(-2) m2(2) sd1(2.3) sd2(2.3)
    
    tab reject
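
    To turn this single-n run into a sample-size search, a minimal sketch could repeat the simulation over a range of candidate sample sizes and report the empirical power for each. The candidate range 8–16 (step 2) below is an illustrative assumption, not from the thread:

    Code:
    forvalues n = 8(2)16 {
        quietly simulate reject = r(pos), reps(1000) seed(73450): ///
            letsample, n(`n') m1(-2) m2(2) sd1(2.3) sd2(2.3)
        quietly summarize reject
        display "n = `n': empirical power = " %5.3f r(mean)
    }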
    Last edited by Laura Myles; 21 Jun 2022, 10:56.

  • #2
    Originally posted by Laura Myles:
    I decided to run some simulations to estimate the needed sample so to reflect the planned analysis.

    I am new to this so I was hoping to get some input on whether this is the correct approach
    Wouldn't your planned analysis typically have not only sequence and treatment predictors, but also a period predictor? As far as your simulations' reflecting your planned analysis, I don't see this latter predictor in your simulation program. (It's only a technical observation; given your assumptions, its absence won't affect your sample size estimates.) [Edited: Is it there as the interaction term?]

    would compare weight at the end of each experimental period while adjusting for baseline scores be a more appropriate approach than using change from baseline at the end of each experimental period?
    You're aware of the literature about simple analysis of change scores versus ANCOVA, especially in the context of Lord's paradox (speaking of diet). It seems as if it could readily be extended to a crossover-trial situation. If it's a concern, then in your power analysis simulations, you could explore to what extent (intraclass) correlation differentially affects the power of each approach.
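
    If it helps, here is a minimal sketch of one way to do that: induce a chosen intraclass correlation between repeated weight measurements through a shared participant random effect, then vary that correlation across simulation runs. The target ICC of 0.6 and total SD of 7 below are illustrative assumptions, not values from the thread:

    Code:
    * With total variance sigma^2, setting Var(u) = rho*sigma^2 and
    * Var(e) = (1 - rho)*sigma^2 yields ICC = rho.
    clear
    set seed 12345
    local rho = 0.6        // illustrative target ICC (assumption)
    local sigma = 7        // illustrative total SD of body weight (assumption)
    set obs 100
    generate long pid = _n
    generate double u = rnormal(0, sqrt(`rho') * `sigma')
    expand 2
    generate double wgt = rnormal(80 + u, sqrt(1 - `rho') * `sigma')
    mixed wgt || pid: , reml
    estat icc              // the estimated ICC should land near rho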

    (there will be a washout period when we expect them to return to their initial weight)
    Hmm. It's one thing in, say, a short-term bioequivalence study of a drug, where absent something like enzyme induction one can make the case not to expect a carryover or sequence or period effect. But in a body-weight-changing diet trial, it strikes me as more of a stretch to make the assumption that the participant who has undertaken the first diet (and not dropped out), and who has undergone a body weight excursion, is the same participant as at the beginning of the first period. I'm not sure how much it matters or what you can do about it, but perhaps there's some way that you could explore it in your simulations as a sensitivity analysis.
    Last edited by Joseph Coveney; 22 Jun 2022, 19:52.



    • #3
      Thanks Joseph Coveney for your considerations.

      I had included a variable called period (coded 1 or 2, like treatment). When I entered the period variable in the model, it did not add anything, so I excluded it from my final ado-file and instead included treatment, sequence (order of treatments), and the treatment × sequence interaction.

      Yes, I agree that with weight measures it may be best to adjust for the baseline weight measure rather than using change-from-baseline scores. I am unsure, however, how to capture in the model the expected change in weight under each treatment (i.e. -2 vs. +2 kg under the interventions) - is the approach below OK?

      I now extract the ICC; it is smaller when adjusting for baseline scores, but should I attempt to manipulate it?

      Code:
      capture program drop letsample
      
      program letsample, rclass
          version 16.0
       
          syntax, n(integer)          ///  
                [ alpha(real 0.05)    ///  
                  m0(real 1)            ///
                  m1(real 1)          ///  
                  m2(real 1)          /// 
                  sd0(real 1)         ///
                  sd1(real 1)         ///   
                  sd2(real 1)  ///
                  ]
      
      clear
      set obs 1
      
      * Create period variable (time 1 vs. time 2)
      g period1=1
      g period2=2
      
      expand `n'
      
      * Create sequence variable: 0 for treatment A first vs. 1 for treatment B first
      local mid = round(`n'/2,1)
      local mid2 = `mid'+1
      di `mid'
      di `mid2'
      g seq= 0 in 1/`mid'
      replace seq= 1 in `mid2'/`n'
      
      g nid=_n
      
      * Create outcome scores under each treatment
      g bl_scores = rnormal(`m0', `sd0') 
      g scores1 = rnormal(`m1', `sd1') 
      g scores2 = rnormal(`m2', `sd2') 
      
      reshape long scores period, i(nid) j(treat)
      
      mixed scores i.seq##i.treat bl_scores || nid:
      
          return scalar pos = r(table)["pvalue", "scores:2.treat"] < `alpha'
          qui: estat icc
          return scalar rho = r(icc2)
      end
      
      *
      simulate reject = r(pos) rho=r(rho), reps(500) seed(73450): ///
      letsample, n(20) m0(80) m1(78) m2(82) sd0(3) sd1(3) sd2(3)



      • #4
        Originally posted by Laura Myles:
        I am unsure, however, how to ensure how to capture in the model the expected change in weight under each treatment (i.e. -2 vs. +2 kg under the interventions) - is the approach below OK?
        The approach you show doesn't really maintain your assumption that the difference scores will have an SD of 3.
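
        A quick way to see why: in the program above, scores1 and scores2 are drawn independently, each with SD 3 (sd1(3) sd2(3)), so the implied SD of the within-person difference is sqrt(3^2 + 3^2), about 4.24, rather than the assumed 3:

        Code:
        * SD of the difference of two independent draws, each with SD 3
        display sqrt(3^2 + 3^2)    // 4.2426407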

        Maybe try something like below. The return scalars are the positive test results for three model specifications: an ANCOVA-like model (the one that you're using above, ancova), a simple analysis of change scores (sacs) and a repeated-measures ANOVA-like configuration (rmanova). From some literature, I took the SD for baseline body weight as 9% of the mean and returned confirmation of that (sd0) as well as confirmation that the SD of the change scores is 3 (sdd).

        I looked at operating characteristics under the null hypothesis as well as under the alternative hypothesis with a sample size that gives about 90% power.

        . version 17.0

        . clear *

        . // seedem
        . set seed 928188852

        . program define simem, rclass
          1.         version 17.0
          2.         syntax , [n(integer 20) SU(real 7) SE(real 2.2) Basewgt(real 80) ///
        >                 Delta(real 2) Alpha(real 0.05) noBALance]
          3.
        .         drop _all
          4.
        .         // Participants
        .         if "`balance'" != "" local N `n'
          5.         else local N = `n' - mod(`n', 2)
          6.         set obs `N'
          7.
        .         generate long pid = _n
          8.         generate double pid_u = rnormal(0, `su')
          9.
        .         // Sequences
        .         generate byte seq = !mod(_n, 2)
         10.
        .         // Periods
        .         expand 2
         11.         bysort pid: generate byte per = _n - 1
         12.
        .         // Treatments
        .         generate byte trt = cond(seq, !per, per)
         13.
        .         // Outcomes, baseline and posttreatment
        .         generate double out0 = rnormal(`basewgt' + pid_u, `se')
         14.         generate double out1 = rnormal(`basewgt' + cond(trt, -`delta', `delta') + pid_u, `se')
         15.
        .         // Confirmation
        .         summarize out0 if !per
         16.         tempname sd0
         17.         scalar define `sd0' = r(sd)
         18.         generate double d = out0 - out1
         19.         summarize d if !trt
         20.         tempname sdd
         21.         scalar define `sdd' = r(sd)
         22.
        .         mixed out1 i.seq i.per i.trt c.out0 || pid: , reml dfmethod(satterthwaite)
         23.         tempname ancova
         24.         scalar define `ancova' = r(table)["pvalue", "out1:1.trt"] < `alpha'
         25.
        .         mixed d i.seq i.per i.trt || pid: , reml dfmethod(satterthwaite)
         26.         tempname sacs
         27.         scalar define `sacs' = r(table)["pvalue", "d:1.trt"] < `alpha'
         28.
        .         reshape long out, i(pid per) j(tim)
         29.         mixed out i.seq i.per i.trt##i.tim || pid: , reml dfmethod(satterthwaite)
         30.         return scalar rmanova = r(table)["pvalue", "out:1.trt#1.tim"] < `alpha'
         31.
        .         return scalar ancova = `ancova'
         32.         return scalar sacs = `sacs'
         33.
        .         return scalar n = `N'
         34.         return scalar sd0 = `sd0'
         35.         return scalar sdd = `sdd'
         36. end

        . program define sumem
          1.         version 17.0
          2.         syntax
          3.
        .         format ancova sacs rmanova %05.3f
          4.         format sd? %3.1f
          5.         summarize , format
          6. end

        . // H0:
        . quietly simulate ancova = r(ancova) sacs = r(sacs) rmanova = r(rmanova) ///
        >         sd0 = r(sd0) sdd = r(sdd), reps(1000): simem , n(13) d(0) nobalance

        . sumem

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
              ancova |      1,000       0.066       0.248      0.000      1.000
                sacs |      1,000       0.065       0.247      0.000      1.000
             rmanova |      1,000       0.060       0.238      0.000      1.000
                 sd0 |      1,000         7.2         1.4        3.3       11.7
                 sdd |      1,000         3.0         0.6        1.1        5.1

        . // HA:
        . quietly simulate ancova = r(ancova) sacs = r(sacs) rmanova = r(rmanova) ///
        >         sd0 = r(sd0) sdd = r(sdd), reps(1000): simem , n(13) nobalance

        . sumem

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
              ancova |      1,000       0.894       0.308      0.000      1.000
                sacs |      1,000       0.864       0.343      0.000      1.000
             rmanova |      1,000       0.890       0.313      0.000      1.000
                 sd0 |      1,000         7.2         1.5        3.0       12.4
                 sdd |      1,000         3.0         0.6        1.4        5.3

        . exit

        end of do-file


        Here, it looks as if overall the repeated-measures ANOVA-like setup has a slight edge over the other two. Nevertheless, if it were me, I'd probably still not favor it, because it assumes compound symmetry for the errors of the before-and-after body weights, and I'm pretty certain that the variance of the posttreatment body weights will be affected (increased) by the dietary interventions.



        • #5
          Thanks Joseph Coveney, your do-file is very sophisticated!

          Does SU define the standard deviation for the baseline scores, and what does "pid_u" encode? I am also unsure how you ensure that sdd is 3.

          You wrote

          Here, it looks as if overall the repeated-measures ANOVA-like setup has a slight edge over the other two.
          Based on the HA tabulation, I would have said the ANCOVA approach had an edge; what am I missing?

          Last edited by Laura Myles; 27 Jun 2022, 08:33.



          • #6
            Originally posted by Laura Myles:
            Is SU defining the standard deviation for baseline scores
            No, it's the standard deviation of the participant random effect.

            and what does "pid_u" encode?
            It is the random effect for the participant; it is what induces the within-participant correlation that the mixed model represents.

            I am also unsure how you ensure that sdd is 3.
            Because pid_u enters both out0 and out1, it cancels in the within-person difference d = out0 - out1, leaving SD(d) = sqrt(2)*SE; with the default SE of 2.2 that is about 3.1, the value the sdd confirmation reproduces. SU, together with SE, instead determines the marginal SDs, e.g. sd0 = sqrt(SU^2 + SE^2), about 7.3.

            Based on the HA tabulation, I would have said the ANCOVA approach had an edge; what am I missing?
            The repeated-measures ANOVA-like setup holds the test size better. The apparent slightly higher power for the ANCOVA-like arrangement (89.4% versus 89.0%) is basically attributable to its relative inability to maintain test size (6.6% versus 6.0%) in these circumstances.
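
            If useful, Monte Carlo uncertainty can be attached to those rejection rates with exact binomial confidence intervals; a sketch using the success counts implied by the means above (e.g. 0.894 of 1,000 reps = 894 rejections):

            Code:
            cii proportions 1000 894    // ANCOVA-like, power under HA
            cii proportions 1000 890    // rmANOVA-like, power under HA
            cii proportions 1000 66     // ANCOVA-like, size under H0
            cii proportions 1000 60     // rmANOVA-like, size under H0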



            • #7
              Thanks again Joseph Coveney

