  • Crossover trial and sample size calculations

    Hi Listers,

    I need to calculate the sample size for a 2x2 crossover study, where the order of treatment received is randomised (1:1 allocation) so that half of the participants receive treatment A then B while the other half receive treatment B then A.

    We are comparing 2 different diets and expect that under diet #1 participants will put on 2 kg (compared to baseline) while they will lose 2 kg under diet #2 (there will be a washout period during which we expect them to return to their initial weight). We assume that the standard deviation of the difference is 3.

    I have used the -power pairedmeans- command, which suggests that 9 participants are needed to achieve >90% power with alpha = 0.05 (two-tailed):

    power pairedmeans -2 2, sddiff(3) power(.9)

    I get similar results using -xsampsi-

    xsampsi, alpha(0.05) beta(0.1) n(6(1)12) delta(4) stddev(3)

    However, earlier posts on the forum suggest that analyses for this type of study should rely not on a paired t-test but on mixed models, so I decided to run some simulations to estimate the needed sample size in a way that reflects the planned analysis.

    I am new to this, so I was hoping to get some input on whether this is the correct approach. Would comparing weight at the end of each experimental period while adjusting for baseline scores be more appropriate than using change from baseline at the end of each experimental period?


    Code:
    program letsample, rclass
        version 16.0
     
        syntax, n(integer)          ///  
              [ alpha(real 0.05)    ///  
                m1(real 1)          ///  
                m2(real 1)          ///  
                sd1(real 1)         ///  
                sd2(real 1)  ///
                ]
    
    clear
    set obs 1
    
    expand `n'
    
    * Create sequence variable: 0 for treatment A first vs. 1 for treatment B first
    local mid = round(`n'/2,1)
    local mid2 = `mid'+1
    di `mid'
    di `mid2'
    g seq= 0 in 1/`mid'
    replace seq= 1 in `mid2'/`n'
    
    g nid=_n
    
    * Create outcome scores under each treatment
    g scores1 = rnormal(`m1', `sd1')
    g scores2 = rnormal(`m2', `sd2')
    
    reshape long scores , i(nid) j(treat)
    
    mixed scores i.seq##i.treat  || nid:
    
    return scalar pos = r(table)["pvalue", "scores:2.treat"] < `alpha'
    
    end
    
    simulate reject = r(pos), reps(1000) seed(73450): ///
    letsample, n(12) m1(-2) m2(2) sd1(2.3) sd2(2.3)
    
    tab reject
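
    To turn this single-n run into a sample-size search, a minimal sketch could repeat the simulation over a range of candidate sample sizes and report the empirical power for each. The candidate range 8–16 (step 2) below is an illustrative assumption, not from the thread:

    Code:
    forvalues n = 8(2)16 {
        quietly simulate reject = r(pos), reps(1000) seed(73450): ///
            letsample, n(`n') m1(-2) m2(2) sd1(2.3) sd2(2.3)
        quietly summarize reject
        display "n = `n': empirical power = " %5.3f r(mean)
    }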
    Last edited by Laura Myles; 21 Jun 2022, 10:56.

  • #2
    Originally posted by Laura Myles:
    I decided to run some simulations to estimate the needed sample so to reflect the planned analysis.

    I am new to this so I was hoping to get some input on whether this is the correct approach
    Wouldn't your planned analysis typically have not only sequence and treatment predictors, but also a period predictor? As far as your simulations' reflecting your planned analysis, I don't see this latter predictor in your simulation program. (It's only a technical observation; given your assumptions, its absence won't affect your sample size estimates.) [Edited: Is it there as the interaction term?]

    would compare weight at the end of each experimental period while adjusting for baseline scores be a more appropriate approach than using change from baseline at the end of each experimental period?
    You're aware of the literature about simple analysis of change scores versus ANCOVA, especially in the context of Lord's paradox (speaking of diet). It seems as if it could readily be extended to a crossover-trial situation. If it's a concern, then in your power analysis simulations, you could explore to what extent (intraclass) correlation differentially affects the power of each approach.
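
    If it helps, here is a minimal sketch of one way to do that: induce a chosen intraclass correlation between repeated weight measurements through a shared participant random effect, then vary that correlation across simulation runs. The target ICC of 0.6 and total SD of 7 below are illustrative assumptions, not values from the thread:

    Code:
    * With total variance sigma^2, setting Var(u) = rho*sigma^2 and
    * Var(e) = (1 - rho)*sigma^2 yields ICC = rho.
    clear
    set seed 12345
    local rho = 0.6        // illustrative target ICC (assumption)
    local sigma = 7        // illustrative total SD of body weight (assumption)
    set obs 100
    generate long pid = _n
    generate double u = rnormal(0, sqrt(`rho') * `sigma')
    expand 2
    generate double wgt = rnormal(80 + u, sqrt(1 - `rho') * `sigma')
    mixed wgt || pid: , reml
    estat icc              // the estimated ICC should land near rho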

    (there will be a washout period when we expect them to return to their initial weight)
    Hmm. It's one thing in, say, a short-term bioequivalence study of a drug, where absent something like enzyme induction one can make the case not to expect a carryover or sequence or period effect. But in a body-weight-changing diet trial, it strikes me as more of a stretch to make the assumption that the participant who has undertaken the first diet (and not dropped out), and who has undergone a body weight excursion, is the same participant as at the beginning of the first period. I'm not sure how much it matters or what you can do about it, but perhaps there's some way that you could explore it in your simulations as a sensitivity analysis.
    Last edited by Joseph Coveney; 22 Jun 2022, 19:52.



    • #3
      Thanks Joseph Coveney for your considerations.

      I had included a variable called period (coded 1 or 2, like treatment). When I entered the period variable in the model, it did not add anything, so I excluded it from my final ado-file and instead included treatment, sequence (order of treatments), and the treatment × sequence interaction.

      Yes, I agree that with weight measures it may be best to adjust for the baseline weight measure rather than using change-from-baseline scores. I am unsure, however, how to capture in the model the expected change in weight under each treatment (i.e. -2 vs. +2 kg under the interventions) - is the approach below OK?

      I now extract the ICC; it is smaller when adjusting for baseline scores, but should I attempt to manipulate it?

      Code:
      capture program drop letsample
      
      program letsample, rclass
          version 16.0
       
          syntax, n(integer)          ///  
                [ alpha(real 0.05)    ///  
                  m0(real 1)            ///
                  m1(real 1)          ///  
                  m2(real 1)          /// 
                  sd0(real 1)         ///
                  sd1(real 1)         ///   
                  sd2(real 1)  ///
                  ]
      
      clear
      set obs 1
      
      * Create period variable (time 1 vs. time 2)
      g period1=1
      g period2=2
      
      expand `n'
      
      * Create sequence variable: 0 for treatment A first vs. 1 for treatment B first
      local mid = round(`n'/2,1)
      local mid2 = `mid'+1
      di `mid'
      di `mid2'
      g seq= 0 in 1/`mid'
      replace seq= 1 in `mid2'/`n'
      
      g nid=_n
      
      * Create outcome scores under each treatment
      g bl_scores = rnormal(`m0', `sd0') 
      g scores1 = rnormal(`m1', `sd1') 
      g scores2 = rnormal(`m2', `sd2') 
      
      reshape long scores period, i(nid) j(treat)
      
      mixed scores i.seq##i.treat bl_scores || nid:
      
          return scalar pos = r(table)["pvalue", "scores:2.treat"] < `alpha'
          qui: estat icc
          return scalar rho = r(icc2)
      end
      
      *
      simulate reject = r(pos) rho=r(rho), reps(500) seed(73450): ///
      letsample, n(20) m0(80) m1(78) m2(82) sd0(3) sd1(3) sd2(3)



      • #4
        Originally posted by Laura Myles:
        I am unsure, however, how to ensure how to capture in the model the expected change in weight under each treatment (i.e. -2 vs. +2 kg under the interventions) - is the approach below OK?
        The approach you show doesn't really maintain your assumption that the difference scores will have an SD of 3.
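
        A quick way to see why: in the program above, scores1 and scores2 are drawn independently, each with SD 3 (sd1(3) sd2(3)), so the implied SD of the within-person difference is sqrt(3^2 + 3^2), about 4.24, rather than the assumed 3:

        Code:
        * SD of the difference of two independent draws, each with SD 3
        display sqrt(3^2 + 3^2)    // 4.2426407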

        Maybe try something like below. The return scalars are the positive test results for three model specifications: an ANCOVA-like model (the one that you're using above, ancova), a simple analysis of change scores (sacs) and a repeated-measures ANOVA-like configuration (rmanova). From some literature, I took the SD for baseline body weight as 9% of the mean and returned confirmation of that (sd0) as well as confirmation that the SD of the change scores is 3 (sdd).

        I looked at operating characteristics under the null hypothesis as well as under the alternative hypothesis with a sample size that gives about 90% power.

        . version 17.0

        . clear *

        . // seedem
        . set seed 928188852

        . program define simem, rclass
          1.         version 17.0
          2.         syntax , [n(integer 20) SU(real 7) SE(real 2.2) Basewgt(real 80) ///
        >                 Delta(real 2) Alpha(real 0.05) noBALance]
          3.
        .         drop _all
          4.
        .         // Participants
        .         if "`balance'" != "" local N `n'
          5.         else local N = `n' - mod(`n', 2)
          6.         set obs `N'
          7.
        .         generate long pid = _n
          8.         generate double pid_u = rnormal(0, `su')
          9.
        .         // Sequences
        .         generate byte seq = !mod(_n, 2)
         10.
        .         // Periods
        .         expand 2
         11.         bysort pid: generate byte per = _n - 1
         12.
        .         // Treatments
        .         generate byte trt = cond(seq, !per, per)
         13.
        .         // Outcomes, baseline and posttreatment
        .         generate double out0 = rnormal(`basewgt' + pid_u, `se')
         14.         generate double out1 = rnormal(`basewgt' + cond(trt, -`delta', `delta') + pid_u, `se')
         15.
        .         // Confirmation
        .         summarize out0 if !per
         16.         tempname sd0
         17.         scalar define `sd0' = r(sd)
         18.         generate double d = out0 - out1
         19.         summarize d if !trt
         20.         tempname sdd
         21.         scalar define `sdd' = r(sd)
         22.
        .         mixed out1 i.seq i.per i.trt c.out0 || pid: , reml dfmethod(satterthwaite)
         23.         tempname ancova
         24.         scalar define `ancova' = r(table)["pvalue", "out1:1.trt"] < `alpha'
         25.
        .         mixed d i.seq i.per i.trt || pid: , reml dfmethod(satterthwaite)
         26.         tempname sacs
         27.         scalar define `sacs' = r(table)["pvalue", "d:1.trt"] < `alpha'
         28.
        .         reshape long out, i(pid per) j(tim)
         29.         mixed out i.seq i.per i.trt##i.tim || pid: , reml dfmethod(satterthwaite)
         30.         return scalar rmanova = r(table)["pvalue", "out:1.trt#1.tim"] < `alpha'
         31.
        .         return scalar ancova = `ancova'
         32.         return scalar sacs = `sacs'
         33.
        .         return scalar n = `N'
         34.         return scalar sd0 = `sd0'
         35.         return scalar sdd = `sdd'
         36. end

        . program define sumem
          1.         version 17.0
          2.         syntax
          3.
        .         format ancova sacs rmanova %05.3f
          4.         format sd? %3.1f
          5.         summarize , format
          6. end

        . // H0:
        . quietly simulate ancova = r(ancova) sacs = r(sacs) rmanova = r(rmanova) ///
        >         sd0 = r(sd0) sdd = r(sdd), reps(1000): simem , n(13) d(0) nobalance

        . sumem

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
              ancova |      1,000       0.066       0.248      0.000      1.000
                sacs |      1,000       0.065       0.247      0.000      1.000
             rmanova |      1,000       0.060       0.238      0.000      1.000
                 sd0 |      1,000         7.2         1.4        3.3       11.7
                 sdd |      1,000         3.0         0.6        1.1        5.1

        . // HA:
        . quietly simulate ancova = r(ancova) sacs = r(sacs) rmanova = r(rmanova) ///
        >         sd0 = r(sd0) sdd = r(sdd), reps(1000): simem , n(13) nobalance

        . sumem

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
              ancova |      1,000       0.894       0.308      0.000      1.000
                sacs |      1,000       0.864       0.343      0.000      1.000
             rmanova |      1,000       0.890       0.313      0.000      1.000
                 sd0 |      1,000         7.2         1.5        3.0       12.4
                 sdd |      1,000         3.0         0.6        1.4        5.3

        . exit

        end of do-file


        Here, it looks as if overall the repeated-measures ANOVA-like setup has a slight edge over the other two. Nevertheless, if it were me, I'd probably still not favor it, because it assumes compound symmetry for the errors of the before-and-after body weights, and I'm pretty certain that the variance of the posttreatment body weights will be affected (increased) by the dietary interventions.



        • #5
          Thanks Joseph Coveney, your do-file is very sophisticated!

          Does SU define the standard deviation for the baseline scores, and what does "pid_u" encode? I am also unsure how you ensure that sdd is 3.

          You wrote

          Here, it looks as if overall the repeated-measures ANOVA-like setup has a slight edge over the other two.
          Based on the HA tabulation, I would have said the ANCOVA approach had an edge; what am I missing?

          Last edited by Laura Myles; 27 Jun 2022, 08:33.



          • #6
            Originally posted by Laura Myles:
            Is SU defining the standard deviation for baseline scores
            No, it's the standard deviation of the participant random effect.

            and what does "pid_u" encode?
            It is the random effect for the participant; it is what induces the within-participant correlation that the mixed model represents.

            I am also unsure how you ensure that sdd is 3.
            Because pid_u enters both out0 and out1, it cancels in the within-person difference d = out0 - out1, leaving SD(d) = sqrt(2)*SE; with the default SE of 2.2 that is about 3.1, the value the sdd confirmation reproduces. SU, together with SE, instead determines the marginal SDs, e.g. sd0 = sqrt(SU^2 + SE^2), about 7.3.

            Based on the HA tabulation, I would have said the ANCOVA approach had an edge; what am I missing?
            The repeated-measures ANOVA-like setup holds the test size better. The apparent slightly higher power for the ANCOVA-like arrangement (89.4% versus 89.0%) is basically attributable to its relative inability to maintain test size (6.6% versus 6.0%) in these circumstances.
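
            If useful, Monte Carlo uncertainty can be attached to those rejection rates with exact binomial confidence intervals; a sketch using the success counts implied by the means above (e.g. 0.894 of 1,000 reps = 894 rejections):

            Code:
            cii proportions 1000 894    // ANCOVA-like, power under HA
            cii proportions 1000 890    // rmANOVA-like, power under HA
            cii proportions 1000 66     // ANCOVA-like, size under H0
            cii proportions 1000 60     // rmANOVA-like, size under H0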



            • #7
              Thanks again Joseph Coveney

