  • Power calculation for equivalence testing (TOST)

    I would like to test whether associate clinicians (md==0) are non-inferior to physicians (md==1) in the occurrence of a surgical complication (iat). To complete two one-sided tests I propose to use the Stata tostt package (mean equivalence t tests). I am writing with a question about the power calculation and the interpretation.

    First, I want to calculate the potential difference that my sample size is powered to assess.

    Here are my variables:
    . summarize iat if md==0

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
      iatrogenic |      1,119    .2457551    .4307265          0          1

    . summarize iat if md==1

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
      iatrogenic |        171    .2923977    .4561999          0          1



    By trial and error, I find that my sample would have 80% power to detect a difference of 0.135 (13.5 percentage points) between the means: the following command returns a minimum sample size of 171 per group, which is the size of my smaller group:
    Code:
    sampsi 0 0.135, sd1(.43) sd2(.46) power(.8)
    Is this correct?
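As a rough cross-check outside Stata, the figure of 171 can be reproduced with the usual normal-approximation formula for a two-sided, two-sample test of a difference in means with equal group sizes. This is a sketch only: the function name is hypothetical, and it is an assumption that sampsi applies this same formula with its default alpha of 0.05.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_difference(delta, sd1, sd2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of a difference in means
    (normal approximation, equal group sizes assumed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the target power
    return ceil((z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2)

# Inputs taken from the sampsi call above
n = n_per_group_difference(delta=0.135, sd1=0.43, sd2=0.46)
print(n)
```

Note that this formula sizes a test for a difference between the groups, so the 0.135 it yields is a detectable difference under that test, not an equivalence margin.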

    I plug in this value:
    Code:
    tostt iat, by(md) eqvtype(delta) eqvlevel(0.135) relevance
    I see output for two-sample t-test with equal variances and two-sample unpaired t-test for mean equivalence with equal variances. Is this the same as the original plan for two one-sided tests? If not, what command would you recommend instead?

    Ho: |θ| >= Δ:

            t1 = 5.095                    t2 = 2.479

       Ho1: Δ-θ <= 0                 Ho2: θ+Δ <= 0
       Ha1: Δ-θ > 0                  Ha2: θ+Δ > 0
       Pr(T > t1) = 0.0000           Pr(T > t2) = 0.0067


    Relevance test conclusion for α = 0.05, and Δ = 0.135:
    Ho test for difference: Fail to reject
    Ho test for equivalence: Reject

    Conclusion from combined tests: Equivalence


    Would it be fair to report that I sought to assess an equivalence margin of 13.5% given that this was the difference in means that my sample was powered to detect? I will appreciate your confirmation about this approach and interpretation, or else your recommendations about alternatives. Thank you.

  • #2
    Hi, Carrie.

    I believe it would be better to base the equivalence margin on clinical/biological judgment. Justifying the margin by the statistical power of a superiority test computed on the same sample is a weak argument and very problematic.

    Hope this helps.

    Tiago.



    • #3
      Thank you, Tiago. Does this mean that you would select a difference of, say, 10% but then comment that the sample size is not sufficiently powered to detect that difference?



      • #4
        Hi, Carrie.

        There are some important problems in the calculations.

        The power calculation was based on a superiority test, whereas you are interested in examining equivalence. You should pre-specify the equivalence margins. Once the margins are established, you can calculate the sample size needed to achieve 90% power:

        Code:
        ssc install ssi
        ssi  0 0.20 , sd1(.44) sd2(.44) alpha(0.05) equivalence
        Furthermore, it seems that your outcome is binary (yes/no). Hence, the calculations above are incorrect, because they assume a continuous outcome.

        If your outcome is binary, you should type
        Code:
        help tost
        and check the best option for proportions.
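To show how an equivalence sample size for a binary outcome differs from the superiority calculation, here is a minimal sketch of one textbook normal-approximation formula for two one-sided tests on a risk difference. Everything in it is an assumption for illustration: equal group sizes, the same true risk in both groups, a hypothetical risk of 0.25 and a hypothetical margin of 0.10. The function name is made up, and the result is not claimed to match what -ssi- or any Stata command reports.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_equivalence(p, margin, alpha=0.05, power=0.90):
    """Per-group n for TOST on a risk difference, assuming both groups
    share the same true risk p (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)            # each one-sided test runs at level alpha
    z_beta = z.inv_cdf(1 - (1 - power) / 2)   # type II error split across the two tests
    variance = 2 * p * (1 - p)                # variance term for the difference of two risks
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)

# Hypothetical inputs: 25% risk in both groups, margin of 10 percentage points
n_equiv = n_per_group_equivalence(p=0.25, margin=0.10)
print(n_equiv)
```

The required size grows quickly as the margin shrinks, since the margin enters the formula squared in the denominator.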

        Hope this helps.

        All the best,

        Tiago



        • #5
          Thank you for pointing out these errors.

          I notice that your code is
          Code:
          ssi 0 0.20 , sd1(.44) sd2(.44) alpha(0.05) equivalence
          instead of the actual standard deviations from my data
          Code:
          ssi 0 0.20 , sd1(.43) sd2(.46) alpha(0.05) equivalence
          Was this an approximation, or is it important to list equal standard deviations?

          Thanks to your guidance I used "tostpr" instead of "tostt".

          Two-sample test of proportion equivalence     CO/AMO: Number of obs =     1119
                                                        Doctor: Number of obs =      171
          ------------------------------------------------------------------------------
              Variable |       Mean   Std. Err.                     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                CO/AMO |   .2457551   .0128704                      .2205296    .2709807
                Doctor |   .2923977   .0347843                      .2242216    .3605737
          -------------+----------------------------------------------------------------
                   Δ-θ |   .1816425   .0356449                      .1117798    .2515052
                   θ+Δ |   .0883575   .0356449                      .0184948    .1582202
          ------------------------------------------------------------------------------
                     θ = prop(iatrogenic|md = CO/AMO) - prop(iatrogenic|md = Doctor)
                       = -.04664252
                     Δ = 0.1350   Δ expressed in same units as prop(iatrogenic)

          Ho: |θ| >= Δ:

                  z1 = 5.096                    z2 = 2.479

             Ho1: Δ-θ <= 0                 Ho2: θ+Δ <= 0
             Ha1: Δ-θ > 0                  Ha2: θ+Δ > 0
             Pr(Z > z1) = 0.0000           Pr(Z > z2) = 0.0066


          Relevance test conclusion for α = 0.05, and Δ = 0.135:
          Ho test for difference: Fail to reject
          Ho test for equivalence: Reject

          Conclusion from combined tests: Equivalence
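The two z statistics in this output can be reproduced by hand, which makes clear that they are the two one-sided tests. The sketch below uses the counts implied by the reported proportions (275 of 1,119 and 50 of 171). Using a pooled standard error is an assumption on my part; it is the choice that appears to match the 0.0356449 shown for Δ-θ and θ+Δ.

```python
from math import sqrt
from statistics import NormalDist

# Counts implied by the reported proportions
n_co, x_co = 1119, 275     # CO/AMO
n_doc, x_doc = 171, 50     # Doctor
delta = 0.135              # equivalence margin used in the thread

theta = x_co / n_co - x_doc / n_doc              # difference in proportions
p_pool = (x_co + x_doc) / (n_co + n_doc)         # pooled proportion (assumed)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_co + 1 / n_doc))

z1 = (delta - theta) / se                        # tests Ho1: delta - theta <= 0
z2 = (theta + delta) / se                        # tests Ho2: theta + delta <= 0
p1 = 1 - NormalDist().cdf(z1)                    # upper-tail p-values
p2 = 1 - NormalDist().cdf(z2)

print(f"theta = {theta:.8f}, se = {se:.7f}")
print(f"z1 = {z1:.3f} (p = {p1:.4f}), z2 = {z2:.3f} (p = {p2:.4f})")
```

Equivalence is declared only when both one-sided null hypotheses are rejected, so the larger of the two p-values decides the result.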


          My interpretation is that CO/AMO are not inferior to doctors in the risk I'm evaluating at an equivalence margin of 13.5%. Does this align with how you would frame the results?

          Is there an alternative noninferiority command that tests for a difference in one direction only, rather than a margin of delta in both directions? It only matters whether CO/AMO experience the studied outcome less often than doctors do. One wouldn't care about the extent of the difference in the desired direction.



          • #6
            I do not know how to interpret P-values from equivalence tests. I simply compare the estimated difference and 90% confidence intervals to the pre-specified margins.
            In your case, you can follow these steps:

            1. Calculate the risk difference and its 90% confidence interval. In Stata, this can be done quickly with a simple generalized linear model.
            2. Check whether the 90% CI lies entirely within the pre-specified equivalence margins [-delta, +delta].
            3. If so, you have evidence of equivalence.




            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int counts byte(iat md)
            275 1 0
            844 0 0
             50 1 1
            121 0 1
            end
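            Code:
            * Expand the four count rows into one observation per patient
            expand counts
            * Binomial GLM with identity link: the md coefficient is the risk difference
            glm iat md, family(binomial 1) link(identity) level(90)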
            Code:
            Iteration 0:   log likelihood = -727.31331  
            Iteration 1:   log likelihood = -727.31331  
            
            Generalized linear models                         Number of obs   =      1,290
            Optimization     : ML                             Residual df     =      1,288
                                                              Scale parameter =          1
            Deviance         =  1454.626614                   (1/df) Deviance =   1.129368
            Pearson          =         1290                   (1/df) Pearson  =   1.001553
            
            Variance function: V(u) = u*(1-u)                 [Bernoulli]
            Link function    : g(u) = u                       [Identity]
            
                                                              AIC             =   1.130718
            Log likelihood   = -727.3133072                   BIC             =  -7770.541
            
            ------------------------------------------------------------------------------
                         |                 OIM
                     iat |      Coef.   Std. Err.      z    P>|z|     [90% Conf. Interval]
            -------------+----------------------------------------------------------------
                      md |   .0466425    .037089     1.26   0.209    -.0143635    .1076486
                   _cons |   .2457551   .0128704    19.09   0.000     .2245852    .2669251
            ------------------------------------------------------------------------------
            The risk difference (90% CI) was 0.0466 (-0.014, 0.108), which lies entirely within the pre-specified equivalence margins [-0.135, 0.135], supporting clinical equivalence between the two categories of clinicians.
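The same interval can be checked by hand. This is a minimal sketch using a Wald interval with an unpooled standard error, which appears to agree with the OIM standard error in the GLM output above to the digits shown. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the dataex example above
n0, x0 = 1119, 275     # md == 0
n1, x1 = 171, 50       # md == 1
delta = 0.135          # equivalence margin

p0, p1 = x0 / n0, x1 / n1
rd = p1 - p0                                        # risk difference, md == 1 minus md == 0
se = sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)  # unpooled Wald standard error
z90 = NormalDist().inv_cdf(0.95)                    # a 90% CI leaves 5% in each tail
lo, hi = rd - z90 * se, rd + z90 * se

# Equivalence is supported when the whole 90% CI sits inside [-delta, +delta]
inside = -delta < lo and hi < delta
print(f"RD = {rd:.7f}, 90% CI = ({lo:.7f}, {hi:.7f}), inside margins: {inside}")
```

A 90% interval is used because checking it against both margins corresponds to two one-sided tests at the 5% level each.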

            You can check this paper (Figure 1) and other classic texts:

            https://trialsjournal.biomedcentral..../1468-6708-5-8

            Hope this helps.

            All the best,

            Tiago
            Last edited by Tiago Pereira; 29 Jul 2023, 12:26.



            • #7
              Thank you, Tiago. I appreciate your guidance. The difference that I spot with your example is that my "iat" variable is dichotomous. Besides using logistic regression rather than a generalized linear model, is there anything else that would change in your recommended steps?



              • #8
                I assumed that you have a retrospective or prospective cohort design and that the outcome is binary ("occurrence of a surgical complication, iat - yes/no"). In this scenario, the suggested GLM is correct. The equivalence test can be based on risk differences, for which margins can be elicited more easily than for odds ratios. If you use -ssi-, make sure you estimate the sample size for binary outcomes.
                Last edited by Tiago Pereira; 03 Aug 2023, 16:46.



                • #9
                  Thank you. You are correct that it is a retrospective cohort design with a binary outcome. I appreciate your confirmation.
