Statistical analysis plan for a randomized multicenter non-inferiority trial: random and fixed effects, risk difference and sample size

Jonas Tverring

Join Date: Nov 2018

Posts: 4
#1

Statistical analysis plan for a randomized multicenter non-inferiority trial: random and fixed effects, risk difference and sample size

17 Mar 2022, 08:31

Dear Statalist users,

we are planning a multicenter randomized non-inferiority trial investing two types of oral antibiotics for febrile urinary tract infection (UTI) with clinical cure as the primary outcome (NCT 05224401).

I am struggling with the statistical analysis plan. I would like to use a random effect/intercept for site and a fixed effect for sex (because females are considered easier to treat for febrile UTI compared to men) and I would like to have the results appear as absolute risk difference with 95% confidence interval. I know of a similar trial that I believe used GEE in R for this purpose (PMID 34756180) but I am struggling to specify the correct version of commands and code in Stata. I have tried using meglm, xtgee and binreg, but I am unable to get exactly what I want with either of these three.

Code:

meglm cure trt sex || site: , cov(un) eform family(bernoulli) link(logit)

- Doesn't give me risk difference

Code:

xtset site xtgee cure trt sex , family(binomial) link(identity) nolog eform

- Doesn't give me risk difference

Code:

binreg cure trt sex , rd

- No random effect/intercept for site

I am also hesitant as to how such an analysis should affect the sample size calculation. So far I have only conducted a simple sample size calculation for a test of non-inferiority between two proportions using the "ssi" command, but I imagine that this is not completely translatable to an analysis including both random and fixed effects (and I fear that the only way out is to simulate, which is probably beyond my competence).

Code:

ssi 0.93 0.10, alpha(0.025) power(0.9) noninferiority

- Doesn't take into account random and fixed effects in primary analysis?

I am very grateful for any advice!

Best regards,

Jonas Tverring
M.D, Ph.D., Specialist in Infectious Diseases
Helsingborg hospital
Lund University
Sweden
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4412
#2

17 Mar 2022, 09:04

much of what you say is too brief for a full understanding (by me at any rate), but -meglm- does not allow the identity link with the binomial family so you need to use the default family and link (this will be similar to using -regress- for non-nested data and will give the risk difference); not sure what is going on with your -xtgee- command but you don't show any results so it's hard to diagnose

added in edit - the use of -regress- to get risk differences is sometimes called the "linear probability model"

Last edited by Rich Goldstein; 17 Mar 2022, 09:06.
Comment

Jonas Tverring

Join Date: Nov 2018
Posts: 4

18 Mar 2022, 03:20

Thank you Rich Golstein for your reply, I will be sure to include output next time to make things clearer.

I got a reply from the statistician of the (PMID 34756180) study I mentioned. He recommended using family(poisson) and link(identity) in GEE for absolute risk difference. This fits well with a methodological article I found (PMID: 27576307).

I guess one of the problems I have had is understanding what the coefficient means depending on family and link, which is not so clearly presented in the Stata manuals. They say that the exponentiated (eform) coefficient from poisson family and identify link is incidence rate ratio.

Anyhow I think that I may have achieved what I want below.

Code:

*** Random effect for site
xtset site

Panel variable: site (unbalanced)

xtgee cure trt sex , family(poisson) link(identity) nolog

GEE population-averaged model                        Number of obs    =    652
Group variable: site                                 Number of groups =     17
Family: Poisson                                      Obs per group:  
Link:   Identity                                                  min =      4
Correlation: exchangeable                                         avg =   38.4
                                                                  max =     76
                                                     Wald chi2(2)     =   1.23
Scale parameter = 1                                  Prob > chi2      = 0.5414

------------------------------------------------------------------------------
        cure | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         trt |  -.0290371   .0379422    -0.77   0.444    -.1034025    .0453282
         sex |   .0313277   .0395137     0.79   0.428    -.0461178    .1087732
       _cons |   .2581078   .0574719     4.49   0.000      .145465    .3707506
------------------------------------------------------------------------------

*** Looking for a visual display of the findings

margins, dydx(trt)

Average marginal effects                                   Number of obs = 652
Model VCE: Conventional

Expression: Linear prediction, predict()
dy/dx wrt:  trt

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         trt |  -.0290371   .0379422    -0.77   0.444    -.1034025    .0453282
------------------------------------------------------------------------------

marginsplot , horizontal

Variables that uniquely identify margins:

Does it look reasonable to you? (I am using example data from another study)

Any advice on how to calculate sample size for the above intended primary analysis? The expected cure rate is 93% and I am looking at a 10% absolute non-inferiority margin.

Best regards,

Jonas

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4354
#4

19 Mar 2022, 07:01

Originally posted by Jonas Tverring View Post

I would like to use a random effect/intercept for site . . . and I would like to have the results appear as absolute risk difference with 95% confidence interval. I know of a similar trial that I believe used GEE . . . but I am struggling to specify the correct version of commands and code in Stata. . . .

Code:

xtset site xtgee cure trt sex , family(binomial) link(identity) nolog eform

- Doesn't give me risk difference

Omit the eform option and it will give you the risk difference (and 95% CI) directly. That is, try this:

Code:

xtgee cure i.(trt sex), i(site) family(binomial) link(identity) nolog

(For clarity and explicitness, I've taken the liberty of including the clustering variable in the estimation command as an option.)

So far I have only conducted a simple sample size calculation for a test of non-inferiority between two proportions using the "ssi" command, but I imagine that this is not completely translatable to an analysis including both random and fixed effects (and I fear that the only way out is to simulate, which is probably beyond my competence).

Code:

ssi 0.93 0.10, alpha(0.025) power(0.9) noninferiority

- Doesn't take into account random and fixed effects in primary analysis?

I am very grateful for any advice!

With response rates that high, the intraclass correlation won't likely be huge; an ICC coefficient of 0.10 would be about as high as you'd expect to see. (The reference you give in your later post says as much.) With that low an ICC, you're pretty close to independent observations, and so the sample size that you got with the user-written ssi command would be in the ballpark.

Originally posted by Jonas Tverring View Post

Code:

xtset site xtgee cure trt sex , family(poisson) link(identity) nolog

Does it look reasonable to you?

I'd stick with the risk-difference GEE above: it models your supposed data-generating process more closely and so ought to be the best option if it converges; nevertheless, if you're going to go with this command, be sure to at least adjust the standard errors of the regression coefficients . . . something like this:

Code:

xtgee cure i.(trt sex), i(site) family(poisson) link(identity) vce(robust)

. . . you could also remove the default exchangeable working correlation structure, which would help convergence and is probably unnecessary with the cluster-robust standard errors.

Any advice on how to calculate sample size for the above intended primary analysis? The expected cure rate is 93% and I am looking at a 10% absolute non-inferiority margin.

See below. I use simulation in order to assess both test size (nominally an alpha of 2.5%, one-sided) and power at a total patient sample of 300 enrolled evenly among 30 clinics with an ICC of 0.1. Your study's entry in the U.S.'s registry site says that you'll be enrolling 330 patients. With a, say, 10% rate of unevaluable cases, you should have nearly the 90% power that you seek, regardless of the statistical modeling method that you use.

.ÿ
.ÿversionÿ17.0

.ÿ
.ÿclearÿ*

.ÿ
.ÿ//ÿseedem
.ÿsetÿseedÿ1280444040

.ÿ
.ÿprogramÿdefineÿsimem,ÿrclass
ÿÿ1.ÿÿÿÿÿÿÿÿÿversionÿ17.0
ÿÿ2.ÿÿÿÿÿÿÿÿÿsyntaxÿ,ÿ[Clinics(intÿ30)ÿn(intÿ300)ÿIcc(realÿ0.1)ÿ///
>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿReference(realÿ0.93)ÿTest(realÿ0.93)ÿDelta(realÿ0.1)]
ÿÿ3.ÿ
.ÿÿÿÿÿÿÿÿÿdropÿ_all
ÿÿ4.ÿ
.ÿÿÿÿÿÿÿÿÿ//ÿClinics
.ÿÿÿÿÿÿÿÿÿsetÿobsÿ`clinics'
ÿÿ5.ÿÿÿÿÿÿÿÿÿgenerateÿintÿcidÿ=ÿ_n
ÿÿ6.ÿÿÿÿÿÿÿÿÿgenerateÿdoubleÿcid_uÿ=ÿrnormal(0,ÿsqrt(_pi^2ÿ*ÿ`icc'ÿ/ÿ(3ÿ-ÿ3ÿ*ÿ`icc')))
ÿÿ7.ÿ
.ÿÿÿÿÿÿÿÿÿ//ÿPatients
.ÿÿÿÿÿÿÿÿÿexpandÿ`=ceil(`n'ÿ/ÿ_N)'
ÿÿ8.ÿ
.ÿÿÿÿÿÿÿÿÿ//ÿTreatments
.ÿÿÿÿÿÿÿÿÿsortÿcid
ÿÿ9.ÿÿÿÿÿÿÿÿÿgenerateÿbyteÿtrtÿ=ÿmod(_n,ÿ2)
ÿ10.ÿ
.ÿÿÿÿÿÿÿÿÿ//ÿOutcome
.ÿÿÿÿÿÿÿÿÿgenerateÿdoubleÿxbuÿ=ÿ!trtÿ*ÿlogit(`reference')ÿ+ÿtrtÿ*ÿlogit(`test')ÿ+ÿcid_u
ÿ11.ÿÿÿÿÿÿÿÿÿgenerateÿbyteÿoutÿ=ÿrbinomial(1,ÿinvlogit(xbu))
ÿ12.ÿ
.ÿÿÿÿÿÿÿÿÿ//ÿTestÿmethods
.ÿÿÿÿÿÿÿÿÿ*ÿRiskÿdifferenceÿGEEÿwithÿdefaultÿworkingÿcorrelation
.ÿÿÿÿÿÿÿÿÿtempnameÿexc
ÿ13.ÿÿÿÿÿÿÿÿÿcaptureÿxtgeeÿoutÿi.trt,ÿi(cid)ÿfamily(binomial)ÿlink(identity)ÿ///
>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcorr(exchangeable)ÿiterate(50)
ÿ14.ÿÿÿÿÿÿÿÿÿifÿ!_rcÿscalarÿdefineÿ`exc'ÿ=ÿr(table)["ll",ÿ"1.trt"]ÿ>ÿ-`delta'
ÿ15.ÿÿÿÿÿÿÿÿÿelseÿscalarÿdefineÿ`exc'ÿ=ÿ.c
ÿ16.ÿ
.ÿÿÿÿÿÿÿÿÿ*ÿPoissonÿwithÿidentityÿlinkÿfunctionÿwithÿindependentÿworkingÿcorrelation
.ÿÿÿÿÿÿÿÿÿtempnameÿpoi
ÿ17.ÿÿÿÿÿÿÿÿÿcaptureÿxtgeeÿoutÿi.trt,ÿi(cid)ÿfamily(poisson)ÿlink(identity)ÿ///
>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcorr(independent)ÿvce(robust)ÿiterate(50)
ÿ18.ÿÿÿÿÿÿÿÿÿifÿ!_rcÿscalarÿdefineÿ`poi'ÿ=ÿr(table)["ll",ÿ"1.trt"]ÿ>ÿ-`delta'
ÿ19.ÿÿÿÿÿÿÿÿÿelseÿscalarÿdefineÿ`poi'ÿ=ÿ.c
ÿ20.ÿ
.ÿÿÿÿÿÿÿÿÿ*ÿLinearÿprobabilityÿmodel
.ÿÿÿÿÿÿÿÿÿregressÿoutÿi.trt,ÿvce(clusterÿcid)
ÿ21.ÿÿÿÿÿÿÿÿÿreturnÿscalarÿlpmÿ=ÿr(table)["ll",ÿ"1.trt"]ÿ>ÿ-`delta'
ÿ22.ÿ
.ÿÿÿÿÿÿÿÿÿreturnÿscalarÿexcÿ=ÿ`exc'
ÿ23.ÿÿÿÿÿÿÿÿÿreturnÿscalarÿpoiÿ=ÿ`poi'
ÿ24.ÿ
.ÿend

.ÿ
.ÿ//ÿTestÿsize
.ÿquietlyÿsimulateÿexcÿ=ÿr(exc)ÿpoiÿ=ÿr(poi)ÿlpmÿ=ÿr(lpm),ÿreps(1000):ÿsimemÿ,ÿt(0.83)

.ÿformatÿ_allÿ%05.3f

.ÿsummarizeÿ,ÿformat

ÿÿÿÿVariableÿ|ÿÿÿÿÿÿÿÿObsÿÿÿÿÿÿÿÿMeanÿÿÿÿStd.ÿdev.ÿÿÿÿÿÿÿMinÿÿÿÿÿÿÿÿMax
-------------+---------------------------------------------------------
ÿÿÿÿÿÿÿÿÿexcÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.024ÿÿÿÿÿÿÿ0.153ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000
ÿÿÿÿÿÿÿÿÿpoiÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.020ÿÿÿÿÿÿÿ0.140ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000
ÿÿÿÿÿÿÿÿÿlpmÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.018ÿÿÿÿÿÿÿ0.133ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000

.ÿ
.ÿ//ÿRelativeÿpower
.ÿquietlyÿsimulateÿexcÿ=ÿr(exc)ÿpoiÿ=ÿr(poi)ÿlpmÿ=ÿr(lpm),ÿreps(1000):ÿsimem

.ÿformatÿ_allÿ%05.3f

.ÿsummarizeÿ,ÿformat

ÿÿÿÿVariableÿ|ÿÿÿÿÿÿÿÿObsÿÿÿÿÿÿÿÿMeanÿÿÿÿStd.ÿdev.ÿÿÿÿÿÿÿMinÿÿÿÿÿÿÿÿMax
-------------+---------------------------------------------------------
ÿÿÿÿÿÿÿÿÿexcÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.888ÿÿÿÿÿÿÿ0.316ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000
ÿÿÿÿÿÿÿÿÿpoiÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.883ÿÿÿÿÿÿÿ0.322ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000
ÿÿÿÿÿÿÿÿÿlpmÿ|ÿÿÿÿÿÿ1,000ÿÿÿÿÿÿÿ0.868ÿÿÿÿÿÿÿ0.339ÿÿÿÿÿÿ0.000ÿÿÿÿÿÿ1.000

.ÿ
.ÿexit

endÿofÿdo-file

.

A couple of notes.

First, three methods are evaluated in the simulation whose output is shown above: the conventional risk-difference GEE with a default exchangeable working correlation structure (exc), the Poisson model that your statistician suggested (poi) and the linear probability model (lpm) that Rich mentioned earlier. For the conventionial risk-difference GEE, I expect that the exchangeable working correlation structure reasonably reflects the nature of the correlation among outcomes of patients enrolled over time for a study less than a decade long, and so I don't recommend removing that and relying on clustered (robust) standard errors as a fix-up.

Second, suitability of robust standard errors—which are used in both the Poisson regression model and the linear probability model—depends upon modeling the mean response correctly. If you have an interaction of treatment condition and patient's sex, then you will want to include a treatment-by-sex interaction term in the regression model in order to help assure that the mean responses are accurately modeled.

Third, these models kinda also assume that the random effect of clinic and fixed effects of treatment and sex are orthogonal. If some of your clinics are, for example, women's clinics, then this assumption might need revisiting, especially if as you say women are easier to treat for the disease than men.

Fourth, likewise, none of the regression models mentioned so far anticipate a clinic-by-treatment interaction. With reasonable diligence in vetting the clinic and investigator, reasonable monitoring of adherence to the protocol (including satifsfying patient eligibility criteria—microbial susceptibility testing and so on), and mature standard of care (both of the drug regimens are established, I believe, at least for treatment of other infectious diseases and so the clinics' staffs shouldn't vary much in their relative familiarity with both or in their relative facility in managing them both), this is not likely to be a concern.

You'll see in the results of the simulation above, that the conventional risk-difference GEE at 2.4% has the closest test size to the nominal 2.5%. The other two are a bit conservative at about 2% each. The conventional risk-difference GEE also is the most promising for power, although all are in the neighborhood of 90% So, for your statistical analysis plan (SAP), you might want to consider using the conventional risk-difference GEE that you had originally intended in your first post above (no eform option, though), and include either of the other two as fallback options (specified explicitly in the SAP) in case the first-choice regression option experiences convergence difficulties.
4 likes
Comment
Jonas Tverring

Join Date: Nov 2018

Posts: 4
#5

23 Mar 2022, 15:38

Joseph Coveney. I am, quite frankly, blown away by the insight, detail and effort you put in to answering my question. Thank you! You make me want to use statalist more often and you lessen my desire to switch to R.

I will happily take your advice to use the conventional risk difference GEE (family(binomial), link(identity), corr(exchange)) as my primary analysis in the SAP and have possion and lpm as pre-specified alternatives.

You are probably also right that we should include an interaction term between treatment and sex, because not only are females considered easier to treat in general, but many would argue that our intervention (i.e., betalactam antibiotics) has a greater prior probability to be non-inferiority in women than in men. I don' consider there to be a reason to include a site-to-treatment interaction though and the distribution between sexes should be fairly equal between sites.

This should do it

Code:

xtset site xtgee cure i.trt##i.sex, i(site) family(binomial) link(identity) nolog corr(exchange)

I would also like to visually my findings graphically and perhaps the following code will suffice to show the average result across both sexes:

Code:

margins, dydx(trt) marginsplot , horizontal

and this one to visualize the effect stratified by sex:

Code:

margins, dydx(trt) over(sex) marginsplot , horizontal recast(scatter)

Thank you also for the power simulation. I now feel confident that our intended sample size (n=330) appear reasonably well-powered.
I tried to copy your code into my do-file editor but unfortunately received the error code below when trying to run the code, any idea what I might be doing wrong?

__000000 not found
an error occurred when simulate executed simem
r(111);

I really appreciate it!

Jonas
1 like
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4354
#6

23 Mar 2022, 19:56

Originally posted by Jonas Tverring View Post

I tried to copy your code into my do-file editor but unfortunately received the error code below when trying to run the code, any idea what I might be doing wrong?

I don't know, but here's the original do-file attached below. Try that and let me know if you're still receiving the same error message.
Attached Files

Tverring20220319.do (1.7 KB, 1 view)
1 like
Comment
Jonas Tverring

Join Date: Nov 2018

Posts: 4
#7

28 Mar 2022, 15:43

Works. Like. A. Charm.
Thanks again Joseph!
Comment

Announcement