Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Model-based standardization or parametric g-formula for marginal risk difference estimation

    Due to convergence issue, I need help with model-based standardization or parametric g-formula for marginal risk difference estimation in Stata. Currently, I use binreg and qreg2. Thanks.

    My current commands:

    *number of patients with adverse events
    binreg adverse_events group sex age, rd vce(cluster site)


    *number of adverse events
    qreg2 weekly_qs_adverse_events age sex group, cluster(site)

    Best wishes
    Behnam Liaghat

  • #2
    You can get a good estimate of risk difference by using -margins- after fitting a logistic regression model. The advantage is that logistic regression is less liable to convergence problems than trying to estimate the adjusted risk difference directly with an identity link. The example below shows how comparable the two estimates are in a toy dataset that allows convergence with both approaches. (Begin at the "Begin here" comment; the top part just creates a toy dataset for illustration.)

    I cannot help you with the quantile regression problem for the number of adverse events, except perhaps to suggest some kind of count model instead; a log link might get you close enough to a median.

    .ÿ
    .ÿversionÿ17.0

    .ÿ
    .ÿclearÿ*

    .ÿ
    .ÿ//ÿseedem
    .ÿsetÿseedÿ348161441

    .ÿ
    .ÿquietlyÿsetÿobsÿ20

    .ÿgenerateÿbyteÿsidÿ=ÿ_n

    .ÿgenerateÿdoubleÿsid_uÿ=ÿrnormal()

    .ÿ
    .ÿgenerateÿbyteÿenrÿ=ÿruniformint(2,ÿ50)

    .ÿquietlyÿexpandÿenr

    .ÿbysortÿsid:ÿgenerateÿbyteÿtrtÿ=ÿmod(_n,ÿ2)

    .ÿ
    .ÿgenerateÿintÿpidÿ=ÿ_n

    .ÿgenerateÿbyteÿsexÿ=ÿmod(_n,ÿ2)

    .ÿgenerateÿbyteÿageÿ=ÿruniformint(18,ÿ94)

    .ÿ
    .ÿgenerateÿbyteÿaetÿ=ÿrbinomial(5,ÿ0.25)

    .ÿ
    .ÿgenerateÿbyteÿhasÿ=ÿaetÿ>ÿ0

    .ÿ
    .ÿ*
    .ÿ*ÿBeginÿhere
    .ÿ*
    .ÿ//ÿcf.
    .ÿbinregÿhasÿi.(trtÿsex)ÿc.age,ÿrdÿvce(clusterÿsid)ÿnolog

    GeneralizedÿlinearÿmodelsÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿ=ÿÿÿÿÿÿÿÿ564
    Optimizationÿÿÿÿÿ:ÿMQLÿFisherÿscoringÿÿÿÿÿÿÿÿÿÿÿÿÿResidualÿdfÿÿÿÿÿ=ÿÿÿÿÿÿÿÿ560
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(IRLSÿEIM)ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿScaleÿparameterÿ=ÿÿÿÿÿÿÿÿÿÿ1
    Devianceÿÿÿÿÿÿÿÿÿ=ÿÿ599.3511442ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(1/df)ÿDevianceÿ=ÿÿÿÿ1.07027
    Pearsonÿÿÿÿÿÿÿÿÿÿ=ÿÿÿ563.997178ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(1/df)ÿPearsonÿÿ=ÿÿÿ1.007138

    Varianceÿfunction:ÿV(u)ÿ=ÿu*(1-u)ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ[Bernoulli]
    Linkÿfunctionÿÿÿÿ:ÿg(u)ÿ=ÿuÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ[Identity]

    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿBICÿÿÿÿÿÿÿÿÿÿÿÿÿ=ÿÿ-2948.279

    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Std.ÿerr.ÿadjustedÿforÿ20ÿclustersÿinÿsid)
    ------------------------------------------------------------------------------
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿSemirobust
    ÿÿÿÿÿÿÿÿÿhasÿ|ÿRiskÿdiff.ÿÿÿstd.ÿerr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿconf.ÿinterval]
    -------------+----------------------------------------------------------------
    ÿÿÿÿÿÿÿ1.trtÿ|ÿÿÿ.0952415ÿÿÿ.0300735ÿÿÿÿÿ3.17ÿÿÿ0.002ÿÿÿÿÿ.0362986ÿÿÿÿ.1541844
    ÿÿÿÿÿÿÿ1.sexÿ|ÿÿÿ-.042069ÿÿÿ.0291216ÿÿÿÿ-1.44ÿÿÿ0.149ÿÿÿÿ-.0991463ÿÿÿÿ.0150083
    ÿÿÿÿÿÿÿÿÿageÿ|ÿÿÿ.0011187ÿÿÿÿ.000348ÿÿÿÿÿ3.21ÿÿÿ0.001ÿÿÿÿÿ.0004366ÿÿÿÿ.0018009
    ÿÿÿÿÿÿÿ_consÿ|ÿÿÿÿ.679998ÿÿÿ.0241108ÿÿÿÿ28.20ÿÿÿ0.000ÿÿÿÿÿ.6327417ÿÿÿÿ.7272543
    ------------------------------------------------------------------------------

    .ÿ
    .ÿlogitÿhasÿi.(trtÿsex)ÿc.age,ÿvce(clusterÿsid)ÿnolog

    LogisticÿregressionÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿ=ÿÿÿÿ564
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿWaldÿchi2(3)ÿÿ=ÿÿ57.98
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProbÿ>ÿchi2ÿÿÿ=ÿ0.0000
    Logÿpseudolikelihoodÿ=ÿ-299.70975ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿPseudoÿR2ÿÿÿÿÿ=ÿ0.0157

    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Std.ÿerr.ÿadjustedÿforÿ20ÿclustersÿinÿsid)
    ------------------------------------------------------------------------------
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿRobust
    ÿÿÿÿÿÿÿÿÿhasÿ|ÿCoefficientÿÿstd.ÿerr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿconf.ÿinterval]
    -------------+----------------------------------------------------------------
    ÿÿÿÿÿÿÿ1.trtÿ|ÿÿÿ.5407583ÿÿÿ.1727204ÿÿÿÿÿ3.13ÿÿÿ0.002ÿÿÿÿÿ.2022324ÿÿÿÿ.8792841
    ÿÿÿÿÿÿÿ1.sexÿ|ÿÿ-.2745189ÿÿÿ.1791843ÿÿÿÿ-1.53ÿÿÿ0.126ÿÿÿÿ-.6257136ÿÿÿÿ.0766758
    ÿÿÿÿÿÿÿÿÿageÿ|ÿÿÿ.0064951ÿÿÿ.0022488ÿÿÿÿÿ2.89ÿÿÿ0.004ÿÿÿÿÿ.0020876ÿÿÿÿ.0109027
    ÿÿÿÿÿÿÿ_consÿ|ÿÿÿ.7335321ÿÿÿ.1051836ÿÿÿÿÿ6.97ÿÿÿ0.000ÿÿÿÿÿÿ.527376ÿÿÿÿ.9396882
    ------------------------------------------------------------------------------

    .ÿmarginsÿ,ÿdydx(trt)

    AverageÿmarginalÿeffectsÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿ=ÿ564
    ModelÿVCE:ÿRobust

    Expression:ÿPr(has),ÿpredict()
    dy/dxÿwrt:ÿÿ1.trt

    ------------------------------------------------------------------------------
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDelta-method
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿdy/dxÿÿÿstd.ÿerr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿconf.ÿinterval]
    -------------+----------------------------------------------------------------
    ÿÿÿÿÿÿÿ1.trtÿ|ÿÿÿ.0945639ÿÿÿ.0300562ÿÿÿÿÿ3.15ÿÿÿ0.002ÿÿÿÿÿ.0356549ÿÿÿÿÿ.153473
    ------------------------------------------------------------------------------
    Note:ÿdy/dxÿforÿfactorÿlevelsÿisÿtheÿdiscreteÿchangeÿfromÿtheÿbaseÿlevel.

    .ÿ
    .ÿ//ÿMaybe
    .ÿxtgeeÿaetÿi.(trtÿsex)ÿc.age,ÿi(sid)ÿfamily(poisson)ÿlink(log)ÿ///
    >ÿÿÿÿÿÿÿÿÿcorr(independent)ÿvce(robust)ÿnolog

    GEEÿpopulation-averagedÿmodelÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿ=ÿÿÿÿÿÿ564
    Groupÿvariable:ÿsidÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿgroupsÿ=ÿÿÿÿÿÿÿ20
    Family:ÿPoissonÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿObsÿperÿgroup:ÿÿ
    Link:ÿÿÿLogÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿminÿ=ÿÿÿÿÿÿÿÿ3
    Correlation:ÿindependentÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿavgÿ=ÿÿÿÿÿ28.2
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿmaxÿ=ÿÿÿÿÿÿÿ49
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿWaldÿchi2(3)ÿÿÿÿÿ=ÿÿÿÿ13.59
    Scaleÿparameterÿ=ÿ1ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProbÿ>ÿchi2ÿÿÿÿÿÿ=ÿÿÿ0.0035

    Pearsonÿchi2(564)ÿÿÿÿ=ÿÿÿ400.42ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿDevianceÿÿÿÿÿÿÿÿÿ=ÿÿÿ500.92
    Dispersionÿ(Pearson)ÿ=ÿ.7099573ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿDispersionÿÿÿÿÿÿÿ=ÿ.8881648

    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Std.ÿerr.ÿadjustedÿforÿclusteringÿonÿsid)
    ------------------------------------------------------------------------------
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿRobust
    ÿÿÿÿÿÿÿÿÿaetÿ|ÿCoefficientÿÿstd.ÿerr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿconf.ÿinterval]
    -------------+----------------------------------------------------------------
    ÿÿÿÿÿÿÿ1.trtÿ|ÿÿÿ.1969187ÿÿÿ.0695997ÿÿÿÿÿ2.83ÿÿÿ0.005ÿÿÿÿÿ.0605058ÿÿÿÿ.3333316
    ÿÿÿÿÿÿÿ1.sexÿ|ÿÿ-.1618032ÿÿÿ.0686761ÿÿÿÿ-2.36ÿÿÿ0.018ÿÿÿÿ-.2964059ÿÿÿ-.0272006
    ÿÿÿÿÿÿÿÿÿageÿ|ÿÿÿ.0018929ÿÿÿ.0012654ÿÿÿÿÿ1.50ÿÿÿ0.135ÿÿÿÿ-.0005874ÿÿÿÿ.0043731
    ÿÿÿÿÿÿÿ_consÿ|ÿÿÿÿ.108005ÿÿÿ.0804633ÿÿÿÿÿ1.34ÿÿÿ0.180ÿÿÿÿ-.0497002ÿÿÿÿ.2657102
    ------------------------------------------------------------------------------

    .ÿmarginsÿ,ÿdydx(trt)

    AverageÿmarginalÿeffectsÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿ=ÿ564
    ModelÿVCE:ÿRobust

    Expression:ÿExponentiatedÿlinearÿprediction,ÿpredict()
    dy/dxÿwrt:ÿÿ1.trt

    ------------------------------------------------------------------------------
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDelta-method
    ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿdy/dxÿÿÿstd.ÿerr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿconf.ÿinterval]
    -------------+----------------------------------------------------------------
    ÿÿÿÿÿÿÿ1.trtÿ|ÿÿÿ.2495105ÿÿÿ.0877218ÿÿÿÿÿ2.84ÿÿÿ0.004ÿÿÿÿÿÿ.077579ÿÿÿÿ.4214421
    ------------------------------------------------------------------------------
    Note:ÿdy/dxÿforÿfactorÿlevelsÿisÿtheÿdiscreteÿchangeÿfromÿtheÿbaseÿlevel.

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .


    Another advantage of a count regression model is that you can model the counts directly by including the elapsed time as an offset, instead of having to work with something derived, e.g., "weekly_qs_adverse_events".

    Comment


    • #3
      Hello Joseph,
      I appreciate your comprehensive answer. It seems to work well with my data set. Can you please specify how you would write up these two methods?
      Furthermore, sometimes i get the following output: "1.sex omitted and 18 obs not used." Why are x observations not used?

      Best wishes
      Behnam Liaghat
      Last edited by Behnam Liaghat; 20 Feb 2022, 04:34.

      Comment


      • #4
        The use of logistic regression to estimate adjusted risk differences has been floating around the medical literature for awhile. I ran across it a few years ago in a clinical study sponsored by a medical products firm. I don't have the literature cited there at hand now, but you can Google for references using "adjusted risk difference" and "proc logistic" (SAS was used in the clinical study and is common in the medical products industry) as search terms. The use of count regression models for adverse event data is fairly common. You can explore other likelihood functions, for example, negative binomial, which might be more realistic than Poisson.

        The omission of one category of sex and its associated observations, I'm guessing, happened when you tried to fit a logistic regression model to relatively sparse data. The phenomenon is called separation or quasiseparation. The warning message probably also included a statement that observations were completely determined or something to that effect. There are a couple of approaches, including not including sex as a predictor and usage of penalized likelihood logistic regression (there are couple of user-written commands that do that, including -firthlogit- and -penlogit-; both are available from SSC or StataCorp's website).

        Comment


        • #5
          Hello again, Joseph.

          When using

          xtgee weekly_qs_adverse_events c.age i.sex i.group, i(site) family (poisson) link(log)
          margins, dydx(group)

          what is the output exactly? My intention is to report the median difference between group A and B on the count data.

          Best wishes
          Behnam Liaghat

          Comment


          • #6
            Originally posted by Behnam Liaghat View Post
            what is the output exactly?
            The difference between groups in the marginal means (predicted mu).

            My intention is to report the median difference between group A and B on the count data.
            If you mean reporting the difference in the groups' medians, then you might be be better off with individual predictions (adjusted for age and sex), finding the medians of each group, and subtracting. (Added in edit: the panel quantile regression that you show at first might be the best approach for a difference in predicted median counts per se, but as I mentioned, I cannot speak to that. I was focused above on a risk difference, i.e., proportions of patients with one or more AEs.)
            Last edited by Joseph Coveney; 30 Mar 2022, 23:57.

            Comment

            Working...
            X