Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Logistic regression and categorical variable interactions

    In a randomised trial, a drug had no effect on mortality in low-birth-weight babies. However, if the drug *had* been effective, how would I use Stata 15 (Windows) to test whether the effect of the drug in babies weighing <1000 g (baseline) differed from (1) the effect of the drug in babies weighing 1000-1499 g and (2) from the effect of the drug in babies weighing 1500-1999 g?

    The Stata file has variables:
    died: 0 (survived), 1 (died)
    drug: 0 (control), 1 (treatment)
    wtgp: 0 (0-999g), 1 (1000-1499g), 2 (1500-1999g).

    Code:
    . cs died drug, by(wtgp)
               wtgp |       RR       [95% Conf. Interval]   M-H Weight
    -----------------+-------------------------------------------------
              0-999g |    .8643293     .7181096   1.040322     32.19632
          1000-1499g |    1.034783      .831774   1.287339         57.5
          1500-1999g |    .9549525     .6873575   1.326725     34.01655
    ------------------------------------------------------------------------------
    Using the Excel spreadsheet for ratios from http://www.statpages.info/Confidence...0P-values.xlsx
    wtgp 0 vs 1: 0.86 (0.72-1.04) vs 1.03 (0.83-1.29) gives p = 0.22.
    wtgp 0 vs 2: 0.86 (0.72-1.04) vs 0.95 (0.69-1.33) gives p = 0.60.

    Code:
    . logistic died i.drug##i.wtgp
    ------------------------------------------------------------------------------
            died | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          1.drug |   .5721154   .2068869    -1.54   0.123      .281627    1.162232
            wtgp |
              1  |   .0979067   .0287701    -7.91   0.000     .0550408    .1741566
              2  |   .0188347   .0056566   -13.23   0.000     .0104549    .0339312
       drug#wtgp |
            1 1  |   1.832185   .7197436     1.54   0.123     .8483771    3.956852
            1 2  |   1.663846   .6716229     1.26   0.207     .7542577    3.670342
           _cons |   3.764706    1.02721     4.86   0.000     2.205355    6.426634
    ------------------------------------------------------------------------------
     
    . margins wtgp, dydx(drug) pwcompare(effects)
    Pairwise comparisons of conditional marginal effects
    Model VCE    : OIM
    Expression   : Pr(died), predict()
    dy/dx w.r.t. : 1.drug
    ------------------------------------------------------------------------------
                 |   Contrast Delta-method    Unadjusted           Unadjusted
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    0.drug       |  (base outcome)
    -------------+----------------------------------------------------------------
    1.drug       |
            wtgp |
         1 vs 0  |   .1165643    .074964     1.55   0.120    -.0303625    .2634911
         2 vs 0  |   .1042139   .0693238     1.50   0.133    -.0316582    .2400861
         2 vs 1  |  -.0123504   .0323953    -0.38   0.703     -.075844    .0511433
    ------------------------------------------------------------------------------
    P = 0.22 (from Excel) differs from p = 0.120 (margins)
    P = 0.60 (from Excel) differs from p = 0.133 (margins).

    What is the correct way to compare the effect on drug of wtgp 0 vs 1, and wtgp 0 vs 2?
    I apologise if this problem is very basic, but I’ve spent many hours trying to solve it.
    Last edited by Frank Shann; 19 Jan 2018, 18:40.

  • #2
    Well, with the full admission that I have a strong prejudice here, I do not take statistical analyses calculated in Excel seriously. Certainly in earlier versions of Excel, many of them were flat-out wrong. I am told that in more modern versions of Excel the errors have been fixed. Perhaps so. But even so, what you do in Excel leaves no audit trail, so you can never adequately explain or defend your analyses. So I tend to dismiss them out of hand. Others may beg to differ. And I realize this is more of an ad "hominem" attack on Excel than a critique of the actual analysis (the implementation of which I know nothing about.) Somebody who is familiar with this particular Excel routine might want to chime in and defend it.

    To me, the more interesting question here is whether you should be looking at the interaction coefficients in the logistic regression output, or at the contrasts of the marginal effects on outcome probability in the -margins- output. This subject comes up fairly regularly here and remains controversial. It is very far from being basic. When viewed through the distorting lens of p-values, one gets an additional layer of complexity because the arbitrary dichotomization of statistical significance at the arbitrary threshold of .05 (or any other arbitrary threshold) can make results that are substantively very similar appear to be qualitatively different, especially since most people tend to forget that the difference between statistically significant and statistically insignificant is not, itself, statistically significant.

    So let's leave issues of statistical significance out of it and just discuss whether the logistic regression output or the margins output is more salient. Logistic regression is, of course, a non-linear model. As a result, even in a model that contains no explicit interaction terms, the marginal effect of any predictor on the outcome probability will not be constant but will depend on the value of that predictor, and it will also depend on the values of other variables. In other words, even without explicit interaction terms, the logistic model creates interaction when you look at effects in the probability metric. So there are a few ways to look at it.

    If you really believe that the odds ratios are the true effects, not the probability differences, then you would rely on the logistic regression output and ignore the marginal effects on probabilities. On the other hand if you believe the outcome probabilities are the real deal and that the logistic model and odds ratios are just an indirect way to get at them, necessitated by the inconvenient properties of linear probability models, then you would rely on the marginal effects on probabilities.

    In my own work, I am inclined to think of these models as preludes to decision analysis, where the marginal effects on outcome probabilities are all-important and odds ratios don't really figure into it. So I tend to focus on those. On the other hand if you are focused on causal modeling, the case can be made that the odds ratio is actually the better representation of the treatment effect.

    Much ink has been spilled over this issue, both here on this Forum and in the literature. Notwithstanding all the sturm und drang, and strongly expressed opinions (mea culpa), I would also note that if we put aside artificial examples created to demonstrate a theoretical point, I have never seen a real-world application where you would really come to different conclusions one way or the other (unless you allow statistical significance to obfuscate the issue). In your own example, the odds ratios for 1 vs 0, 2 vs 0, and 2 vs 1 are, rounding to 2 places, 1.83, 1.66, and 1.66/1.83 = about 0.91. So you could characterize the first two interaction effects as being moderate to large positive, and the last one as being small negative. If you look at the marginal effect on predicted probability contrasts, the corresponding figures are 0.12, 0.10, and -.01. In these terms, you would also characterize the first two interactions as moderate to large positive, and the third as small negative.



    Comment


    • #3
      I think Frank's point is that in randomized trials (and in cohort studies) with dichotomous outcomes, the RR is (usually) the preferred measure of effect size. In that case, one of the approaches discussed on this UCLA page might be useful. HTH.
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)

      Comment


      • #4
        Well, then, in that case, can't Frank use glm died i.drug##i.wtgp, family(binomial) link(log)?

        Comment


        • #5
          Thank you very much indeed Clyde and Bruce for your prompt and thoughtful replies. I do indeed want RR rather than odds ratios. In addition to the link Bruce gave, I also found very helpful Clyde's posts of August 2016 at www.statalist.org/forums/forum/general-stata-discussion/general/1353874-relative-risk-ratio-calculation-help

          Stata's binreg command gives me risk ratios (rather than odds ratios) and confidence intervals and p values identical to http://www.bmj.com/content/bmj/326/7382/219.full.pdf
          Do you think binreg is suitable?
          Code:
          . binreg died i.drug##i.wtgp, rr
          ------------------------------------------------------------------------------
                       |                 EIM
                  died | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                1.drug |   .8643293   .0817298    -1.54   0.123     .7181096    1.040322
                  wtgp |
                    1  |   .3408592   .0334547   -10.97   0.000     .2812103    .4131605
                    2  |   .0837999   .0109297   -19.01   0.000      .064897    .1082088
             drug#wtgp |
                  1 1  |   1.197209   .1749585     1.23   0.218     .8990347    1.594275
                  1 2  |   1.104848   .2127638     0.52   0.605     .7575027    1.611465
                 _cons |   .7901235   .0452467    -4.11   0.000     .7062372    .8839736
          ------------------------------------------------------------------------------
          I understand Clyde's concerns about Excel calculations but, just before he died, John Pezzullo (of statpages.info) wrote an Excel routine for me that calculates the ratio of two relative risks (or means - it is important to choose the correct tab) from just the ratios and 95% CI. This is invaluable when the raw data are not available (e.g. when reading a paper). It is available at statpages.info/Confidence%20Intervals%20to%20P-values.xlsx. It gives results that are identical to Stata’s binreg command for the data I posted and the BMJ article (and for five other datasets that I have tested)...
          Code:
          wtgp == 0 vs wtgp == 1  
          Group A Group B
          Ratio: 0.8643293 1.034783 (rate/odds/risk/hazard ratio, relative risk)
          Lower Conf. Limit: 0.7181096 0.831774
          Upper Conf. Limit: 1.040322 1.287339
          Confidence Level: 95.0% 95.0%
          Std Err of Ln(Ratio): 0.094558716 0.111423427
          p-value for Ho: Ratio=1 0.12309439 0.758947544 (testing whether each true ratio is 1)
          For Comparing Two Groups:
          Ratio of Ratios (B/A): 1.197209212 Ratio A/B: 0.835275898
          Std Err of Ln(B/A) 0.146138738
          Desired Conf. Level: 95.0%
          Lower 95.0% Conf. Limit: 0.899034857 Lo CI for A/B: 0.627243877
          Upper 95.0% Conf. Limit: 1.594276224 Up CI for A/B: 1.112303925
          p-value for Ho: A=B: 0.218076238 (testing whether both groups have same true ratio)
          wtgp == 0 vs wtgp==2
          Group A Group B
          Ratio: 0.8643293 0.9549525 (rate/odds/risk/hazard ratio, relative risk)
          Lower Conf. Limit: 0.7181096 0.6873575
          Upper Conf. Limit: 1.040322 1.326725
          Confidence Level: 95.0% 95.0%
          Std Err of Ln(Ratio): 0.094558716 0.167761818
          p-value for Ho: Ratio=1 0.12309439 0.783503192 (testing whether each true ratio is 1)
          For Comparing Two Groups:
          Ratio of Ratios (B/A): 1.104848002 Ratio A/B: 0.905101877
          Std Err of Ln(B/A) 0.192575643
          Desired Conf. Level: 95.0%
          Lower 95.0% Conf. Limit: 0.757498627 Lo CI for A/B: 0.620550001
          Upper 95.0% Conf. Limit: 1.61147369 Up CI for A/B: 1.320134407
          p-value for Ho: A=B: 0.604626435 (testing whether both groups have same true ratio)
          Again, my profuse thanks to both of you for your advice. StataList is a wonderful resource!

          Frank

          Comment


          • #6
            Thank you Joseph. I get slightly different results with glm
            Code:
            . glm died i.drug##i.wtgp, family(binomial) link(log)
            ------------------------------------------------------------------------------
                         |                 OIM
                    died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  1.drug |  -.1458015   .0945587    -1.54   0.123    -.3311331    .0395301
                    wtgp |
                      1  |  -1.076286   .0981481   -10.97   0.000    -1.268653    -.883919
                      2  |  -2.479323   .1304281   -19.01   0.000    -2.734958   -2.223689
               drug#wtgp |
                    1 1  |   .1799929   .1461387     1.23   0.218    -.1064337    .4664194
                    1 2  |   .0997078   .1925756     0.52   0.605    -.2777335    .4771491
                   _cons |  -.2355661   .0572654    -4.11   0.000    -.3478041    -.123328
            ------------------------------------------------------------------------------
            binreg died i.drug##i.wtgp, rr
            seems to be preferable for my analysis.

            Frank

            Comment


            • #7
              If you want to match binreg, then you might want to look into the eform and vce(eim) options.
              Code:
              glm died i.drug##i.wtgp, family(binomial) link(log) eform vce(eim)

              Comment


              • #8
                glm is fine with the eform option

                Code:
                . glm died i.drug##i.wtgp, family(binomial) link(log) eform
                ------------------------------------------------------------------------------
                             |                 OIM
                        died | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      1.drug |   .8643293   .0817298    -1.54   0.123     .7181096    1.040322
                        wtgp |
                          1  |   .3408592   .0334547   -10.97   0.000     .2812103    .4131605
                          2  |   .0837999   .0109299   -19.01   0.000     .0648967    .1082092
                   drug#wtgp |
                        1 1  |   1.197209   .1749585     1.23   0.218     .8990346    1.594276
                        1 2  |   1.104848   .2127668     0.52   0.605     .7574987    1.611474
                       _cons |   .7901235   .0452467    -4.11   0.000     .7062372    .8839736
                ------------------------------------------------------------------------------

                Comment


                • #9
                  Thanks again Joseph. We both posted at about the same time - I'm getting so much good advice that I can't keep up.
                  Code:
                  glm died i.drug##i.wtgp, family(binomial) link(log) eform
                  gives the same result as binreg (see above)...

                  ...but adding vce(eim) causes an error
                  Code:
                  . glm died i.drug##i.wtgp, family(binomial) link(log) eform vce(eim)
                  Iteration 0:   log likelihood =  -1533.168  
                  Iteration 1:   log likelihood = -1106.9587  
                  Iteration 2:   log likelihood = -1090.0395  
                  Iteration 3:   log likelihood = -1087.0655  
                  Iteration 4:   log likelihood = -1086.9353  
                  Iteration 5:   log likelihood =  -1086.935  
                  Iteration 6:   log likelihood =  -1086.935  
                  matrix has missing values
                  r(504);

                  Comment

                  Working...
                  X