Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Confidence interval and p-value for relative risk using - margins - after logistic regression model

    I try to estimate relative risks (RRs) using - margins - after fitting a logistic model. Since log(RR) would is more normally distributed than RR, I am thinking of estimating log (RR) instead of RR, then exponentiate the point estimate and confidence interval. I see examples in the Stata pdf documentation and other posts that calculate RR directly, but I do not feel comfortable with the confidence interval symmetric around the point estimate. Also, I plan to obtain a p-value based on the test for log(RR) = 0. However, I wonder if estimating log(RR) would be a valid approach when estimating RRs based on predicted probabilities.

    Code:
    . use https://www.stata-press.com/data/r16/bangladesh, clear
    . logit c_use i.urban age i.child3, or nolog
    . margins child3, post
    
    Predictive margins                              Number of obs     =      1,934
    Model VCE    : OIM
    
    Expression   : Pr(c_use), predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          child3 |
              0  |   .3677632   .0153692    23.93   0.000     .3376401    .3978862
              1  |    .432382   .0214472    20.16   0.000     .3903462    .4744177
    ------------------------------------------------------------------------------
    
    . nlcom (RR: _b[1.child3] / _b[0.child3])
    
              RR:  _b[1.child3] / _b[0.child3]
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              RR |   1.175708   .0849553    13.84   0.000     1.009198    1.342217
    ------------------------------------------------------------------------------
    
    . nlcom (log_RR: log(_b[1.child3] / _b[0.child3]))
    
          log_RR:  log(_b[1.child3] / _b[0.child3])
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          log_RR |   .1618703   .0722589     2.24   0.025     .0202455    .3034951
    ------------------------------------------------------------------------------
    
    . local b  = exp(r(b)[1,1])
    . local lb = exp(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
    . local ub = exp(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))
    
    . display %8.6f  `b' " (" `lb' ", " `ub' ")"
    1.175708 (1.0204492, 1.3545885)

  • #2
    Yes, the appoach is valid, and you are correct to estimate the log(RR) as you will have better estimates of the confidence interval. You might be interested in reading this article from the Stata Journal which compares a number of methods to estimate a risk ratio. The approach using predictions from a logstic regression model is discussed in Method 6. In particular, the way that you have used nlcom is equivalent to the more verbose use of -predictnl- (see below). I would consider estimating the use of the modified Poisson regression approach (Method 5) which should give you narrower confidence intervals.

    Code:
    clear *
    cls
    
    input byte(receptor stage died total)
    0 1 2 12
    1 1 5 55
    0 2 9 22
    1 2 17 74
    0 3 12 14
    1 3 9 15
    end
    
    expand total
    drop total
    bys receptor stage : gen byte y = cond(_n <= died, 1, 0)
    drop died
    gen byte low = receptor==0
    
    glm y i.low i.stage , eform link(log) fam(poisson) vce(rob) nolog
    scalar rr_p = r(table)["b", "y:1.low"]
    scalar rr_p_ll = r(table)["ll", "y:1.low"]
    scalar rr_p_ul = r(table)["ul", "y:1.low"]
    display "Risk ratio from modified Poisson model = " rr_p " 95% CI = " rr_p_ll ", " rr_p_ul
    
    logit y i.low i.stage, or nolog
    predictnl lnrr = ln( sum(invlogit(_b[_cons] + _b[2.stage]*(stage==2) + _b[3.stage]*(stage==3) + _b[1.low])) ///
                         / sum(invlogit(_b[_cons] + _b[2.stage]*(stage==2) + _b[3.stage]*(stage==3))) ), se(se_lnrr)
    scalar rr_l = exp(lnrr[_N])
    scalar rr_l_ll = exp(lnrr[_N] - invnormal(0.975)*se_lnrr[_N])
    scalar rr_l_ul = exp(lnrr[_N] + invnormal(0.975)*se_lnrr[_N])
    display "Risk ratio from logit with predictnl = " rr_l " 95% CI = " rr_l_ll ", " rr_l_ul
    
    margins i.low, post
    nlcom (lnRR: ln(_b[1.low] / _b[0.low])), post
    scalar rr_lm = exp(_b[lnRR])
    scalar rr_lm_ll = exp(_b[lnRR] - invnormal(0.975)*_se[lnRR])
    scalar rr_lm_ul = exp(_b[lnRR] + invnormal(0.975)*_se[lnRR])
    display "Risk ratio from logit with margins = " rr_lm " 95% CI = " rr_lm_ll ", " rr_lm_ul
    Res.:

    Code:
    Risk ratio from modified Poisson model = 1.6307752 95% CI = 1.0733053, 2.4777925
    Risk ratio from logit with predictnl = 1.6755988 95% CI = 1.0935712, 2.567397
    Risk ratio from logit with margins = 1.6755988 95% CI = 1.0935712, 2.567397

    Comment


    • #3
      Thank you, Leonardo, for all the info and your quick reply! (and sorry for my late reply...) They are all helpful!

      I saw some articles about Poisson with robust variance as well. Eventually, I did not use it since the predicted probabilities exceeded one with my data and I was not comfortable having marginal effects over them.

      Comment


      • #4
        I realized that average probabilities estimated by margins come with the variance/SE estimation using the delta method. This means that the estimated probabilities can have confidence intervals beyond the [0, 1] boundary.

        I thought estimating logits of probability, instead of estimating probabilities directly, and use the logits of probability to estimate RR further. However, the mean of predicted probabilities is not equal to the inverse logit of the mean of the logit of the probability.

        Does anyone have an idea to overcome these issues?
        1. To estimate probabilities with confidence intervals within [0, 1]
        2. To estimate risk ratios with "better" confidence intervals


        Code:
        . use https://www.stata-press.com/data/r16/bangladesh, clear
         
        . ** Estimate marginal propabilities directly *****
        . quietly logit c_use i.urban age i.child3, or nolog
         
        . *  Marginal probabilities
        . margins child3, post
        
        Predictive margins                              Number of obs     =      1,934
        Model VCE    : OIM
        
        Expression   : Pr(c_use), predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              child3 |
                  0  |   .3677632   .0153692    23.93   0.000     .3376401    .3978862
                  1  |    .432382   .0214472    20.16   0.000     .3903462    .4744177
        ------------------------------------------------------------------------------
        
        . quietly nlcom (log_RR: log(_b[1.child3] / _b[0.child3]))
        . local b  = exp(r(b)[1,1])
        . local lb = exp(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
        . local ub = exp(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))
        
        . *  Risk ratio
        . display %8.6f  `b' " (" `lb' ", " `ub' ")"
        1.175708 (1.0204492, 1.3545885)
        
         
        . ** Estimate marginal logit(p) and get probabilities *****
        . quietly logit c_use i.urban age i.child3, or nolog
        
        . *  Marginal probabilities
        . quietly margins child3, expression(logit(predict())) post
        . local b1  = invlogit(r(b)[1,1])
        . local lb1 = invlogit(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
        . local ub1 = invlogit(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))
        . local b2  = invlogit(r(b)[1,2])
        . local lb2 = invlogit(r(b)[1,2] - 1.96*sqrt(r(V)[2,2]))
        . local ub2 = invlogit(r(b)[1,2] + 1.96*sqrt(r(V)[2,2]))
        
        . display in y %8.6f "chile3 0:  " `b1' " (" `lb1' ", " `ub1' ")" ///
        >  _newline(1) %8.6f "chile3 1:  " `b2' " (" `lb2' ", " `ub2' ")"
        chile3 0:  .36487006 (.33444591, .39641342)
        chile3 1:  .43119782 (.38868202, .47475254)
        
        . *  Risk ratio
        . nlcom (RR: invlogit(_b[1.child3]) / invlogit(_b[0.child3]))
        
                  RR:  invlogit(_b[1.child3]) / invlogit(_b[0.child3])
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  RR |   1.181785   .0882797    13.39   0.000      1.00876     1.35481
        ------------------------------------------------------------------------------

        Comment

        Working...
        X