Confidence interval and p-value for relative risk using - margins - after logistic regression model

Goro Yamada

Join Date: Apr 2017
Posts: 6

Confidence interval and p-value for relative risk using - margins - after logistic regression model

03 Apr 2021, 16:05

I try to estimate relative risks (RRs) using - margins - after fitting a logistic model. Since log(RR) would is more normally distributed than RR, I am thinking of estimating log (RR) instead of RR, then exponentiate the point estimate and confidence interval. I see examples in the Stata pdf documentation and other posts that calculate RR directly, but I do not feel comfortable with the confidence interval symmetric around the point estimate. Also, I plan to obtain a p-value based on the test for log(RR) = 0. However, I wonder if estimating log(RR) would be a valid approach when estimating RRs based on predicted probabilities.

Code:

. use https://www.stata-press.com/data/r16/bangladesh, clear
. logit c_use i.urban age i.child3, or nolog
. margins child3, post

Predictive margins                              Number of obs     =      1,934
Model VCE    : OIM

Expression   : Pr(c_use), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      child3 |
          0  |   .3677632   .0153692    23.93   0.000     .3376401    .3978862
          1  |    .432382   .0214472    20.16   0.000     .3903462    .4744177
------------------------------------------------------------------------------

. nlcom (RR: _b[1.child3] / _b[0.child3])

          RR:  _b[1.child3] / _b[0.child3]

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   1.175708   .0849553    13.84   0.000     1.009198    1.342217
------------------------------------------------------------------------------

. nlcom (log_RR: log(_b[1.child3] / _b[0.child3]))

      log_RR:  log(_b[1.child3] / _b[0.child3])

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      log_RR |   .1618703   .0722589     2.24   0.025     .0202455    .3034951
------------------------------------------------------------------------------

. local b  = exp(r(b)[1,1])
. local lb = exp(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
. local ub = exp(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))

. display %8.6f  `b' " (" `lb' ", " `ub' ")"
1.175708 (1.0204492, 1.3545885)

Tags: None

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2389

03 Apr 2021, 17:32

Yes, the appoach is valid, and you are correct to estimate the log(RR) as you will have better estimates of the confidence interval. You might be interested in reading this article from the Stata Journal which compares a number of methods to estimate a risk ratio. The approach using predictions from a logstic regression model is discussed in Method 6. In particular, the way that you have used nlcom is equivalent to the more verbose use of -predictnl- (see below). I would consider estimating the use of the modified Poisson regression approach (Method 5) which should give you narrower confidence intervals.

Code:

clear *
cls

input byte(receptor stage died total)
0 1 2 12
1 1 5 55
0 2 9 22
1 2 17 74
0 3 12 14
1 3 9 15
end

expand total
drop total
bys receptor stage : gen byte y = cond(_n <= died, 1, 0)
drop died
gen byte low = receptor==0

glm y i.low i.stage , eform link(log) fam(poisson) vce(rob) nolog
scalar rr_p = r(table)["b", "y:1.low"]
scalar rr_p_ll = r(table)["ll", "y:1.low"]
scalar rr_p_ul = r(table)["ul", "y:1.low"]
display "Risk ratio from modified Poisson model = " rr_p " 95% CI = " rr_p_ll ", " rr_p_ul

logit y i.low i.stage, or nolog
predictnl lnrr = ln( sum(invlogit(_b[_cons] + _b[2.stage]*(stage==2) + _b[3.stage]*(stage==3) + _b[1.low])) ///
                     / sum(invlogit(_b[_cons] + _b[2.stage]*(stage==2) + _b[3.stage]*(stage==3))) ), se(se_lnrr)
scalar rr_l = exp(lnrr[_N])
scalar rr_l_ll = exp(lnrr[_N] - invnormal(0.975)*se_lnrr[_N])
scalar rr_l_ul = exp(lnrr[_N] + invnormal(0.975)*se_lnrr[_N])
display "Risk ratio from logit with predictnl = " rr_l " 95% CI = " rr_l_ll ", " rr_l_ul

margins i.low, post
nlcom (lnRR: ln(_b[1.low] / _b[0.low])), post
scalar rr_lm = exp(_b[lnRR])
scalar rr_lm_ll = exp(_b[lnRR] - invnormal(0.975)*_se[lnRR])
scalar rr_lm_ul = exp(_b[lnRR] + invnormal(0.975)*_se[lnRR])
display "Risk ratio from logit with margins = " rr_lm " 95% CI = " rr_lm_ll ", " rr_lm_ul

Res.:

Code:

Risk ratio from modified Poisson model = 1.6307752 95% CI = 1.0733053, 2.4777925
Risk ratio from logit with predictnl = 1.6755988 95% CI = 1.0935712, 2.567397
Risk ratio from logit with margins = 1.6755988 95% CI = 1.0935712, 2.567397

Comment

Goro Yamada

Join Date: Apr 2017

Posts: 6
#3

07 Apr 2021, 08:38

Thank you, Leonardo, for all the info and your quick reply! (and sorry for my late reply...) They are all helpful!

I saw some articles about Poisson with robust variance as well. Eventually, I did not use it since the predicted probabilities exceeded one with my data and I was not comfortable having marginal effects over them.
Comment

Goro Yamada

Join Date: Apr 2017
Posts: 6

13 Apr 2021, 06:55

I realized that average probabilities estimated by margins come with the variance/SE estimation using the delta method. This means that the estimated probabilities can have confidence intervals beyond the [0, 1] boundary.

I thought estimating logits of probability, instead of estimating probabilities directly, and use the logits of probability to estimate RR further. However, the mean of predicted probabilities is not equal to the inverse logit of the mean of the logit of the probability.

Does anyone have an idea to overcome these issues?
1. To estimate probabilities with confidence intervals within [0, 1]
2. To estimate risk ratios with "better" confidence intervals

Code:

. use https://www.stata-press.com/data/r16/bangladesh, clear
 
. ** Estimate marginal propabilities directly *****
. quietly logit c_use i.urban age i.child3, or nolog
 
. *  Marginal probabilities
. margins child3, post

Predictive margins                              Number of obs     =      1,934
Model VCE    : OIM

Expression   : Pr(c_use), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      child3 |
          0  |   .3677632   .0153692    23.93   0.000     .3376401    .3978862
          1  |    .432382   .0214472    20.16   0.000     .3903462    .4744177
------------------------------------------------------------------------------

. quietly nlcom (log_RR: log(_b[1.child3] / _b[0.child3]))
. local b  = exp(r(b)[1,1])
. local lb = exp(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
. local ub = exp(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))

. *  Risk ratio
. display %8.6f  `b' " (" `lb' ", " `ub' ")"
1.175708 (1.0204492, 1.3545885)

 
. ** Estimate marginal logit(p) and get probabilities *****
. quietly logit c_use i.urban age i.child3, or nolog

. *  Marginal probabilities
. quietly margins child3, expression(logit(predict())) post
. local b1  = invlogit(r(b)[1,1])
. local lb1 = invlogit(r(b)[1,1] - 1.96*sqrt(r(V)[1,1]))
. local ub1 = invlogit(r(b)[1,1] + 1.96*sqrt(r(V)[1,1]))
. local b2  = invlogit(r(b)[1,2])
. local lb2 = invlogit(r(b)[1,2] - 1.96*sqrt(r(V)[2,2]))
. local ub2 = invlogit(r(b)[1,2] + 1.96*sqrt(r(V)[2,2]))

. display in y %8.6f "chile3 0:  " `b1' " (" `lb1' ", " `ub1' ")" ///
>  _newline(1) %8.6f "chile3 1:  " `b2' " (" `lb2' ", " `ub2' ")"
chile3 0:  .36487006 (.33444591, .39641342)
chile3 1:  .43119782 (.38868202, .47475254)

. *  Risk ratio
. nlcom (RR: invlogit(_b[1.child3]) / invlogit(_b[0.child3]))

          RR:  invlogit(_b[1.child3]) / invlogit(_b[0.child3])

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   1.181785   .0882797    13.39   0.000      1.00876     1.35481
------------------------------------------------------------------------------

Announcement

Confidence interval and p-value for relative risk using - margins - after logistic regression model

Comment

Comment

Comment