Predicted probabilities from Cox, logistic, and Poisson

Dannie Zarate

Join Date: Jul 2014

Posts: 6
#1

07 Jul 2014, 18:05

Hi,

I'm generating predicted probabilities of death from different multivariable models, as inputs to the average attributable fraction calculation described here (http://www.biomedcentral.com/1471-2288/9/7/).

I'm choosing between logistic, Cox (stcox), and Poisson regression models, and I'm leaning towards Cox because it's faster and because I use it to estimate relative risks rather than odds ratios.

However, the models produce slightly different predicted probabilities, with the logistic model predicting higher probabilities than the Cox model, as shown in the graph below. Also, the sum of the Cox predicted probabilities slightly exceeds the total observed number of deaths (e.g. 265 observed deaths versus 265.14 as the sum of the Cox predicted probabilities).

Which model prediction is more accurate?

Thanks
Dannie

*** Logistic
logistic death $factors, or
predict p_logistic

*** Modified Poisson with robust error variance
glm death $factors, fam(poisson) link(log) nolog vce(robust) eform
predict p_poisson

*** Cox, ref. Cummings 2009. Stata Journal 9(2): 175
gen time = 1
stset time, failure(death)

cap drop basesurv
stcox $factors, hr breslow vce(robust) nolog basesurv(basesurv)
cap drop xb
cap drop p_cox
qui predict xb, xb
qui gen p_cox = 1 - (basesurv^exp(xb))

[Attached graph: logistic versus Cox predicted probabilities]

Maarten Buis

Join Date: Mar 2014

Posts: 3421
#2

09 Jul 2014, 01:20

That depends on which model fits the data better. Since we don't have your data, we cannot answer that question.
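
A rough way to compare the fits on your own data is sketched below. It assumes the variables death, p_logistic, p_poisson, and p_cox created by the code in post #1; the sq_* names are made up for this illustration. It reports the totals of the predictions next to the observed deaths, and the mean squared difference between outcome and prediction (the Brier score, where lower is better). This is only one of several possible checks, not a full assessment of fit.

```stata
* Assumes death (0/1), p_logistic, p_poisson, and p_cox already exist
* Totals and means of the predictions next to the observed outcome
tabstat death p_logistic p_poisson p_cox, statistics(sum mean) columns(statistics)

* Brier score: mean squared difference between outcome and prediction
foreach m in logistic poisson cox {
    gen double sq_`m' = (death - p_`m')^2
}
tabstat sq_logistic sq_poisson sq_cox, statistics(mean)
```

Note that predictions from the log-link Poisson model are not constrained to be below 1, so it may be worth checking their range as well.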

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Dannie Zarate

Join Date: Jul 2014

Posts: 6
#3

15 Jul 2014, 00:48

Thanks, Maarten. I guess that is the question: is the difference between the Cox- and logistic-predicted probabilities partly a function (or idiosyncrasy) of the data? Or is there reason to expect the Cox model to always underestimate the probability compared to the logistic model?

Last edited by Dannie Zarate; 15 Jul 2014, 01:01.
Maarten Buis

Join Date: Mar 2014

Posts: 3421
#4

15 Jul 2014, 01:35

From your own graph you can see that the predicted probabilities from a Cox model aren't always lower than the predicted probabilities from a logit model. Note that lower predicted probabilities do not necessarily mean underestimation; it could just as well mean that the logit model overestimates the probabilities.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

16 Jul 2014, 09:22

It's always dangerous to assume that a method intended for one purpose will be good for another.

Cummings (2009) used a Cox hazard ratio model with single time "t" for all observations to get an adjusted relative risk (RR) for a binary data problem. He said nothing about using the results to get predicted risks. This is not surprising, because the Cox model assumes a hazard ratio (HR) model to generate the risks. More exactly for this case, Breslow's method, used by Cummings, assumes a baseline exponential model over the interval 0-1.

The following code takes a 0-1 x variable and shows that Cummings's method does indeed reproduce the RR (= 1.5), but the predicted risks do not come close to the actual risks. In fact, the Cox predictions have RR = 1.20.

So, to generate predictions for binary data, method intended for such data. In Stata, logistic is one; cloglog is another. Predictions from either would match the crude risks in the example data.

References:

Breslow, N. 1974. Covariance Analysis of Censored Survival Data. Biometrics 30, no. 1: 89-99.
Cummings, Peter. 2009. Methods for estimating adjusted risk ratios. Stata Journal 9, no. 2: 175.

Code:

clear
set obs 100
gen id = _n
gen x = id>50
gen t = 1
gen d = id<=30 | (id>=51 & id<=95)
tab x d, row // Notice RR = .9/.6 = 1.5
stset t, fail(d)
stcox x // Reproduces HR = 1.5
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        1.5   .3535534     1.72   0.085     .9450638    2.380792
------------------------------------------------------------------------------

. predict basesurv, basesurv
. predict xb, xb
. gen pcox = 1- basesurv^exp(xb)
. table x, c(mean pcox)
----------------------
        x | mean(pcox)
----------+-----------
        0 |   .6904075
        1 |   .8277395
----------------------

. table x, c(mean d)
----------------------
        x |    mean(d)
----------+-----------
        0 |         .6
        1 |         .9
----------------------
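
The RR of 1.20 among the Cox predictions can be checked by hand from the prediction formula used in the code. In the worked arithmetic below, S_0 is shorthand introduced here for the estimated baseline survivor value at t = 1, and the numbers are the rounded values from the output above:

```latex
\begin{align*}
\hat p(x) &= 1 - S_0^{\exp(x\hat\beta)}, \qquad \exp(\hat\beta) = 1.5,\\
\hat p(0) &= 1 - S_0 = 0.6904 \;\Rightarrow\; S_0 = 0.3096,\\
\hat p(1) &= 1 - 0.3096^{1.5} \approx 0.8277,\\
\frac{\hat p(1)}{\hat p(0)} &\approx \frac{0.8277}{0.6904} \approx 1.20 \neq 1.5 .
\end{align*}
```

The hazard ratio acts on the exponent of the survivor function, so the ratio of predicted risks is close to the HR only when the risks are small.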

Last edited by Steve Samuels; 16 Jul 2014, 10:07.

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

16 Jul 2014, 11:25

The last paragraph of text should have been: So, to generate predictions for binary data, use a method intended for such data. In Stata, logistic is one; cloglog is another. Predictions from either would match the crude risks in the example data.
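
A minimal sketch of that point, assuming the same 100-observation example data as in post #5 (the variable names plog and pclog are made up here). With a single binary covariate both models are saturated, so their predictions should reproduce the crude risks of .6 and .9:

```stata
* Rebuild the example data from post #5
clear
set obs 100
gen id = _n
gen x = id > 50
gen d = id <= 30 | (id >= 51 & id <= 95)

logistic d x
predict plog      // default after logistic: Pr(d = 1)

cloglog d x
predict pclog     // default after cloglog: Pr(d = 1)

* Mean outcome and mean predictions within each level of x
tabstat d plog pclog, by(x) statistics(mean)
```

With more covariates, or continuous ones, the two models are no longer saturated and their predictions will generally differ from each other.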

Steve

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

16 Jul 2014, 13:47

Correction: Stata doesn't use Nathan Breslow's formula for estimating the survival curve in a Cox model (Breslow, 1974, p. 93, Eq. 7), only his method for handling ties in the partial-likelihood equations. Stata's formula for the survival curve is shown in the Methods and formulas section of the manual entry for stcox postestimation. The same conclusion applies: neither one is suitable for generating predictions for binary data.

Dannie Zarate

Join Date: Jul 2014

Posts: 6
#8

16 Jul 2014, 18:13

Thank you, Steve, that was really helpful! (It is also a useful technique for testing models in other contexts.)

I'm changing the attributable fraction algorithm to use the Cox model to estimate relative risks, and then get predictions from a logistic model.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

16 Jul 2014, 20:34

You are very welcome, Dannie. You are still stuck with the fact that the predicted probabilities from logistic are not consistent with the RRs from stcox, which is not easy to justify! For a past case-control study, I used Bruzzi's method, discussed in the BMC paper you reference. Now, if it were my problem, I would use the average attributable fraction (AF), especially as the authors link to Stata routines (which I have not tried) for its calculation.


Dannie Zarate

Join Date: Jul 2014
Posts: 6

#10

16 Jul 2014, 21:47

The BMC paper's Stata routine for average AF is, in fact, what I'm trying to re-code (the BMC version loads the entire dataset into a matrix and performs all of the calculations there, but this is not feasible on my dataset with 300k records).

I don't understand your last comment, though. I ran a logistic model on your example, and the logit-predicted probabilities yield the same RR as Cox's HR, i.e.:

Code:

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        1.5   .3535534     1.72   0.085     .9450638    2.380792
------------------------------------------------------------------------------

. logistic d x, or
------------------------------------------------------------------------------
           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          6   3.316625     3.24   0.001     2.030635    17.72844
       _cons |        1.5   .4330127     1.40   0.160     .8518645    2.641265
------------------------------------------------------------------------------
. predict plog

. table x, c(mean d mean pcox mean plog)
----------------------------------------------
        x |    mean(d)  mean(pcox)  mean(plog)
----------+-----------------------------------
        0 |         .6    .6904075          .6
        1 |         .9    .8277395          .9
----------------------------------------------


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#11

16 Jul 2014, 22:23

"I don't understand your last comment though. I ran a logistics model on your example and the logit-predicted probabilities yield the same RR as Cox's HR, i.e."

Quite right; I was mistaken.
