
  • Predicted probabilities from Cox, logistic, and Poisson

    Hi

    I'm generating predicted probabilities of death from different multivariable models, as inputs to the average attributable fraction calculation described here (http://www.biomedcentral.com/1471-2288/9/7/).

    I'm choosing between logistic, Cox (stcox), and Poisson regression models. I'm leaning towards Cox because it's faster and because I use it to estimate relative risks rather than odds ratios.

    However, the models produce slightly different predicted probabilities, with the logistic model predicting higher probabilities than the Cox model, as shown in the graph below. Also, the sum of the Cox predicted probabilities slightly exceeds the total observed number of deaths (e.g. 265 observed deaths versus 265.14 as the sum of the Cox predicted probabilities).
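Part of the gap between the sums has a mechanical explanation. For logistic regression with a constant term, the maximum-likelihood score equation for the constant is sum(y_i - p_i) = 0, so the fitted probabilities always add up to the observed number of events (the same holds for Poisson regression with a log link and a constant). Predictions built as 1 - basesurv^exp(xb) after stcox are not tied to any such equation, so their sum can drift above or below the observed count. The Python sketch below fits a one-covariate logistic model by Newton-Raphson on made-up data and checks that identity; the data and the function name fit_logistic are hypothetical, for illustration only.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, fitted p)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector X'(y - p); its first element is sum(y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (X'WX)^{-1} X'(y - p)
        b0, b1 = b0 + (h11 * g0 - h01 * g1) / det, b1 + (h00 * g1 - h01 * g0) / det
    p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    return b0, b1, p


# Hypothetical toy data: one continuous covariate, 0/1 outcome
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
b0, b1, p = fit_logistic(x, y)
print(f"observed events = {sum(y)}, sum of fitted p = {sum(p):.10f}")
```

At convergence the two numbers agree to rounding error; nothing comparable forces the Cox-based predictions to sum to 265.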

    Which model prediction is more accurate?

    Thanks
    Dannie



    *** Logistic
    logistic death $factors, or
    predict p_logistic, pr          // predicted probability of death


    *** Modified Poisson with robust error variance
    glm death $factors, fam(poisson) link(log) nolog vce(robust) eform
    predict p_poisson, mu           // predicted mean = predicted probability for a 0/1 outcome


    *** Cox, ref. Cummings 2009. Stata Journal 9(2): 175
    gen time = 1                    // everyone has the same single follow-up time
    stset time, failure(death)

    stcox $factors, hr breslow vce(robust) nolog
    cap drop basesurv xb p_cox
    qui predict basesurv, basesurv  // baseline survivor function at time 1
    qui predict xb, xb              // linear predictor
    qui gen p_cox = 1 - basesurv^exp(xb)



    [Attached graph (GraphPNG.png): Cox versus logistic predicted probabilities]
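One consequence of the formula p_cox = 1 - basesurv^exp(xb) is that the ratio of two predicted risks equals the hazard ratio only when the outcome is rare. With baseline survivor S0 and hazard ratio HR, the implied risk ratio is (1 - S0^HR)/(1 - S0), which tends to HR as S0 approaches 1 and shrinks toward 1 as the baseline risk grows. A minimal Python sketch; the function names are my own, not Stata's, and the last S0 value, 0.3095925, is 1 - .6904075 from the example output in post #5.

```python
import math


def cox_risk(s0, xb):
    """Predicted risk from a Cox model: 1 - S0(t)**exp(xb)."""
    return 1 - s0 ** math.exp(xb)


def implied_risk_ratio(s0, hr):
    """Ratio of predicted risks for xb = ln(hr) versus xb = 0."""
    return cox_risk(s0, math.log(hr)) / cox_risk(s0, 0.0)


hr = 1.5
for s0 in (0.999, 0.95, 0.5, 0.3095925):
    print(f"S0 = {s0:<9} baseline risk = {1 - s0:.4f}  "
          f"risk ratio = {implied_risk_ratio(s0, hr):.3f}")
```

With a baseline risk of about 0.69 the implied risk ratio is about 1.20, not 1.5.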

  • #2
    That depends on which model fits the data better. Since we don't have your data, we cannot answer that question.
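One way to make "fits better" concrete is to score each model's predicted probabilities against the observed 0/1 outcomes with a proper scoring rule, such as the Brier score (mean squared error, lower is better) or the Bernoulli log-likelihood (higher is better). The Python sketch below does this for the 100-observation example data and the group-level predictions reported later in this thread (posts #5 and #10); these are not Dannie's data, and ideally the scores would be computed on data not used to fit the models.

```python
import math

# Example data from post #5: x = 1 for ids 51-100; d = 1 for ids 1-30 and 51-95
ids = range(1, 101)
x = [int(i > 50) for i in ids]
d = [int(i <= 30 or 51 <= i <= 95) for i in ids]

# Predicted probabilities by x, as reported in posts #5 (Cox) and #10 (logistic)
predictions = {
    "logistic": {0: 0.6, 1: 0.9},
    "Cox": {0: 0.6904075, 1: 0.8277395},
}


def brier(pred):
    """Mean squared difference between outcome and predicted probability."""
    return sum((di - pred[xi]) ** 2 for xi, di in zip(x, d)) / len(d)


def log_lik(pred):
    """Bernoulli log-likelihood of the outcomes under the predictions."""
    return sum(math.log(pred[xi] if di else 1 - pred[xi]) for xi, di in zip(x, d))


for name, pred in predictions.items():
    print(f"{name:9s} Brier = {brier(pred):.4f}  log-likelihood = {log_lik(pred):.2f}")
```

On these data the logistic predictions score better on both measures, as expected when a logistic model with one binary covariate reproduces the observed group risks.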
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks, Maarten. I guess that is the question: is the difference between the Cox- and logistic-predicted probabilities partly a function (or idiosyncrasy) of the data? Or is there a reason to expect the Cox model always to underestimate the probability compared with the logistic model?
      Last edited by Dannie Zarate; 15 Jul 2014, 02:01.



      • #4
        From your own graph you can see that the predicted probabilities from a Cox model aren't always lower than the predicted probabilities from a logit model. Note that lower predicted probabilities do not necessarily mean underestimation; they could just as well mean that the logit model overestimates the probabilities.
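A direct check for under- or overestimation is a calibration table: within groups of observations (for example deciles of predicted risk, or covariate patterns), compare the observed number of events O with the expected number E, the sum of the predicted probabilities. O/E below 1 means the model overestimates risk in that group; above 1 means it underestimates. The Python sketch below applies this to the Cox predictions for the example data in post #5, grouping by x; the grouping choice and the function name calibration_table are illustrative.

```python
# Example data from post #5: x = 1 for ids 51-100; d = 1 for ids 1-30 and 51-95
ids = range(1, 101)
x = [int(i > 50) for i in ids]
d = [int(i <= 30 or 51 <= i <= 95) for i in ids]

# Cox-based predicted risks from post #5, by value of x
p_cox = [0.8277395 if xi else 0.6904075 for xi in x]


def calibration_table(groups, outcome, pred):
    """Return {group: (observed, expected, O/E)}."""
    table = {}
    for g in sorted(set(groups)):
        obs = sum(o for gi, o in zip(groups, outcome) if gi == g)
        expected = sum(p for gi, p in zip(groups, pred) if gi == g)
        table[g] = (obs, expected, obs / expected)
    return table


for g, (obs, expected, ratio) in calibration_table(x, d, p_cox).items():
    print(f"x = {g}: observed = {obs}, expected = {expected:.3f}, O/E = {ratio:.3f}")
print(f"total: observed = {sum(d)}, expected = {sum(p_cox):.3f}")
```

For these data the Cox predictions overestimate risk when x = 0 and underestimate it when x = 1, and their total (about 75.9) exceeds the 75 observed events, the same direction as 265.14 versus 265 in the original question.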


        • #5
          It's always dangerous to assume that a method intended for one purpose will be good for another.

          Cummings (2009) used a Cox hazard ratio model with a single time "t" for all observations to get an adjusted relative risk (RR) for a binary data problem. He said nothing about using the results to get predicted risks. This is not surprising, because the Cox model assumes a hazard ratio (HR) model to generate the risks. More exactly, for this case, Breslow's method, used by Cummings, assumes a baseline exponential model over the interval 0-1.
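For this single-time setup, Breslow's (1974) estimator of the cumulative baseline hazard at t = 1 is the number of deaths divided by the sum of exp(xb) over everyone at risk, and treating the hazard as constant on (0, 1] gives predicted risks 1 - exp(-H0*exp(xb)). The Python sketch below computes this for the post #5 example data with HR = 1.5. Post #7 says Stata's survivor-function formula is not this one, so these numbers are not expected to match the pcox values in the Stata output; the sketch only shows what a constant (exponential) baseline hazard over 0-1 implies.

```python
import math

# Example data from post #5: x = 1 for ids 51-100; d = 1 for ids 1-30 and 51-95
ids = range(1, 101)
x = [int(i > 50) for i in ids]
d = [int(i <= 30 or 51 <= i <= 95) for i in ids]
beta = math.log(1.5)  # Cox estimate from post #5: HR = 1.5

# Breslow estimate of the cumulative baseline hazard at t = 1:
# deaths / sum of exp(xb) over the risk set (everyone is at risk until t = 1)
risk_sum = sum(math.exp(beta * xi) for xi in x)
H0 = sum(d) / risk_sum


def breslow_risk(xi):
    """Predicted risk by t = 1 under a constant baseline hazard on (0, 1]."""
    return 1 - math.exp(-H0 * math.exp(beta * xi))


p0, p1 = breslow_risk(0), breslow_risk(1)
print(f"H0 = {H0:.3f}; risk x=0: {p0:.4f}, x=1: {p1:.4f}, ratio: {p1 / p0:.3f}")
```

Either way the predicted risks fall short of the crude risks (.6 and .9), and their ratio is below the HR of 1.5.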

          The following code takes a 0-1 x variable and shows that Cummings's method does indeed reproduce the RR (= 1.5) but does not come very close to the actual risks. In fact, the Cox predictions imply an RR of 1.20 (.8277/.6904).


          So, to generate predictions for binary data, use a method intended for such data. In Stata, logistic is one; cloglog is another. Predictions from either would match the crude risks in the example data.
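With a single binary covariate both models are saturated (two parameters for two groups), so their maximum-likelihood fits reproduce the crude group risks exactly, whatever the link. The coefficients can therefore be written down in closed form from the 2x2 table. A Python sketch for the post #5 example data; it recovers the odds ratio 6 and baseline odds 1.5 that appear in the logistic output in post #10.

```python
import math

# Crude risks from the post #5 example: 30/50 deaths when x = 0, 45/50 when x = 1
p0, p1 = 30 / 50, 45 / 50


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


def cloglog(p):
    return math.log(-math.log(1 - p))


def inv_cloglog(eta):
    return 1 - math.exp(-math.exp(eta))


# Saturated fits: intercept = link(p0), slope = link(p1) - link(p0)
b0, b1 = logit(p0), logit(p1) - logit(p0)
c0, c1 = cloglog(p0), cloglog(p1) - cloglog(p0)

print(f"odds ratio = {math.exp(b1):.3f}, baseline odds = {math.exp(b0):.3f}")
for xv in (0, 1):
    print(f"x = {xv}: logistic {inv_logit(b0 + b1 * xv):.4f}, "
          f"cloglog {inv_cloglog(c0 + c1 * xv):.4f}")
```

Once more covariates are added the models are no longer saturated, and logistic and cloglog predictions will generally differ somewhat.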

          References:

          Breslow, N. 1974. Covariance analysis of censored survival data. Biometrics 30, no. 1: 89-99.
          Cummings, P. 2009. Methods for estimating adjusted risk ratios. Stata Journal 9, no. 2: 175.



          Code:
          clear
          set obs 100
          gen id = _n
          gen x = id>50
          gen t = 1
          gen d = id<=30 | (id>=51 & id<=95)
          tab x d, row  // Notice RR = .9/.6 = 1.5
          stset t, fail(d)
          stcox x   // Reproduces HR = 1.5
          
          ------------------------------------------------------------------------------
                    _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     x |        1.5   .3535534     1.72   0.085     .9450638    2.380792
          ------------------------------------------------------------------------------
          . predict basesurv, basesurv
          . predict xb, xb
          
          . gen pcox = 1- basesurv^exp(xb)
          
          . table x, c(mean pcox)
          
          ----------------------
                  x | mean(pcox)
          ----------+-----------
                  0 |   .6904075
                  1 |   .8277395
          ----------------------
          
          . table x, c(mean d)
          
          ----------------------
                  x |    mean(d)
          ----------+-----------
                  0 |         .6
                  1 |         .9
          ----------------------
          
          Last edited by Steve Samuels; 16 Jul 2014, 11:07.
          Steve Samuels
          Statistical Consulting

          Stata 14.2



          • #6
            The last paragraph of text should have been: So, to generate predictions for binary data, use a method intended for such data. In Stata, logistic is one; cloglog is another. Predictions from either would match the crude risks in the example data.

            Steve


            • #7
              Correction: Stata doesn't use Nathan Breslow's formula for estimating the survival curve in a Cox model (Breslow, 1974, p. 93, Eq. 7), only his method for handling ties in the partial-likelihood equations. Stata's formula for the survival curve is shown in the Methods and formulas section of the manual entry for stcox postestimation. The same conclusion applies: neither formula is suitable for generating predictions for binary data.
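For the post #5 example, the pcox values in the Stata output can be reproduced with the Kalbfleisch-Prentice form of the baseline survivor estimate; as far as I can tell this is the formula documented in the stcox postestimation Methods and formulas section, but treat that attribution as an assumption. At a failure time, the factor alpha solves sum over the failures of r_i/(1 - alpha^r_i) = sum over the risk set of r_i, with r_i = exp(xb); with a single failure time, the baseline survivor at t = 1 is alpha itself. A Python sketch that solves for alpha by bisection (the function name kp_baseline_survivor is my own):

```python
import math


def kp_baseline_survivor(x, d, beta, tol=1e-12):
    """Baseline survivor at a single common failure time (Kalbfleisch-Prentice form).

    Solves sum_{failures} r / (1 - a**r) = sum_{risk set} r for a in (0, 1),
    with r = exp(beta * x). Everyone is assumed to be at risk at that time.
    """
    r = [math.exp(beta * xi) for xi in x]
    at_risk = sum(r)
    failed = [ri for ri, di in zip(r, d) if di]

    def f(a):
        return sum(ri / (1 - a ** ri) for ri in failed) - at_risk

    # f < 0 near a = 0 when someone survives, and f -> +inf as a -> 1
    lo, hi = 0.0, 1.0 - 1e-15
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example data from post #5: x = 1 for ids 51-100; d = 1 for ids 1-30 and 51-95
ids = range(1, 101)
x = [int(i > 50) for i in ids]
d = [int(i <= 30 or 51 <= i <= 95) for i in ids]
beta = math.log(1.5)

s0 = kp_baseline_survivor(x, d, beta)
pcox0 = 1 - s0
pcox1 = 1 - s0 ** math.exp(beta)
print(f"basesurv = {s0:.7f}; pcox: x=0 {pcox0:.7f}, x=1 {pcox1:.7f}")
```

The recovered values match .6904075 and .8277395 from the post #5 output, and their ratio is about 1.20 rather than the hazard ratio of 1.5.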


              • #8
                Thank you, Steve, that was really helpful! (It's also a useful technique for testing models in other contexts.)

                I'm changing the attributable fraction algorithm to use the Cox model to estimate relative risks and then get predictions from the logistic model.



                • #9
                  You are very welcome, Dannie. You are still stuck with the fact that the predicted probabilities from logistic are not consistent with the RRs from stcox, which is not easy to justify! For a past case-control study, I used Bruzzi's method, discussed in the BMC Medical Research Methodology paper you reference. Now, if it were my problem, I would use the average attributable fraction (AF), especially as the authors link to Stata routines (which I have not tried) for its calculation.


                  • #10
                    The BMC paper's Stata routine for average AF is, in fact, what I'm trying to re-code (the BMC version loads the entire dataset into a matrix and performs all of the calculations there, but this is not feasible on my dataset with 300k records).
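The average AF does not need the whole dataset in a matrix. As commonly defined, a factor's average AF is the mean, over all orderings in which the risk factors could be removed, of its sequential AF (the increase in the model-based AF when that factor is removed after the ones before it in the ordering). Each model-based AF for a set S of removed factors needs only one pass over the records: AF(S) = 1 - (expected cases with the factors in S set to unexposed) / (expected cases as observed). The Python sketch below caches the 2^K subset AFs and averages over the K! orderings. The record layout, the predict function, and the log-linear risk model with risk ratios 2.0 and 1.5 are all hypothetical placeholders for a fitted model's predictions, and this AF definition (model-expected cases in the denominator) is one of several in use.

```python
import itertools


def average_af(records, predict, factors):
    """Average attributable fraction for each factor (hypothetical interface).

    records: iterable of dicts of covariate values
    predict: function(record) -> predicted probability from a fitted model
    factors: names of binary risk factors that can be 'removed' (set to 0)
    """
    total = sum(predict(r) for r in records)
    cache = {}

    def af(removed):
        key = frozenset(removed)
        if key not in cache:
            # Expected cases if every factor in `removed` were set to unexposed
            expected = sum(predict({**r, **{f: 0 for f in key}}) for r in records)
            cache[key] = 1 - expected / total
        return cache[key]

    orders = list(itertools.permutations(factors))
    result = {f: 0.0 for f in factors}
    for order in orders:
        removed = []
        for f in order:
            before = af(removed)
            removed.append(f)
            result[f] += (af(removed) - before) / len(orders)
    return result


# Hypothetical data: counts of each exposure pattern (smoke, obese)
counts = {(0, 0): 400, (1, 0): 300, (0, 1): 200, (1, 1): 100}
records = [{"smoke": s, "obese": o} for (s, o), n in counts.items() for _ in range(n)]


def predict(r):
    # Hypothetical log-linear risk model: baseline 2%, RR 2.0 (smoke), 1.5 (obese)
    return 0.02 * 2.0 ** r["smoke"] * 1.5 ** r["obese"]


res = average_af(records, predict, ["smoke", "obese"])
print(res, "sum:", sum(res.values()))
```

By construction the average AFs add up to the AF for removing all factors jointly (0.375 here). Only the 2^K subset sums are kept, so memory does not grow with the number of records, although the K! orderings limit this brute-force version to a handful of factors.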

                    I don't understand your last comment, though. I ran a logistic model on your example, and the logit-predicted probabilities yield the same RR as Cox's HR, i.e.

                    Code:
                    ------------------------------------------------------------------------------
                              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                               x |        1.5   .3535534     1.72   0.085     .9450638    2.380792
                    ------------------------------------------------------------------------------
                    
                    . logistic d x, or
                    . predict plog
                    ------------------------------------------------------------------------------
                               d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                               x |          6   3.316625     3.24   0.001     2.030635    17.72844
                           _cons |        1.5   .4330127     1.40   0.160     .8518645    2.641265
                    ------------------------------------------------------------------------------
                    
                    . table x, c(mean d mean pcox mean plog)
                    ----------------------------------------------
                            x |    mean(d)  mean(pcox)  mean(plog)
                    ----------+-----------------------------------
                            0 |         .6    .6904075          .6
                            1 |         .9    .8277395          .9
                    ----------------------------------------------



                    • #11
                      I don't understand your last comment though. I ran a logistics model on your example and the logit-predicted probabilities yield the same RR as Cox's HR, i
                      Quite right; I was mistaken.