  • Interpreting Stratified Cox Regression

    I estimated a Cox regression. I then ran the post estimation command estat phtest to test the proportional hazard assumption. The results indicated that one of my explanatory variables (a binary variable) violated the assumption. The variable in question was statistically significant within the Cox regression, where a value of 1 increased the hazard rate.

    I re-estimated the Cox Regression, this time stratified by the variable that had violated the proportional hazard assumption. I ran estat phtest and the proportional hazard assumption is no longer violated.

    My question relates to interpretation. My understanding is that with the stratified version of the model the regression coefficients are assumed to be the same for each stratum, although the baseline hazard functions will be different. Do I use the interpretation of the coefficient in the original model to comment on the baseline hazard functions within the stratified model?

    As previously mentioned, the variable that violated the proportional hazard assumption in the original model was statistically significant and increased the hazard rate. Would I therefore be able to say that in the stratified version the baseline hazard will be greater for those with a value of 1 for the stratification variable, relative to those with a value of 0?

    If I cannot make such a claim regarding the nature of the baseline hazard functions (beyond stating simply that they are different), is it just a case of evaluating the coefficients in the stratified model in a similar manner to the original, unstratified model?

    Any assistance would be greatly appreciated.

  • #2
    No, you can't rely on the estimated hazard ratio from the first model to say anything about the effect of that variable on the baseline hazards in the stratified model.

    You need to understand the meaning of the proportional hazards assumption. It is not just some technical aspect of a Cox proportional hazards model (or of any other survival analysis method that estimates hazard ratios). It is a requirement for the very validity of such a model. The proportional hazards assumption states that the hazard when x = x0 is hr times the hazard when x = x1 (all else being the same) at all times t, where hr is a single constant. If the proportional hazards assumption is violated, then there is no such thing as the hazard ratio associated with x. The ratio of the hazards at x = x0 and x = x1 is not definable as a single number, because that ratio varies with time.

    If you think of it in terms of ordinary linear regression, which most people find more comprehensible, the proportional hazards assumption says that there is no interaction between x and t. In a linear regression, the absence of an interaction between predictor variables x and t means that the effect of x does not depend on t (nor does the effect of t depend on x). In fact, one solution to a violation of the proportional hazards assumption in a survival analysis is to include an interaction term between x and time (or some function of time).

    So what you know from the results of your original analysis is that the effect of this variable x on survival cannot be characterized by a single number expressing a hazard ratio. Rather, the effect of x on survival varies over time. It might be steadily increasing, or steadily decreasing, or varying up and down in any imaginable way.
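
    To make the point concrete, here is a minimal numerical sketch. The baseline hazard and the values of beta and gamma are hypothetical, chosen only for illustration and not taken from the model in this thread. It compares the ratio of hazards for x = 1 versus x = 0 when x has a constant effect and when its effect interacts with time.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline hazard; any positive function of t would do.
    return 0.5 * math.sqrt(t)

def hazard_ph(t, x, beta=0.4):
    # Proportional hazards: x shifts the log hazard by a constant beta.
    return baseline_hazard(t) * math.exp(beta * x)

def hazard_nonph(t, x, beta=0.4, gamma=-0.1):
    # Non-proportional: the effect of x interacts with time (beta + gamma * t).
    return baseline_hazard(t) * math.exp((beta + gamma * t) * x)

times = [1, 2, 5, 10]
ratio_ph = [hazard_ph(t, 1) / hazard_ph(t, 0) for t in times]
ratio_nonph = [hazard_nonph(t, 1) / hazard_nonph(t, 0) for t in times]

for t, r_ph, r_non in zip(times, ratio_ph, ratio_nonph):
    print(f"t={t:>2}  PH ratio={r_ph:.3f}  non-PH ratio={r_non:.3f}")
```

    In the first case the ratio is the same number at every t, so it can be reported as "the" hazard ratio. In the second case the ratio starts above 1 and ends below 1, so no single number describes the effect of x.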

    How does this show up in a stratified analysis? It doesn't, because the estimated baseline hazards are not part of the -stcox- output. So you can't really see what's going on. Where you can get a better view of what is going on is by re-running the original model and then -stphplot-ing the curves for that variable. These will lay out the survival data in log-log plots against time, and you will be able to visualize whether one is always higher than the other, or whether the graphs cross one or more times.

    • #3
      Clyde,

      Thanks for your help. Just one final matter to check.

      After various changes, I am now left with the KM plot pasted below. The survival functions touch, but do not cross. Is it an issue that the curves touch? I have used the log-rank test for the equality of the survival functions. The test confirms that the curves are statistically different from each other. Am I OK to proceed with these plots? My understanding is that if they cross then the proportional hazards assumption is violated. Given that they do not cross and the test confirms the curves to be statistically different from each other, I assume they are fine, but I just wanted to double check.
      [Attached file: Kaplan-Meier plot (image not preserved)]

      • #4
        The log-rank test is non-parametric and does not rely on the proportional hazards assumption. Graphs that touch each other, such as what you show, count as "crossing" and would be reason to discount the results of a hazard ratio-based test. But the log-rank test is OK. The issue with the log rank test is that it does not adjust for potential confounding variables. As you have not really said much about your model, I can't tell if that is a problem for you or not.

        • #5
          Hi Clyde,

          Thanks for your reply. By "discount the results of a hazard ratio-based test", do you mean, for example, that any results of a Cox model following the estimation of the survival functions would be discounted? I don't know if it is of any use, but I have attached the plot that was generated using Stata's stphplot command.

          My understanding is that the log rank test, given a significant result, indicates that the survival functions are not equal and that the Cox model can be used to identify the variables that explain the differences in the survival time. In terms of details of my model, the model looks at the time to employment for two different groups, controlling for education, gender, ethnicity, social class and some other explanatory variables. Initial results of my Cox regression indicated there was an explanatory variable that violated the proportional hazard assumption. I estimated a stratified Cox regression using this variable, and the results no longer violated the proportional hazards assumption.

          Are there any specific details about my model that you would require to assist with your evaluation of how to proceed?
          [Attached file: plot generated with stphplot (image not preserved)]

          • #6
            Well, as I said in #2, the proportional hazards assumption underlies the very meaning of a proportional hazards model. The graph shown in #5 shows the curves approaching and, at the end, intersecting--so it shows a violation of the proportional hazards assumption. It means that the Cox model that the graph represents is mis-specified, so you cannot take its results as valid.

            This is different from the overall test of whether the (unadjusted) survival functions in two groups are the same using the log-rank test: that doesn't require the proportional hazards assumption. Moreover, the fact that the survival curves intersect (touch, in your case, but that counts as intersecting) does not preclude the possibility that with appropriate inclusion of covariates, a proportional hazards model can be properly fit.

            You say that when you stratified the analysis, the proportional hazards violation went away. Since the graph in #5 clearly shows a proportional hazards violation, I suppose this graph came from some other model that is not really in play at this point. But if you have a stratified model with no PH violation, then you can use the results of that model.

            • #7
              Hi Clyde,

              See below the tests for the proportional hazards violation. Both sets of test results were generated using estat phtest, detail. The first table shows the results for the unstratified model, the second set for the stratified model. I generated KM plots, but they still touch.

              Code:
               Test of proportional-hazards assumption
              
                    Time:  Time
                    ----------------------------------------------------------------
                                |       rho            chi2       df       Prob>chi2
                    ------------+---------------------------------------------------
                    private_sc~l|      0.02719         1.89        1         0.1694
                    parent_deg~e|      0.01363         0.48        1         0.4900
                    1b.ucas_po~s|            .            .        1             .
                    2.ucas_poi~s|      0.01828         0.87        1         0.3519
                    3.ucas_poi~s|      0.02941         2.24        1         0.1346
                    4.ucas_poi~s|      0.00148         0.01        1         0.9403
                    5.ucas_poi~s|     -0.00682         0.12        1         0.7300
                    1b.deg_sub  |            .            .        1             .
                    2.deg_sub   |     -0.00830         0.18        1         0.6754
                    3.deg_sub   |      0.01004         0.26        1         0.6107
                    1b.degree_~s|            .            .        1             .
                    2.degree_c~s|      0.03133         2.55        1         0.1103
                    3.degree_c~s|      0.01727         0.78        1         0.3778
                    1b.social_~s|            .            .        1             .
                    2.social_c~s|     -0.01213         0.38        1         0.5375
                    3.social_c~s|     -0.01576         0.64        1         0.4222
                    male        |     -0.01126         0.33        1         0.5683
                    white       |      0.02216         1.30        1         0.2547
                    first_job   |     -0.19788        99.07        1         0.0000
                    home        |      0.00385         0.04        1         0.8463
                    ------------+---------------------------------------------------
                    global test |                    119.54       16         0.0000
                    ----------------------------------------------------------------
              Code:
              Test of proportional-hazards assumption
              
                    Time:  Time
                    ----------------------------------------------------------------
                                |       rho            chi2       df       Prob>chi2
                    ------------+---------------------------------------------------
                    private_sc~l|      0.02746         1.92        1         0.1656
                    parent_deg~e|      0.01087         0.30        1         0.5818
                    1b.ucas_po~s|            .            .        1             .
                    2.ucas_poi~s|      0.01704         0.75        1         0.3857
                    3.ucas_poi~s|      0.02896         2.17        1         0.1404
                    4.ucas_poi~s|      0.00140         0.01        1         0.9435
                    5.ucas_poi~s|     -0.00541         0.07        1         0.7844
                    1b.deg_sub  |            .            .        1             .
                    2.deg_sub   |     -0.01020         0.26        1         0.6071
                    3.deg_sub   |      0.00986         0.25        1         0.6175
                    1b.degree_~s|            .            .        1             .
                    2.degree_c~s|      0.03280         2.79        1         0.0946
                    3.degree_c~s|      0.01946         0.99        1         0.3206
                    1b.social_~s|            .            .        1             .
                    2.social_c~s|     -0.01383         0.49        1         0.4825
                    3.social_c~s|     -0.01787         0.83        1         0.3628
                    male        |     -0.01089         0.30        1         0.5809
                    white       |      0.02397         1.51        1         0.2184
                    home        |      0.00202         0.01        1         0.9191
                    ------------+---------------------------------------------------
                    global test |                     14.08       15         0.5195
                    ----------------------------------------------------------------

              • #8
                The second set of phtest results looks just fine.

                And I didn't look closely enough at the graph in #3. The only places these graphs touch is that sometimes the vertical parts of the step functions overlap. That's not a problem. Crossing and touching refers to the possibility of the horizontal segments of one curve being sometimes above and other times below or overlying the corresponding horizontal segments of the other curve. That does not happen in your graph. The vertical segments themselves are, in a sense, not really a part of the K-M graph: they just connect the horizontal parts, which are the real survival function.
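
                The distinction can be sketched in code. This is a minimal illustration with made-up survival times, not the data behind the graph in #3. It computes Kaplan-Meier step functions and compares only the values on the horizontal steps, which is all the survival function consists of; the vertical connectors never enter the comparison.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival) steps."""
    surv = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for tt in times if tt >= t)
        failures = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        surv *= 1 - failures / at_risk
        curve.append((t, surv))
    return curve

def surv_at(curve, t):
    """Value of the step function at t (the last step at or before t)."""
    s = 1.0
    for step_t, step_s in curve:
        if step_t > t:
            break
        s = step_s
    return s

def curves_cross(curve_a, curve_b):
    """True if curve_a is above curve_b on some steps and below on others."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    diffs = [surv_at(curve_a, t) - surv_at(curve_b, t) for t in grid]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)

# Hypothetical groups: (times, event indicators; 0 = censored).
km_a = km_curve([2, 4, 6, 8, 10], [1, 1, 1, 1, 1])
km_b = km_curve([5, 9, 12, 15], [1, 1, 1, 0])
km_c = km_curve([1, 2, 3, 20, 25], [1, 1, 1, 1, 1])

print("a vs b cross:", curves_cross(km_a, km_b))
print("a vs c cross:", curves_cross(km_a, km_c))
```

                The helper names here are illustrative only. Horizontal steps that coincide over some interval (a difference of exactly zero) would count as overlying and would need a separate check.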

                • #9
                  Hi Clyde,

                  Thanks for taking a 2nd look at the graph in #3, and for confirming that there is no problem with the graph. I have one final question, I promise...

                  I ran a slightly different version of the model, which indicated that two variables violated the proportional hazards assumption:

                  Code:
                  Time:  Time
                        ----------------------------------------------------------------
                                    |       rho            chi2       df       Prob>chi2
                        ------------+---------------------------------------------------
                        private_sc~l|      0.02661         3.07        1         0.0796
                        parent_deg~e|      0.00868         0.32        1         0.5707
                        1b.ucas_po~s|            .            .        1             .
                        2.ucas_poi~s|      0.00510         0.11        1         0.7360
                        3.ucas_poi~s|      0.00444         0.09        1         0.7692
                        4.ucas_poi~s|      0.00498         0.11        1         0.7414
                        5.ucas_poi~s|     -0.00849         0.31        1         0.5757
                        rus         |     -0.04782         9.94        1         0.0016
                        1b.deg_sub  |            .            .        1             .
                        2.deg_sub   |     -0.01935         1.62        1         0.2033
                        3.deg_sub   |      0.01577         1.08        1         0.2993
                        1b.degree_~s|            .            .        1             .
                        2.degree_c~s|      0.01195         0.63        1         0.4289
                        3.degree_c~s|      0.00381         0.06        1         0.8006
                        1b.social_~s|            .            .        1             .
                        2.social_c~s|      0.01502         0.99        1         0.3208
                        3.social_c~s|      0.00978         0.42        1         0.5189
                        first_job   |     -0.23736       241.91        1         0.0000
                        male        |      0.01124         0.55        1         0.4586
                        white       |      0.02651         3.10        1         0.0783
                        home        |     -0.00493         0.10        1         0.7468
                        ------------+---------------------------------------------------
                        global test |                    267.63       17         0.0000
                        ----------------------------------------------------------------
                  I re-ran the Cox regression, this time interacting the variables that violated the PH assumption with time using the tvc() option:

                  Code:
                  -------------------------------------------------------------------------------
                              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  ---------------+----------------------------------------------------------------
                  main           |
                  private_school |   .9460533     .04612    -1.14   0.255     .8598439    1.040906
                   parent_degree |   .9215259   .0308524    -2.44   0.015     .8629976    .9840235
                                 |
                     ucas_points |
                              2  |   .9152593   .0450049    -1.80   0.072     .8311684    1.007858
                              3  |   .9045419   .0449305    -2.02   0.043     .8206306    .9970332
                              4  |   .8757148   .0479293    -2.42   0.015     .7866382    .9748781
                              5  |   .8871063   .0601861    -1.77   0.077     .7766504    1.013271
                                 |
                             rus |   1.380769   .1037072     4.30   0.000      1.19176    1.599754
                                 |
                         deg_sub |
                              2  |   .9432655    .033391    -1.65   0.099      .880039    1.011035
                              3  |   1.153952   .0581104     2.84   0.004     1.045498    1.273657
                                 |
                    degree_class |
                              2  |   1.077586   .0574071     1.40   0.161     .9707454    1.196186
                              3  |    1.03692   .0586564     0.64   0.522     .9280997      1.1585
                                 |
                    social_class |
                              2  |   .9363917   .0671793    -0.92   0.360     .8135607    1.077768
                              3  |    .952336   .0653782    -0.71   0.477     .8324438    1.089496
                                 |
                       first_job |     3.6624   .2757894    17.24   0.000      3.15986    4.244864
                            male |   .8943365   .0286741    -3.48   0.000     .8398657      .95234
                           white |   1.143792   .0447718     3.43   0.001     1.059323    1.234997
                            home |   1.003441   .0404809     0.09   0.932      .927156    1.086004
                  ---------------+----------------------------------------------------------------
                  tvc            |
                             rus |   .9732549   .0086609    -3.05   0.002      .956427    .9903789
                       first_job |   .8677089   .0081348   -15.14   0.000     .8519105    .8838003
                  --------------------------------------------------------------------------------
                  Note: Variables in tvc equation interacted with _t.
                  I can no longer run estat phtest after using the tvc() option. I'm using Stata 14, if that makes any difference. Does use of the tvc() option automatically deal with the PH assumption violation?

                  Lastly in terms of interpretation am I correct in saying that at t = 0 the HR for Rus is 1.38, and is multiplied by 0.973 for each one unit increase in t?

                  Thank you again for all of your assistance so far

                  • #10
                    Yes, that interpretation is correct.
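
                    As a worked illustration of that interpretation, the sketch below uses the two rus estimates reported in #9 (1.380769 in the main equation, 0.9732549 in the tvc equation) and assumes the tvc term is interacted with _t itself, as the note under the table states. The function name is only for illustration.

```python
import math

HR_MAIN = 1.380769   # rus, main equation in #9
HR_TVC = 0.9732549   # rus, tvc equation in #9 (interacted with _t)

def hr_rus(t):
    # log HR(t) = ln(HR_MAIN) + ln(HR_TVC) * t, so HR(t) = HR_MAIN * HR_TVC ** t
    return HR_MAIN * HR_TVC ** t

# Time at which the ratio falls to 1 (solve HR_MAIN * HR_TVC ** t = 1).
t_equal = -math.log(HR_MAIN) / math.log(HR_TVC)

for t in (0, 1, 6, 12, 24):
    print(f"_t={t:>2}  HR={hr_rus(t):.3f}")
print(f"HR reaches 1 at _t of about {t_equal:.1f}")
```

                    Because the tvc ratio is below 1, the hazard ratio for rus shrinks with each unit of analysis time and eventually drops below 1, which is exactly the kind of time-dependence the phtest was flagging.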

                    • #11
                      When estimating the results as I did in #9, should the time varying covariates be included in the main equation with the non-time varying variables i.e. in my results should rus and first_job be removed from the main equation? I ask as in the Stata manual for stcox the time varying variables are only included in the tvc portion of the model, but in other examples I have seen the variables included in both the main and tvc equations.

                      Lastly, if I include a time varying term in both the main and tvc equation, how would I interpret the findings if the variable in the main coefficient is insignificant, but is significant in the tvc equation? I assume it would be at t = 0 the variable has no effect on the HR, but that given a significant covariate in the tvc equation, the effect on the HR is significant as time progresses. Is this correct?

                      • #12
                        If you are using the -tvc()- option to do this, then you should only list the time-varying covariates there. The interpretation of the results depends on what, if anything, you specified in the -texp()- option.

                        While your second question is no longer germane, because you should omit the variables from the "main" equation, I would just point out that a not significant result does not mean no effect. I know you have probably been (mis)taught that because the concept of statistical significance is badly taught almost everywhere. But a not significant result simply means that the data are consistent with no effect, but they will also be consistent with some range of non-zero effects (look at the confidence interval).
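
                        For a concrete look at a confidence interval, the sketch below rebuilds the interval for private_school from the hazard ratio and standard error shown in #9. It assumes the reported standard error is the delta-method one (the hazard ratio times the standard error of the coefficient) and that the interval is computed on the log scale and then exponentiated. That assumption reproduces the printed limits, but treat it as an illustration rather than a statement of Stata's internals.

```python
import math
from statistics import NormalDist

def hr_confidence_interval(hr, se_hr, level=0.95):
    # Assumption: se_hr is the delta-method standard error, HR * se(log HR).
    b = math.log(hr)
    se_b = se_hr / hr
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(b - z * se_b), math.exp(b + z * se_b)

# private_school in #9: Haz. Ratio .9460533, Std. Err. .04612
lower, upper = hr_confidence_interval(0.9460533, 0.04612)
print(f"95% CI for the hazard ratio: {lower:.4f} to {upper:.4f}")
```

                        The interval includes 1, so the data are consistent with no effect, but it also includes ratios as low as roughly 0.86, so they are equally consistent with a hazard that is about 14% lower.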

                        • #13
                          Thanks for your help again Clyde. I have re-estimated the model, omitting the tvc's from the main equation. I didn't specify anything in the texp() option beyond texp(_t) which multiplies the covariates by the analysis time, which in the case of my model is expressed in months. With that in mind can you explain how to interpret the hazard ratio for one of my tvc's?

                          Code:
                          ------------------------------------------------------------------------------
                                      _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          ---------------+----------------------------------------------------------------
                          main           |
                          private_school |   .9832712   .0176947    -0.94   0.349     .9491946    1.018571
                           parent_degree |   .9529882    .012383    -3.71   0.000     .9290244    .9775702
                                         |
                             ucas_points |
                                      2  |   1.015198   .0204845     0.75   0.455     .9758331    1.056152
                                      3  |   1.021556    .021012     1.04   0.300     .9811922     1.06358
                                      4  |   1.006205   .0230444     0.27   0.787     .9620371      1.0524
                                      5  |   1.010588   .0273132     0.39   0.697     .9584483    1.065564
                                         |
                            social_class |
                                      2  |    1.07088   .0320669     2.29   0.022     1.009839    1.135611
                                      3  |   1.071677   .0306827     2.42   0.016     1.013196    1.133533
                          ---------------+----------------------------------------------------------------
                          tvc            |
                                     rus |   1.010631   .0016183     6.60   0.000     1.007465    1.013808
                                stem_deg |   1.016506   .0015442    10.78   0.000     1.013484    1.019537
                             good_degree |    1.01854    .002493     7.51   0.000     1.013665    1.023438
                               first_job |   1.004111   .0015029     2.74   0.006      1.00117    1.007061
                                    male |   .9953505   .0014482    -3.20   0.001     .9925161    .9981929
                                   white |   1.006487   .0018459     3.53   0.000     1.002876    1.010111
                          --------------------------------------------------------------------------------
                          Note: Variables in tvc equation interacted with _t.

                          • #14
                            So, when -stcox- runs, it is actually estimating log hazard ratios (the coefficients), not hazard ratios directly; the hazard ratios shown in the output are calculated after the fact by exponentiating those coefficients. The 1.0106 reported for rus in the tvc equation is therefore already a hazard ratio, and the underlying coefficient is ln(1.0106) = approx. 0.0106. So what we have, using rus as an example, is log hr = 0.0106*rus*_t, and hr = exp(0.0106*rus*_t) = 1.0106^(rus*_t). Assuming rus is a 0/1 dichotomous variable, for rus = 1 (vs 0 as the comparator) hr = 1.0106^_t, so we start out with a hazard ratio of 1.0106^0 = 1 at time _t = 0, and this hr grows exponentially thereafter. At time _t = 1 we would have hr = 1.0106, and at _t = 12 (one year, with time in months) hr = 1.0106^12 = approx. 1.135. And so on.
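
                            The same arithmetic can be checked numerically. This sketch takes the tvc column in #13 to contain hazard ratios (the column is headed "Haz. Ratio"), so the coefficient is the natural log of the reported value, and it assumes texp(_t) with time in months as described in #13. The function name is only for illustration.

```python
import math

HR_RUS = 1.010631    # rus, tvc equation in #13
HR_MALE = 0.9953505  # male, tvc equation in #13

def hr_at(hr_per_unit, t):
    # Coefficient b = ln(reported hazard ratio); HR(t) = exp(b * t).
    return math.exp(math.log(hr_per_unit) * t)

for t in (0, 1, 12, 24):
    print(f"_t={t:>2}  rus HR={hr_at(HR_RUS, t):.4f}  male HR={hr_at(HR_MALE, t):.4f}")
```

                            A reported ratio above 1 moves further above 1 as time passes, and a reported ratio below 1 (as for male) moves further below 1; it does not flip direction.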

                            • #15
                              Thanks Clyde. I wasn't aware of the point regarding log hazard ratios. Looking at the Stata manual for stcox, it doesn't seem to mention this distinction. Can the log HRs still be discussed with respect to <1 reducing the hazard and >1 increasing the hazard, or must they always be exponentiated? Most examples I have seen just seem to take the HR produced by Stata and use it for interpretation, with no mention of any subsequent calculations applied to Stata's HRs.

                              With regard to the example in post #14, is the same calculation/interpretation applied to tvc's where the log hazard ratio is less than 1, as it is with male (0.995)? Does this mean that this will also grow exponentially, such that the initial reduced hazard associated with being male becomes an increased hazard over time?

                              Sorry if this is confusing to ask. The log hazard vs hazard ratio point has thrown me off course
