
  • Survival Analysis Model Test

    Hi everyone,

    For a large dataset I want to test whether I can fit a Weibull proportional hazards model. I therefore ran a test based on Schoenfeld residuals (which should indicate whether the proportional hazards assumption holds) using the following commands:

    stset time, failure(event) scale(1)
    stcox c1
    estat phtest, detail

    The result is as follows:
    [Image: Schoenfeld.PNG]



    If I interpret this correctly, I cannot assume proportional hazards, because I reject the null hypothesis of a zero slope.

    However, if I plot log(-log(S)) vs. log(t) using the command

    stphplot, by(c1)

    this gives me two (more or less) parallel lines, which would suggest that using a Weibull PH model is fine.
    [Image: plot.PNG]



    Am I misunderstanding these concepts, or are the two checks really giving opposite results? Can I infer from the second graph that applying a Weibull PH model is fine?

    Thank you very much

  • #2
    Hi Nina,

    Whether you should use a parametric (e.g. Weibull) model or the semi-parametric Cox PH model is a different question from testing the assumption of proportionality (the proportional hazards assumption). This assumption applies equally to both models, so tests for PH will not indicate whether you should use a parametric model or not. A parametric model makes an additional assumption about the baseline hazard, namely, that it can be modelled using a distribution such as the exponential, Weibull, or others. The decision to use a parametric model involves many other considerations, including knowledge of the disease over the time-frame you observed. You can try fitting exponential or Weibull models and compare them with the Cox model to see how the hazard ratios are affected.
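
As a minimal sketch of that comparison, assuming the data have been stset as in the first post and that c1 is the only covariate, the three models could be fitted one after another. streg reports hazard ratios by default for the exponential and Weibull distributions, so the c1 rows can be read side by side with the stcox output.

```stata
* Semi-parametric Cox PH model
stcox c1

* Parametric PH models; both report hazard ratios by default
streg c1, distribution(exponential)
streg c1, distribution(weibull)
```

If the hazard ratio for c1 is similar across the three fits, the choice of baseline hazard matters little for that estimate.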

    Coming back to the PH assumption, there are many good ways to handle violations, but the most straightforward in Stata is the tvc() option of stcox, which lets your covariates interact with time. This can account for time-dependent effects that show up as non-PH.
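
A sketch of what this could look like, assuming c1 is the covariate suspected of non-PH. The texp() choice here is only an illustration; without texp(), Stata interacts the tvc() variables with analysis time _t itself.

```stata
* Cox model in which the effect of c1 may vary with log analysis time
stcox c1, tvc(c1) texp(ln(_t))
```

The row for c1 in the tvc equation describes how the effect of c1 changes with ln(_t). If that estimate is close to no effect, the simpler model without tvc() may be adequate.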

    Hope this helps a bit.

    Edit: To provide a little more commentary, a significant regression coefficient (log-HR) does not provide evidence about, or insight into, the PH assumption. You are interpreting the stphplot graph correctly: the PH assumption does not seem to be violated. Assuming all important variables have been accounted for, whether to use Cox PH or Weibull PH is a different consideration (discussed above, in brief).
    Last edited by Matt Warkentin; 20 Dec 2017, 15:53.



    • #3
      Oh, I have just seen that I attached the wrong image, which caused some misunderstanding.
      [Image: Schoenfeld.PNG]

      This is the output I got from the Schoenfeld test, and it is the one suggesting that I cannot assume a proportional hazards model.

      Does anyone know where this apparent contradiction comes from?

      Thank you!



      • #4
        Nina,

        My initial comment still stands: whether or not the PH assumption is violated should not determine whether you use a parametric Weibull model or a Cox PH model.

        That aside, this is always a tough decision to face. Graphically it looks like minimal to no violation of PH, but a single test of significance provides some evidence for violation of PH. What to do? If you're concerned about non-PH, you could do as I mentioned in my previous post: include time-dependent effects using Stata's tvc() option and see how it affects your model. If the effect is minimal, a more parsimonious model without tvc() is easier to interpret. But to address your more salient question: why does one approach suggest violation, while the other does not? This is hard to say. One consideration could be your sample size. If the sample size is very large, even trivial departures from the null hypothesis tend to become statistically significant, because there is more data with which to detect them.

        That sample size can drive a test of significance is easy to demonstrate. See below. Stata's sktest does not reject normality for the distribution on the right, while it does reject normality for the one on the left. This may seem counterintuitive, as the distribution on the left follows the Normal curve almost perfectly, while the one on the right does not. It shouldn't be a surprise by now, but the sample size for the right figure is N=100, while the one on the left is N=100,000. More data means more evidence, even if the practical implications are meaningless.
        [Image: normal.png]
        [Image: non-normal.png]
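
The sample-size effect described above can be reproduced with a short simulation. This is only a sketch under assumed settings, not the code behind the figures: the seed, the variable name x, and the choice of a t distribution with 30 degrees of freedom (close to, but not exactly, Normal) are all hypothetical.

```stata
* Simulate 100,000 draws from a distribution that is nearly, but not exactly, Normal
clear
set seed 12345
set obs 100000
generate double x = rt(30)

* Skewness/kurtosis test on the full sample (N = 100,000)
sktest x

* Same test on the first 100 observations only
sktest x in 1/100
```

With the full sample the test has enough power to pick up the slightly heavy tails, whereas with 100 observations it will usually lack the power to do so, even though both samples come from the same distribution.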
        Last edited by Matt Warkentin; 20 Dec 2017, 17:51.



        • #5
          Thank you very much, Matt! This was very helpful.
