
  • Smoothed Hazard function with sts graph

    Hello,

    I am fitting a Cox proportional hazards model to investigate hazard ratios for a second birth, but I am encountering some issues with the plots after using stset and stcox.

    I first created a person-period file and estimated h(t) as the observed value of Pr(T=t|T>=t) at each time t:
    bysort t: egen ht_obs = mean(secondbirth)
    Plotting it against time, I obtained a reliable estimate.
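
    For reference, a minimal sketch of how such a plot can be drawn. It assumes ht_obs already exists from the egen line above and that the person-period file holds one row per woman per period at risk; first_in_t is a hypothetical helper variable, not part of the original data.

    ```stata
    * Flag one row per period so each value of t is plotted once
    * (first_in_t is a hypothetical helper name)
    egen byte first_in_t = tag(t)
    * Observed discrete-time hazard against time
    twoway connected ht_obs t if first_in_t, sort ytitle("Observed hazard h(t)") xtitle("t")
    ```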

    [Image: Screen Shot 2017-03-29 at 01.51.04.png]

    However, when I use:
    stset _period, id(id2) failure(birthtest)
    sts graph, hazard

    I obtain a completely different picture, and the same happens when I fit a Cox model with -stcox- and then plot the result with -stcurve-.
    [Image: Screen Shot 2017-03-29 at 01.50.43.png]
    [Image: Screen Shot 2017-03-29 at 01.50.27.png]

    I am not sure how to interpret these results, but most importantly I cannot understand where I went wrong, as -stdes- gives a sensible summary and the structure of the data does not seem to be an issue per se.

    Thank you very much
    Elena

  • #2
    What you are graphing in your first graph is not the hazard function, it is the survival density function. It is important to remember that these are different. The survival density function gives the probability of an event at a given moment in time (or, at least, is proportional to that). The hazard function is the probability that an event will occur at a given time conditional on its not having occurred before that. So those are quite different. To take a simple example, your probability of dying at 95 is very low: it is far more likely that you will die younger than that, and only a few people survive that long. So the survival density function is low at age 95. But if you do make it to 95, your probability of dying before you turn 96 is pretty high. So your hazard function at age 95 is high.
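
    To make the distinction concrete, the two quantities can be written side by side in discrete time. The cohort numbers in the worked example are invented purely for illustration.

    ```latex
    % Unconditional probability of the event at t, versus the hazard at t
    f(t) = \Pr(T = t), \qquad
    h(t) = \Pr(T = t \mid T \ge t) = \frac{\Pr(T = t)}{\Pr(T \ge t)}

    % Invented example: a cohort of 1000, of whom 20 are still alive at 95
    % and 6 of those die at age 95
    f(95) = \frac{6}{1000} = 0.006, \qquad
    h(95) = \frac{6}{20} = 0.30
    ```

    The numerator is the same in both cases; only the denominator changes, from everyone in the cohort to those still at risk at t.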

    As for the second and third graphs, they, too, are different things. The graph produced by -sts graph- is the result of a fully non-parametric model. Your -stcox- model, however, is semi-parametric and is constrained by the assumption of proportional hazards between the two groups shown in your graph. The fact that the results are rather different suggests to me that the proportional hazards assumption is violated: it is not a good fit to the crude hazard estimate given by -sts graph-. To see this, you could run a different Cox model without any predictors. Syntactically that's illegal, but you can trick Stata into doing it. Try this with your data:

    Code:
    gen one = 1
    stcox one
    stcurve, hazard
    Here there is no proportional hazards assumption binding the estimation, and the hazard function that -stcox- gets looks just like the one you got from -sts graph, hazard-. (They will look a little different, but this is due to the rendering parameters that -stcurve- and -sts graph- use being different: the functions represented are the same, just scaled slightly differently.)
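
    For anyone who wants to try the comparison without the original data, here is a sketch under stated assumptions. It uses the drugtr example dataset served by -webuse- (already stset, with a 0/1 variable drug), and -stcox, estimate-, which the stcox documentation describes as forcing the null model, in place of the constant-variable trick. The graph names are arbitrary.

    ```stata
    webuse drugtr, clear                             // example data, already stset
    sts graph, hazard name(np_hazard, replace)       // non-parametric smoothed hazard
    stcox, estimate                                  // Cox model with no covariates
    stcurve, hazard name(cox_null, replace)          // smoothed baseline hazard, null model

    stcox drug                                       // now impose proportional hazards
    stcurve, hazard at1(drug=0) at2(drug=1) name(cox_ph, replace)
    sts graph, hazard by(drug) name(np_by_drug, replace)   // group-wise non-parametric hazards
    ```

    Comparing cox_ph with np_by_drug gives a rough visual sense of how much the proportionality constraint reshapes the curves. It is not a formal test of the assumption.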

    Anyway, you asked Stata to do three different things, and you got three different results. The only one that's a bit surprising is the way the -stcurve- result turned out--but that actually gave you a lot of information about the proportional hazards assumption not being a good fit to your data!


    • #3
      Thank you very much for your kind reply.

      I realised I was not very clear in my question. I am aware those graphs are not supposed to look exactly the same given I am doing three different things.
      However, I would have expected to find similar patterns, and I do not understand how to interpret the outcomes of the last two graphs logically. How is it possible I do not obtain any result before t=5? And why is it increasing?

      It is mainly a problem of interpretation for me, especially given that, in the literature (and by common sense), smoothed hazard rates for the birth of a second child always follow a Gaussian shape.

      (I am sorry; this is my first time working with this model.)


      • #4
        I would have expected to find similar patterns
        But your expectation is wrong. These three things are so different that there is no reason to expect any strong resemblance between them. What you did cannot be characterized as trying three different ways to do the same thing. You did three completely different things. I might have expected a somewhat closer resemblance between the second and third graphs, but the fact that you didn't get that just says that the proportional hazards model fit by -stcox- does not fit the data well. Did you try the code I showed in #2? It will produce a graph that is nearly identical to your -sts graph, hazard- output (although the axes are scaled somewhat differently). This is further proof that the problem with #3 is that the model is too much of a distortion of the data.

        How is it possible I do not obtain any result before t=5?
        So here you are referring to the -sts graph- output. Because this is a non-parametric model, you will get no results preceding the first observed event. Moreover, some minimum number of events is needed before a hazard can be computed: there has to be a denominator large enough to give a credible estimate. Presumably in your data, that doesn't happen until you hit 5.

        and why is it increasing?
        I'm not sure what you're referring to here. The -sts graph- output first decreases and then increases after t = roughly 16. Since this is a non-parametric estimator, I would presume that these increases and decreases are data driven and that this accurately reflects what is going on in your data. The -stcurve- output is synthetic output based on the Cox proportional hazards model. I guess when you constrained it to estimating a model where the hazards are proportional, a very bad fit to your data it seems, this was the best that could be done. Remember that in a parametric model you get the best possible estimates consistent with the constraints imposed by the model. Even the glove that comes closest to fitting on your foot is still not a sock. If the model is just a bad specification of the data generating process, then the predictive outputs like these graphs can be clearly wrong. The old adage about garbage in and garbage out is usually applied to data, but it also applies to models.


        • #5
          Dear Clyde,

          I had the same question about why the first few and last few observations are missing from the graph. In the graphs above, both -sts graph- and -stcurve- drop a few observations. It is my understanding that -stcurve- plots a parametric model, so why does this still happen? Not only are the first few observations removed; the last few are removed as well. I did a brief search on this, and I believe it is because of the kernel density estimation used to smooth the curve. However, I fail to understand what this really means. Could you perhaps explain this further?


          • #6
            You are correct: in the case of -stcurve- the ends of the range of time are truncated due to kernel smoothing. Kernel smoothing entails calculating a weighted average of nearby observations (the exact weighting depends on the particular kernel). But you need to have some nearby observations on both sides of the time point to do it. So a few early and late observations can't be included.
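
            A sketch of how to see the effect of the smoothing settings, assuming the drugtr example dataset from -webuse- (already stset). Both -sts graph, hazard- and -stcurve, hazard- accept kernel(), width(), and noboundary. According to the documentation for these commands, the default kernel restricts the plotted range to within one bandwidth of each endpoint, while the epan2, biweight, and rectangle kernels use boundary kernels instead. This is worth checking against -help sts graph- for your Stata version. The width value below is arbitrary.

            ```stata
            webuse drugtr, clear
            sts graph, hazard name(h_default, replace)               // default kernel: ends trimmed
            sts graph, hazard kernel(epan2) name(h_epan2, replace)   // boundary kernels at the ends
            sts graph, hazard noboundary name(h_nobound, replace)    // no range restriction
            sts graph, hazard width(4) name(h_w4, replace)           // wider bandwidth, smoother curve
            ```

            With noboundary the curve extends to the ends of the observed range, but the estimates near the endpoints are not adjusted for boundary bias, so they should be read with caution.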
