  • Problems with plotting survival/hazard rate after Cox Regression

    Hello everybody,

    For my master's thesis I'm conducting a survival analysis using the Cox model. Although the results of the analysis seem reasonable to me, I have problems plotting the hazard and survival functions. I guess there must be a problem with my data, because the curve starts below the 1.0 mark on the Y-axis and ends at around the .7 mark on the Y-axis (see picture).

    For context: I'm investigating the career duration of professional sportsmen and how many of them survive the 6-year observation period (X-axis).


    I would appreciate it if someone could help me with my problem. I'm from Germany and new to Stata, so please excuse any misspellings.


    Thank you in advance.


    Best regards,
    Early

  • #2
    Hello Early,

    Welcome to the Stata Forum.

    The Y- and X-axes reflect the data you have. You may edit them, but in general Stata chooses the most appropriate setting.

    In short, after 6 years, you still have more than 70% of the individuals without any event.

    Additionally, although you didn't show the commands, you wrote "Cox Proportional Hazard Regression" as the title of the graph. Are you sure? To me, it looks like a Kaplan-Meier survival estimate. I noticed that you mentioned the survivor function in your message, but I fear that changing the title of a graph won't give you what you asked for.

    I suggest you start by checking out the Stata manual on this: http://www.stata.com/manuals13/st.pdf

    Hopefully that helps,

    Best,

    Marcos

    Last edited by Marcos Almeida; 27 Nov 2015, 03:21.
    Best regards,

    Marcos



    • #3
      Hey Marcos,

      Thank you for your answer. These are the commands I used:

      Code:
      stset yitl, failure(survivor == 0)
      stcox Pick gplayed gstarted wlratio playoffs ppy rpy apy bpy Steals FGA FTA Turnovers PF WStotal
      stcurve, survival



      After reading the manual, this seemed to me to be the way to plot the survival rate after a Cox regression. 168 of my 348 observations survive (dividing by 6, this means that 28 of 58 players survive), so I can't explain why 70% did not experience the event.

      Maybe there is something wrong with the structure of my data. My data are structured like this:
      Player  Position  Season  Survivor  Games played
      1       A         1       1         21
      1       A         2       1         45
      1       A         3       1         44
      1       A         4       1         3
      1       A         5       1         57
      1       A         6       1         34
      2       B         1       0         0
      2       B         2       0         0
      2       B         3       0         2
      2       B         4       0         14
      2       B         5       0         15
      2       B         6       0         15

      Survivor = 1 means that the player survives. So every player-year is one observation.


      I hope this information helps you understand my problem.


      Thank you!



      • #4
        Hello Early,

        I don't understand why you chose zero for the event. Also, I wonder whether your time variable "yitl" is continuous or discrete. Additionally, I see that your data are in long format, that is, you have many rows for the same id, and I wonder why you didn't use this information when stsetting. What is more, your commands are aimed at displaying the Kaplan-Meier curve, so I certainly cannot see why you would title it "Cox Proportional hazard Regression".
        Last edited by Marcos Almeida; 27 Nov 2015, 08:46.
        Best regards,

        Marcos



        • #5
          Hey Marcos,

          Does it make a difference whether the event is coded 0 or 1? I chose 0 for the event because survivor = 1 sounded better to me.

          - yitl means "years in the league" and is discrete. It ranges from 1 to 6.

          I know that my data are in long format, with many observations (rows) for the same id (player). My professor told me to do this. Is there a problem with the long format? What do I have to do to take it into account when stsetting?

          Stata automatically titled the graph "Cox proportional hazard regression" after I ran the stcurve, survival command. After you told me it might be the Kaplan-Meier method, I plotted the curve with sts graph and, surprisingly (at least to me), that graph is the correct one.

          Can you tell me what to do now and answer my questions?


          Thank you very much for your support!



          • #6
            Some serious problems here.

            First and foremost: you should not be using stcox at all. stcox is intended for continuous-time data, and yours are discrete. You should use a discrete-time method; cloglog is the discrete-time analogue of stcox.
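A minimal Python sketch of why cloglog is the discrete-time analogue of the Cox model (an illustration only, not Stata code and not Steve's method). It assumes hypothetical year-specific intercepts `alphas` for league years 1 to 6 and a hypothetical linear predictor `xb`. Under the complementary log-log link, the hazard of leaving in year j is h_j = 1 - exp(-exp(alpha_j + xb)), and survival is the running product of (1 - h_j). The check inside the block confirms the proportional-hazards property S(t | x) = S(t | 0)^exp(xb), the same structure the Cox model imposes.

```python
import math


def cloglog_hazard(alpha, xb):
    """Discrete-time hazard for one year under the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(alpha + xb))


def survival_curve(alphas, xb):
    """S(t) = product over years j <= t of (1 - h_j)."""
    surv = []
    s = 1.0
    for alpha in alphas:
        s *= 1.0 - cloglog_hazard(alpha, xb)
        surv.append(s)
    return surv


# Hypothetical year-specific intercepts for league years 1..6 (illustration only)
alphas = [-2.5, -2.2, -2.0, -1.9, -1.8, -1.8]

baseline = survival_curve(alphas, xb=0.0)
shifted = survival_curve(alphas, xb=0.5)

# Proportional hazards on the cloglog scale: 1 - h(x) = (1 - h(0)) ** exp(xb),
# so the whole curve satisfies S(t | x) = S(t | 0) ** exp(xb).
for s0, s1 in zip(baseline, shifted):
    assert math.isclose(s1, s0 ** math.exp(0.5))

for year, (s0, s1) in enumerate(zip(baseline, shifted), start=1):
    print(f"year {year}: S(t|0)={s0:.3f}  S(t|xb=0.5)={s1:.3f}")
```

The same relationship is why the predicted curve shifts up or down depending on the covariate values at which it is evaluated.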

            See Stephen Jenkins's survival analysis page; Lesson 6 covers discrete-time analysis. Section 3.3 of Stephen's book draft covers some of the theory of discrete survival models. Another good source is the book by Mario Cleves et al., "An Introduction to Survival Analysis Using Stata".

            • You have too many covariates: your model is guilty of "overfitting". See this informal introduction. The rule of thumb for Cox models is that you need 5-10 or more events per covariate (Vittinghoff & McCulloch, 2007). You have 30 events, so you could have fit at most six covariates in the Cox model, perhaps fewer. The discrete model will be a little more flexible. You will be fitting six duration terms, so perhaps you can fit five extra covariates. You can reduce the number of duration terms if you fit a flexible polynomial in year; see the fp command.

            • For computing standard errors, the player is the unit of analysis. In cloglog, add the option vce(cluster player). In stcox you would have added the option id(player).

            • The curve you were plotting is the predicted survival curve when every covariate is zero. That often results in ridiculous predictions (as when age is in the model and the age range is 50-70 years). You would have had to specify realistic values with the at() option of stcurve. You can accomplish the same thing with margins after cloglog.
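The events-per-covariate arithmetic in the overfitting point can be checked with a few lines of Python (an illustration only). The counts come from this thread (58 players, 28 still in the league after 6 years, hence 30 events), and the covariate list is copied from the stcox command in #3. `max_covariates` is a hypothetical helper that applies the 5- and 10-events-per-covariate rules of thumb.

```python
def max_covariates(events, events_per_covariate):
    """Largest number of covariates allowed by an events-per-covariate rule."""
    return events // events_per_covariate


# Counts taken from this thread: 58 players, 28 still in the league after 6 years
players, survivors = 58, 28
events = players - survivors  # players who left the league

# Covariates in the stcox command posted in #3
covariates = ("Pick gplayed gstarted wlratio playoffs ppy rpy apy bpy "
              "Steals FGA FTA Turnovers PF WStotal").split()

print(f"events: {events}, covariates fitted: {len(covariates)}")
print(f"events per covariate: {events / len(covariates):.1f}")
for epv in (5, 10):
    print(f"rule of {epv} events per covariate -> at most "
          f"{max_covariates(events, epv)} covariates")
```

With 15 covariates and 30 events the posted model has only about 2 events per covariate, well below either rule of thumb.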

            Here's how you would have added a point at time zero after stcurve. It's a neat trick I learned from Robert Gutierrez years ago, and it may come in handy for plotting with cloglog.

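            Code:
            tempfile t1
            * Save the plotted points (_t and surv1) to a temporary file
            stcurve, survival outfile(`t1', replace)
            use `t1', clear
            * Turn the first row into the starting point t = 0, S(t) = 1
            keep in 1
            replace surv1 = 1
            replace _t = 0
            * Add the original curve points back and check the result
            append using `t1'
            summarize
            sort _t surv1
            * Draw the curve as a step function with invisible markers
            scatter surv1 _t, connect(stairstep) msymbol(i)

            Before posting again, please read FAQ 12, which asks that you show all code and results and put them between CODE delimiters.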

            Reference:
            Vittinghoff, Eric, and Charles E. McCulloch. 2007. "Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression." American Journal of Epidemiology 165 (6): 710-718.
            Last edited by Steve Samuels; 27 Nov 2015, 14:33.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              Hey Steve,

              First of all, thank you for your answer and your support. As you may have noticed, I'm not experienced with Stata. I'm not sure whether I can simply use the cloglog model: my professor told me to use the Cox model, so I don't know what to do now.
              Anyway, I needed to restructure my data as follows:

              Code:
              Player  Position  Experience  Season  Survivor  Games played
              1       A         1           2000    1         21
              1       A         2           2001    1         45
              1       A         3           2002    1         44
              1       A         4           2003    1         3
              1       A         5           2004    1         57
              1       A         6           2005    1         34
              2       B         1           2000    1         13
              2       B         2           2001    1         55
              2       B         3           2002    0         0
              3       C         1           2000    0         0
              My professor told me to delete a player's remaining observations after he quits, so I only have observations for seasons in which the player is still in the league. As you can see, Player 1 has 6 observations and Player 2 only 3.

              I also had to code survival so that every player survives in the league until he leaves: for example, Player 2 gets a 1 in the Survivor column for his first two seasons and a 0 in season 3, when he leaves.
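The coding described above can be sketched in Python (a language-neutral illustration, not Stata). The function name `person_period_rows` and the input triples (player, last season observed, whether the player left) are hypothetical. The sketch reproduces the Survivor column of the restructured table: each player contributes one row per season up to and including the season he leaves, Survivor is 0 only in that exit row, and a player still active after season 6 keeps 1 in every row (censored).

```python
def person_period_rows(player, last_season, left_league):
    """Expand one player into player-season rows.

    Survivor is 1 in every row except the season in which the player leaves,
    which is coded 0; players still active at the end are censored (all 1s).
    """
    rows = []
    for experience in range(1, last_season + 1):
        exits_now = left_league and experience == last_season
        rows.append({"player": player,
                     "experience": experience,
                     "survivor": 0 if exits_now else 1})
    return rows


# Players 1-3 from the restructured example: (player, last season, left league?)
players = [(1, 6, False), (2, 3, True), (3, 1, True)]
data = [row for p in players for row in person_period_rows(*p)]

for row in data:
    print(row["player"], row["experience"], row["survivor"])
```

In this layout the event indicator is simply 1 - Survivor, which matches the failure(survivor == 0) coding used with stset in #3.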



              Now I have another question: what is the difference between stcurve and sts graph? I don't understand how these two commands differ. When I use sts graph, it shows me a correct graph, whereas stcurve produces the graph I uploaded above.


              I would appreciate it if you could help me; I am really desperate and don't know what to do. (I hope the commands and data are presented correctly. If not, please excuse my mistake.)

              Thank you very much!



              • #8
                Use the exactp option in stcox, which in theory gives the best results for tied data. However, the manual states:
                "When we view time as discrete, the exact partial method (option exactp) is the final method available. This approach is equivalent to computing conditional logistic regression where the groups are defined by the risk sets and the outcome is given by the death variable. This is the slowest method to use and can take a significant amount of time if the number of tied failures and the risk sets are large."
                If that doesn't work, use the efron option.
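To illustrate what the tie-handling options do, here is a minimal Python sketch (not Stata's implementation) of the log partial-likelihood contribution of a single time point with tied failures under three rules: the Breslow approximation (Stata's default for stcox), the Efron approximation (efron), and the exact partial method (exactp), which, as the quoted passage says, is a conditional-logistic computation over the risk set. The risk scores r_i = exp(x_i b) and the example risk set are hypothetical inputs.

```python
import math
from itertools import combinations


def breslow(risk_set, failed):
    """Breslow: every tied failure sees the full risk-set denominator."""
    total = sum(risk_set)
    return sum(math.log(risk_set[i]) - math.log(total) for i in failed)


def efron(risk_set, failed):
    """Efron: the tied failures' risk is removed from the denominator in steps l/d."""
    total = sum(risk_set)
    tied = sum(risk_set[i] for i in failed)
    d = len(failed)
    loglik = sum(math.log(risk_set[i]) for i in failed)
    for l in range(d):
        loglik -= math.log(total - (l / d) * tied)
    return loglik


def exact_partial(risk_set, failed):
    """Exact partial likelihood: compare the observed failure set with every
    set of the same size drawn from the risk set (conditional logit)."""
    d = len(failed)
    numerator = math.prod(risk_set[i] for i in failed)
    denominator = sum(math.prod(c) for c in combinations(risk_set, d))
    return math.log(numerator) - math.log(denominator)


# Hypothetical risk set of four players, two of whom leave in the same season
risk = [1.0, 2.0, 0.5, 1.5]
failed = [0, 1]
for name, f in (("breslow", breslow), ("efron", efron), ("exactp", exact_partial)):
    print(f"{name:8s} {f(risk, failed):.4f}")
```

With a single failure the three rules coincide; with many ties, as when time is measured in whole league years, they can differ noticeably, and the exact method's enumeration of subsets is what makes it slow.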

                stcurve plots predicted (adjusted) curves after regression models, such as stcox or streg and perhaps some others. sts graph plots unadjusted Kaplan-Meier curves. (Well, there is an adjustfor() option that adjusts for some covariates, but you are not using it.)
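As a rough illustration of what sts graph draws, here is a minimal Python sketch of the Kaplan-Meier estimator (not Stata's implementation). It assumes one record per player, with `durations` holding years in the league and `events` equal to 1 if the player left and 0 if he was still active (censored); the data are hypothetical. The estimate uses only the observed exits and risk sets, with no covariates, whereas stcurve draws a curve predicted from the fitted model at particular covariate values.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survivor function S(t) at each distinct failure time.

    durations: time of exit or censoring for each subject
    events:    1 if the subject failed (left the league), 0 if censored
    """
    times = sorted({t for t, e in zip(durations, events) if e == 1})
    surv, s = [], 1.0
    for t in times:
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1.0 - failures / at_risk
        surv.append((t, s))
    return surv


# Hypothetical players: years in the league and whether they left (1) or were censored (0)
durations = [1, 2, 2, 3, 4, 6, 6, 6, 6, 6]
events = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for t, s in kaplan_meier(durations, events):
    print(f"year {t}: S = {s:.3f}")
```

With no censoring before a given year, the estimate at that year is simply the share of players still in the league, so it can be checked against raw counts such as the 28 of 58 players mentioned in #3.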

                A word of non-statistical advice: as this is for a master's thesis, I think that practice in using the Cox model, and graduating, are more important than a perfect analysis. However, you should mention the issues of discrete data and overfitting in your discussion. It would take a bigger data set than the one you have to reach firm conclusions. Good luck!
                Steve Samuels
                Statistical Consulting

                Stata 14.2



                • #9
                  I finally handed in my thesis. I discussed the results with my professor, and he said they were sufficient. Thank you all for your help!

