  • Kaplan Meier Curve

    Hi, I've been struggling with a Kaplan-Meier survival curve and can't seem to find a solution anywhere. I have mortality data extending over an 11-year period. I began recording death events 24 months after an initial survey. The total duration of the survey ranges from 120 to 135 months. stset appears to be set up correctly, but when I graph the curve I get a flat line at the top until the shortest duration time (120 months), and only then do death events appear on the graph. Why would this be happening? Any thoughts would be greatly appreciated, thank you!

  • #2
    Hello Kelly,

    Welcome to the Stata Forum!

    I believe you will get insightful advice provided you display a summary of your data, the commands, and the output, as indicated in the FAQ. I kindly ask you to take a look at it (just by clicking on the link above to the left) and post the necessary information accordingly.

    Best,

    Marcos


    • #3
      To add to Marcos's request, be sure to read FAQ 12 and to put code and output between CODE delimiters. In your case, we need to see your stset statement.
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2


      • #4
        Hi, thank you so much to both of you, I apologize for my novice status!

        I am working in a confidential lab with no internet access, so I cannot post a screenshot of my graph at this time, but I can tell you that, in Stata 13.1 on Windows 7, I used
        stset enddate, time0(surveydate) origin (time surveydate) failure(death_event) exit (time enddate)
        where enddate refers to the end of the observation period (a specific date, 31dec2011).

        My output had no probable error message, and the total observations and failures were represented correctly. I detected no problems with the output.

        I used the sts graph command and in the graph, which I deeply wish I could show you here, there is a flat line at the top, and death events begin their stepwise descent only at 120 months. In my dataset, death events occur from 24 months after the surveydate.

        I would be more than happy to provide more information but will have to have a vetting period first. Thank you so much! I look forward to any and all replies.

        Best,
        Kelly


        • #5
          Kelly:
          if your data are confidential, can't you post, from the same internet access that you used for your previous messages, a faked dataset that mimics your problem?
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Please, in future, put commands and results between CODE delimiters, as requested in the FAQ. There's obviously something about your dates and events that produces this phenomenon, but we'll need more information to know what, e.g. the earliest date of entry, the maximum length of follow-up, and exactly how the 24-month delay comes about (chance?). time0() is to be used only with multiple-record data, if at all. If that's what you have, then stset must contain an id() option.

            In any case, we'd need detail about the study design. We don't need a graph or image, but we do need some statistics or even a reduced data listing (via dataex, from SSC), with random IDs and randomly perturbed dates if necessary.

            Last edited by Steve Samuels; 18 Jan 2016, 11:06.
            Steve Samuels


            • #7
              Hi, thank you so much for your speedy replies! Sorry about the delimiters, I misunderstood what that meant, I will correct that for future posts.

              As for internet access, I'm posting from my office, not from the data lab, so I created a fake dataset. I tried to create a dataset with staggered entry into the survey, however in my real dataset I include observations that began 2 years after entry into the survey and my follow-up time is 11 years. (Dataset attached in Word document)



              In Stata 13.1 using Windows 7 I used
              Code:
               stset enddate, time0(surveydte) origin (time surveydte) failure (Death_Event) exit (time enddate)
              I used the time0 option because I thought it would bring the staggered entry dates to 0, and I used the origin option to mark entry into the observation period. (I also tried the enter option and received the same output.)

              Output:

              failure event: Death_Event != 0 & Death_Event < .
              obs. time interval: (surveydte, enddate]
              exit on or before: time enddate
              t for analysis: (time-origin)
              origin: time surveydte

              ------------------------------------------------------------------------------
              21 total observations
              0 exclusions
              ------------------------------------------------------------------------------
              21 observations remaining, representing
              15 failures in single-record/single-failure data
              17871 total analysis time at risk and under observation
              at risk from t = 0
              earliest observed entry t = 0
              last observed exit t = 898


              .
              I used
              Code:
               sts graph
              Output:

              failure _d: Death_Event
              analysis time _t: (enddate-origin)
              origin: time surveydte
              exit on or before: time enddate



              Graph attached in Word document

              Again I'm having the same issue where death_events which first occur 216 days after entry into the survey are not showing up on the graph. This is not the best sample dataset but hopefully it shows what's been happening.

              Thank you so much, I really appreciate the responses!

              Best,
              Kelly
              Last edited by Kelly Renwick; 18 Jan 2016, 12:52.


              • #8
                I can't extract your data, which is some kind of image in the Word document. Please reread the last five paragraphs of FAQ 12, list the data as asked, and upload the graph in a universally readable format. I'll just repeat the Manual: the time0 option is not necessary. Also, you haven't answered my question about the 24-month delay period.
                Steve Samuels


                • #9
                  Hi Steve, you are very patient, thank you! I tried pasting a screen shot and a graph in an earlier post but was unsuccessful. I've read through the FAQ and am still a bit unclear on how best to pass on this information. I have uploaded the dataset and the graph as a .png. Hoping this works.

                  Regarding the 24-month delay, I am intentionally excluding observations within the first 24 months in order to exclude respondents who were ill at the time of the survey.

                  Just to reiterate, I am using Stata 13.1 in Windows 7 and am using
                  Code:
                   stset enddate, time0(surveydte) origin (time surveydte) failure (Death_Event) exit (time enddate)
                  then I am using
                  Code:
                   sts graph
                  And the graph is showing a flat line at the top where I expected to see the stepwise death events. Thank you so much for bearing with me!

                  Best,
                  Kelly

                  • #10
                    Kelly:
                    no wonder that you experience what you report: according to your _t, deaths occur in a very narrow time window, from 806 onwards:
                    Code:
                    . sum _t
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                              _t |         21         851    34.49638        806        898
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Thanks so much to everyone who has taken the time to help me with this. I was finally able to sort out what I did wrong. In Stata 13.1 on Windows 7, rather than using
                      Code:
                       stset enddate, time0(surveydte) origin (time surveydte) failure(Death_Event) exit (time enddate)
                      I used
                      Code:
                       stset deathdate, origin(time surveydte) failure (Death_Event) exit (time enddate)
                      and I was able to use
                      Code:
                       sts graph
                      and create an appropriate graph

                      I believe I was thrown by the probable error I received in my output, but I see it's not a true error:

                      failure event: Death_Event != 0 & Death_Event < .
                      obs. time interval: (origin, deathdate]
                      exit on or before: time enddate
                      t for analysis: (time-origin)
                      origin: time surveydte

                      ------------------------------------------------------------------------------
                      21 total observations
                      6 event time missing (deathdate>=.) PROBABLE ERROR
                      ------------------------------------------------------------------------------
                      15 observations remaining, representing
                      15 failures in single-record/single-failure data
                      8108 total analysis time at risk and under observation
                      at risk from t = 0
                      earliest observed entry t = 0
                      last observed exit t = 853


                      Thanks everyone!

                      Best,
                      Kelly

                      • #12
                        Well, I'm glad you're making progress. But I think what you have is still incorrect. I think that "PROBABLE ERROR" message is right.

                        First, let me check that I understand your posted data. Subjects become at risk as of surveydte and remain under observation until the earlier of deathdate (if they die) or enddate (if they survive all the way to enddate). If that is correct, the 6 observations for which Death_Event == 0 are censored observations, but your -stset- command fails to account for them. I think what you need is this:

                        Code:
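                        * Event date: death date for those who died, enddate for the censored
                        gen event_date = enddate
                        replace event_date = deathdate if Death_Event == 1
                        
                        * Time zero is the survey date; censored subjects stay in the data
                        stset event_date, origin(time surveydte) failure(Death_Event == 1)
                        sts graph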
                        If you look at your graph, you see that it says that there are no survivors at 853 days. But in your data that is not true: there are 6 survivors. The graph generated by my code correctly shows that there are, indeed, residual survivors at the end. The output from -stset- also reflects this. Your code notes 15 observations because there are 6 missing event times. My code notes 21 observations, of which 15 are failures.


                        • #13
                          Hi Clyde, It works! Thank you so much! I'm sure it's obvious I am new to this and often get hung up with these types of issues, this has been a great learning experience. Your help is greatly appreciated!

                          Best,
                          Kelly


                          • #14
                            Even with Clyde's fix, the KM curve will show 100% survival for the first 24 months. This is theoretically invalid. Why? The KM curve evaluated at time \(t\), \(S(t)\), by definition estimates the following proportion: of the people at risk at the start date (time zero), whom we may call the "cohort", what fraction is still alive at \(t\)?

                            Your curve doesn't do this. You have restricted the analysis to people who survived to 24 months, then treated them as the entire cohort at risk at \(t = 0\). In effect, you are basing the definition of your cohort at \(t=0\) on information about outcome known only 24 months after the start of the study.
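
                            To make the definition concrete, the usual product-limit form of the estimator can be written as below. The symbols \(t_j\), \(d_j\) and \(n_j\) are notation introduced here for illustration, not something used earlier in the thread: \(t_j\) are the distinct observed death times, \(d_j\) is the number of deaths at \(t_j\), and \(n_j\) is the number at risk just before \(t_j\).

                            \[
                            \hat{S}(t) \;=\; \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)
                            \]

                            If, by construction, nobody in the data can die before 24 months, there are no death times \(t_j\) in that interval, the product is empty, and \(\hat{S}(t) = 1\) there. That is the flat stretch at 100%.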

                            There is a simple solution for analyzing survival of those alive at 24 months:

                            First plot the KM curve for the original cohort, with risk starting at date of survey. This will help you and the reader gauge the effect of excluding 24 months experience.

                            Second, do what is known as a landmark analysis; re-estimate the KM curve starting at 24 months (~720 days) among people still alive and under study at that point. You can restart the time-scale at zero:
                            Code:
                            stset event_date, origin(time surveydte + 720) failure(Death_Event == 1)
                            sts graph
                            or, with a little extra effort, you can start the plot at 720 days.
                            Code:
                            stset event_date, origin(time surveydte + 720) failure(Death_Event == 1)
                            sts gen km = s
                            gen t720 = _t + 720  // shift time back to days since survey
                            // Add left hand point to start KM at 1
                            expand 2 in 1
                            replace t720 = 720 in 1
                            replace km = 1 in 1
                            sort t720
                            scatter km t720, c(stairstep) ms(i)
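
                            As a rough check on what the landmark curve represents, it can be related to the curve for the original cohort. The notation here is an assumption made for illustration: \(L\) is the landmark (24 months), \(t_j\) are the distinct death times, \(d_j\) the deaths at \(t_j\), \(n_j\) the number at risk just before \(t_j\), and \(\hat{S}\) is the KM curve for the full cohort with risk starting at the survey date.

                            \[
                            \hat{S}_L(t) \;=\; \prod_{j:\, L < t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \;=\; \frac{\hat{S}(t)}{\hat{S}(L)}, \qquad t \ge L
                            \]

                            The equality assumes that no one enters the study after \(L\) and that \(\hat{S}(L) > 0\), so that the risk sets after the landmark are the same in both analyses. Read this way, the landmark curve estimates survival conditional on being alive at 24 months.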
                            Last edited by Steve Samuels; 20 Jan 2016, 09:14.
                            Steve Samuels


                            • #15
                              Thank you, Steve! This is enormously helpful. I would not have discovered this on my own, and I look forward to applying it to my real dataset. I feel very fortunate to have been able to participate in the Stata forum, a wonderful place to expand my knowledge.

                              All the best,
                              Kelly
