  • Survival analysis - data management query

    Dear all,

    I have two variables in my dataset, "date of birth" and "date of death", and would like to run a survival analysis. I am not a statistician, so please bear with me.

    1st variable codebook:
    Code:
        Type: Numeric daily date (int)
    
                     Range: [13880,21182]                 Units: 1
           Or equivalently: [01jan1998,29dec2017]         Units: days
             Unique values: 1,062                     Missing .: 0/1,174
    
                      Mean: 17165.3 = 30dec2006(+ 7 hours)
                 Std. dev.:    2083
               Percentiles:       10%        25%        50%        75%        90%
                                14408      15411    17030.5      18864      20241
                            13jun1999  12mar2002  17aug2006  25aug2011  02jun2015


    2nd variable codebook:
    Code:
        Type: Numeric daily date (int)
    
                     Range: [13894,21239]                 Units: 1
           Or equivalently: [15jan1998,24feb2018]         Units: days
             Unique values: 180                       Missing .: 994/1,174
    
                      Mean: 17588.6 = 26feb2008(+ 13 hours)
                 Std. dev.: 2006.76
               Percentiles:       10%        25%        50%        75%        90%
                              14964.5    15987.5    17325.5      19290    20459.5
                            20dec2000  09oct2003  08jun2007  24oct2012  06jan2016

    I would like to produce basic graphs with 95% CIs and, later on, a regression table, but I do not know how to go about it. Any tips on converting the dates into practical variables (e.g., age in months or years) would be much appreciated. Also, I know how to run a multivariable regression that reports odds ratios (OR), but I am not sure which command I should use with survival data.

    Thanks in advance

  • #2
    Presuming the dates are in Stata format (which they seem to be),

    gen ageatdeath = date_of_death - date_of_birth

    should work, substituting your actual variable names (Stata variable names cannot contain spaces). I am a novice at Stata and not a statistician either, so I hope I am right.
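
    To get age in years or months rather than days, a minimal sketch along the same lines might look like this. The variable names date_of_birth and date_of_death are assumptions, and 365.25 is only an average year length, so the results are approximate. Note that the age is missing for anyone without a recorded date of death.

    Code:
    * Age at death in days (missing when date_of_death is missing)
    gen age_days = date_of_death - date_of_birth
    * Approximate conversions using an average year of 365.25 days
    gen age_years  = age_days / 365.25
    gen age_months = age_days / (365.25 / 12)
    summarize age_days age_years age_months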


    • #3
      Originally posted by Girish Venkataraman:
      Presuming the dates are in Stata format (which they seem to be),

      gen ageatdeath = date_of_death - date_of_birth

      should work. I am a novice at Stata and not a statistician either, so I hope I am right.

      Thank you, it helped. Now I have the age in days for each observation. However, when I plot a histogram, the distribution is not skewed to the right (which I expected, as most of the patients survived); it is closer to U-shaped. I am trying to find a way for those who survived to be counted on each and every day since the beginning.


      • #4
        If I'm understanding correctly, 180 died and 994 were still alive. For your survival analysis you will need these 3 variables:

        1. Start date
        2. End date
        3. Event indicator

        Everyone has a start date (date of birth), but you need an end date. The end date will be date of death for those who died and date of last follow-up for those who are still alive. You'll then need an event indicator, e.g., the variable dead takes the value 1 for those who died and zero for those who did not.
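
        A minimal sketch of building those three variables, assuming the birth and death dates are stored as date_of_birth and date_of_death and that a date of last follow-up is available as last_followup (all three names are assumptions; substitute your own):

        Code:
        * Event indicator: 1 if a death date is recorded, 0 otherwise
        gen byte dead = !missing(date_of_death)
        * Follow-up starts at birth
        gen long start_date = date_of_birth
        * Exit is the death date for those who died, last follow-up otherwise
        gen long exit_date = cond(dead, date_of_death, last_followup)
        format start_date exit_date %td
        * Check the number of events against the codebook
        tabulate dead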

        The next step is to learn -stset-. Your command will be something like:

        Code:
        stset exit_date, failure(dead) origin(time start_date) scale(365.24)
        This will create several internal variables, one of which will be _t (age at exit in years). I think it's preferable to use -stset- to create the time variables rather than generate.
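
        Once the data are stset, it is worth checking what was created before analysing anything. A rough sketch, using the variable names from the command above: if age is the time scale and everyone is followed from birth, _t0 should be 0 and _t should be the age at exit in years.

        Code:
        * Inspect the internal variables for the first few observations
        list start_date exit_date dead _t0 _t _d _st in 1/5
        * Summaries of the survival-time setup
        stdescribe
        stsum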

        Once you have stset the data, the most common commands for analysis are -sts graph- (Kaplan-Meier curves) and -stcox- (Cox regression). I'll leave you to read the manual or one of the many excellent tutorials on survival analysis using Stata.
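
        As a starting point, the commands might be used roughly as follows. The covariates sex and birthweight are hypothetical and stand in for whatever predictors are in your data. -stcox- reports hazard ratios, which play the role that odds ratios play in logistic regression.

        Code:
        * Kaplan-Meier survival curve with pointwise 95% confidence bands
        sts graph, ci
        * Curves by group, followed by a log-rank test (sex is hypothetical)
        sts graph, by(sex)
        sts test sex
        * Cox regression; the output shows hazard ratios with 95% CIs
        stcox i.sex birthweight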


        • #5
          Originally posted by Paul Dickman:
          I'll leave you to read the manual or one of the many excellent tutorials on survival analysis using Stata.

          For what it's worth, I found this book particularly helpful:
          Cleves, M., Gould, W.W., and Marchenko, Y.V. (2016). An Introduction to Survival Analysis Using Stata (Revised 3rd ed.). College Station, TX: Stata Press.
          David Radwin
          Senior Researcher, California Competes
          californiacompetes.org
          Pronouns: He/Him


          • #6
            Thank you all for your help; it is much appreciated. Happy analysis!
