
  • Survival Analysis taking into account out-migration

    Hello Everyone,
    I am performing a survival analysis on health and demographic surveillance data. In this system an individual enters by birth or in-migration and exits either through death or out-migration. An individual can enter and exit the system multiple times. For example, an individual might enter the system on 1 January 2015, exit on 5 September 2015, then re-enter on 1 January 2019 and exit on 20 September 2019. I applied the stset command in Stata, but the person-time is overestimated. In the example above I expected a total person-time of around 352 days, but Stata gives me 1720 days, because it also counts the time from September 2015 to January 2019, when the person was out of the system. How do I set up my dataset so that the out-migration periods are excluded?

    Below are the code I used and a sample of the dataset after running it.

    Stata code: stset enddate, failure(event) id(NewID) origin(startdate) exit(time .)

    NewID  startdate  startyr  enddate    endtype  year  _st  _d    _t   _t0
        1  1-Jan-15      2012  15-Mar-15  OMG      2015    1   0    73     0
        1  1-Jan-19      2015  17-Sep-19  OMG      2019    1   0  1720    73
        2  1-Jan-15      2013  15-Mar-15  OMG      2015    1   0    73     0
        2  1-Jan-19      2015  17-Sep-19  OMG      2019    1   0  1720    73
        3  1-Jan-17      2012  15-Sep-17  OMG      2017    1   0   257     0
        3  18-Jun-20     2020  31-Dec-20  NA       2020    1   0  1460   257
        4  15-Jun-16     2016  31-Dec-16  NA       2016    1   0   199     0
        4  1-Jan-17      2016  31-Dec-17  NA       2017    1   0   564   199
        4  1-Jan-19      2016  31-Dec-19  NA       2019    1   0  1294   929
    Thank you very much for your help

  • #2
    I'm very confused by your data, since you have variables named startyr and year that don't correspond to the year components of startdate or enddate. What are those?

    Turning to your question, you need to use the -time0()- option, not the -origin()- option, in your -stset- command so that Stata recognizes the gaps in the observation period. -origin()- refers to the time when the person first becomes at risk for the failure event, and it should have the same value in every observation for that person. The -time0()- option specifies a variable giving the beginning date of the current period of observation. With that option, Stata will see that there is a gap between enddate in one observation and time0() in the next observation.
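    A minimal sketch of that advice, under some assumptions: event is taken to be a 0/1 failure indicator (1 = death), as implied by failure(event) in the original command; the dates are typed in as strings only to make the example self-contained; and firstdate and days_at_risk are hypothetical helper variables, not part of the original data. origin() is set to each person's first entry date so it is constant within NewID, and time0() carries the start of each episode.

```stata
* Illustrative data: two persons from the sample above, two episodes each.
* Assumption: event is a 0/1 failure indicator (all 0 here, i.e. censored).
clear
input NewID str10 startstr str10 endstr event
1 "1-Jan-15"  "15-Mar-15" 0
1 "1-Jan-19"  "17-Sep-19" 0
3 "1-Jan-17"  "15-Sep-17" 0
3 "18-Jun-20" "31-Dec-20" 0
end

* Convert dd-Mon-yy strings to Stata daily dates (two-digit years read as <= 2050)
generate startdate = date(startstr, "DMY", 2050)
generate enddate   = date(endstr,   "DMY", 2050)
format startdate enddate %td

* Hypothetical helper: first entry date, identical in every record of a person
bysort NewID (startdate): generate firstdate = startdate[1]

* time0() marks the start of each observation period, so the gap between
* one record's enddate and the next record's startdate is not counted
stset enddate, id(NewID) failure(event) time0(startdate) origin(time firstdate) exit(time .)

* Person-days actually under observation, summed per person
generate days_at_risk = _t - _t0
bysort NewID: egen total_days = total(days_at_risk)
list NewID startdate enddate _t0 _t total_days, sepby(NewID)
```

    With this setup the second episode of person 1 gets _t0 = 1461 and _t = 1720, so only its 259 days count, giving 73 + 259 = 332 person-days rather than 1720. Adding scale(365.25) to stset would express analysis time in years instead of days, and stdescribe summarizes total time at risk after stset.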



    • #3
      Thank you very much for this; it now works very well.
