Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Relative Survival

    Hi all,

    I am currently working on relative survival and using Paul Dickman methods. I ran stset and ran strs after that. Everything looks fine and I got the results. Now I am trying to dig in how I got to that results by doing step by step instead running "strs" i.e. work out all the varlist in strs such as n,_d, start, end , w , p , n_prime etc... However, I don't get the same as the results as I ran the strs. I compared what I got the the saved file when ran the strs, so far I got some varibles match to the saved file such as n,_d,start,end. I don't get the same value of p. I guess there must me something wrong with the way I calcuate the w (censored variable).

    Here are all the code I produced the variables


    local period "0/1"
    local lastyearfollowup = 2011
    local diagnosis_endyear = 2011
    local diagnosis_startyear = 1982



    stset dateofdeath, origin(dateofdiagnosis) enter(time mdy(1,1,2007)) exit(time mdy(12,31,2011)) fail(event==1) id(id) scale(365.25)

    stsplit start,at("`period'") trim

    keep if _st == 1

    gen attage = int( age + start )

    gen attyear = int( yeardiagnosis + start )

    gen _year = attyear

    gen _age = attage

    sort _year sex _age

    merge _year sex _age using QldLifeTable, nokeep nolabel keep(prob)

    keep if _merge==3 /* Keep only if observations exists in both files */
    drop _merge


    gen end = .

    forval starttime = `period' {

    local endtime = `starttime' + `periodinterval'

    di "start = `starttime' : end = `endtime'"

    replace end = `endtime' if round(float(start),0.0001) == round(float(`starttime'),0.0001)

    }


    gen d = _d /* Indicator for dead in the interval */


    /* Generate a censoring indicator (not the same as 'not dead') */
    bysort id (start) : gen w = ( d[_N] == 0 & _n == _N & (_t-_t0) != ( end-start ) )

    gen y = (_t - _t0) /* Person-time at risk in the interval */

    gen d_star = -ln(prob)* y // Expected deaths (using exact times)

    gen p_star = prob^(end-start) // interval-specific expected survival


    bysort id (_t0) : replace p_star = prob^(end-_t0) if _n==_N //last row of each id

    gen nu = ( d - d_star)/y // Estimated excess mortality rate, (d-d_star)/y

    collapse (sum) d w y d_star (count) n = d (mean) p_star end, by(`bygroup' start)

    gen n_prime = n - w/2 // number at risk

    gen ns = n_prime - d // number surviving at end of interval

    gen nu = ( d - d_star )/y // Estimated excess mortality rate, (d-d_star)/y

    local w w

    gen p = 1 - (d/n_prime)

    /* Cumulative observed survival */
    /* Updated v1.2.5 (16may2007) to correct bug when p=0 (thanks to John Condon and Arun Pokhrel) */
    // p = 0 if everybody dies in the interval, therefore interval-specific survival = 0%, so ln(0) = ERROR

    by sex : gen cp = exp( sum( ln(p) ) ) if p ~= 0
    replace cp = 0 if p == 0


    /* Cumulative expected survival */

    by `bygroup' : gen cp_e2 = exp(sum(ln(p_star)))


    /* Cumulative relative survival */

    gen cr_e2=cp/cp_e2

    gen RelSurv_`survivalperiod'_yr = cr_e2

    This results obtained by running strs
    start end n d d_star y p se_p lo_p hi_p p_star r se_r lo_r hi_r cp se_cp lo_cp hi_cp cp_e2 cr_e2 se_cr_e2 lo_cr_e2 hi_cr_e2 yeardiagnosis nu
    0 1 18243 1521 177.5 8841.8 0.842 0.0037 0.8345 0.8491 0.98 0.8591 0.0038 0.8515 0.8664 0.842 0.0037 0.8345 0.8491 0.98 0.8591 0.0038 0.8515 0.8664 2006 0.151954
    0 1 21375 4105 387.3 18806.9 0.8039 0.0027 0.7985 0.8092 0.9777 0.8223 0.0028 0.8167 0.8277 0.8039 0.0027 0.7985 0.8092 0.9777 0.8223 0.0028 0.8167 0.8277 2007 0.197677
    0 1 22433 4210 410.9 19799.8 0.8085 0.0026 0.8032 0.8136 0.9775 0.8271 0.0027 0.8217 0.8323 0.8085 0.0026 0.8032 0.8136 0.9775 0.8271 0.0027 0.8217 0.8323 2008 0.191876
    0 1 23036 4258 409.4 20368.4 0.8114 0.0026 0.8062 0.8164 0.9782 0.8295 0.0027 0.8242 0.8346 0.8114 0.0026 0.8062 0.8164 0.9782 0.8295 0.0027 0.8242 0.8346 2009 0.18895
    0 1 23256 4311 411.3 20579.3 0.811 0.0026 0.8059 0.816 0.9782 0.8291 0.0026 0.8238 0.8342 0.811 0.0026 0.8059 0.816 0.9782 0.8291 0.0026 0.8238 0.8342 2010 0.189498
    0 1 23551 2575 219.6 10623.3 0.7847 0.0037 0.7773 0.792 0.9781 0.8023 0.0038 0.7947 0.8097 0.7847 0.0037 0.7773 0.792 0.9781 0.8023 0.0038 0.7947 0.8097 2011 0.22172
    This results obtained by work out each varlist (step by step)
    start d w y d_star n p_star end n_prime ns nu p cp cp_e2 cr_e2 RelSurv_1_yr yeardiagnosis
    0 1521 16722 8841.756 177.4596 18243 0.989358 1 9882 8361 0.151954 0.846084 0.846084 0.989358 0.855185 0.855185 2006
    0 4105 0 18806.93 387.3008 21375 0.977664 1 21375 17270 0.197677 0.807953 0.807953 0.977664 0.826412 0.826412 2007
    0 4210 0 19799.8 410.8893 22433 0.977459 1 22433 18223 0.191876 0.81233 0.81233 0.977459 0.831063 0.831063 2008
    0 4258 0 20368.43 409.3769 23036 0.978157 1 23036 18778 0.18895 0.815159 0.815159 0.978157 0.833362 0.833362 2009
    0 4311 27 20579.27 411.2643 23256 0.97819 1 23242.5 18931.5 0.189498 0.814521 0.814521 0.97819 0.832682 0.832682 2010
    0 2575 20976 10623.32 219.5979 23551 0.978126 1 13063 10488 0.22172 0.802878 0.802878 0.978126 0.820833 0.820833 2011


    Thanks for the help.



  • #2
    Hi Nancy,

    This shows that it's always a good idea to do some digging rather than to just accept what is given! I think that the difference in the estimates can be traced back to your stset command. If you use the enter option in stset, then strs will estimate the observed interval-specific survival (p) using a transformation of the hazard rather than from the actuarial equations you are using relating to the survival.strs does warn you about this - mentioning something relating to delayed entry, but this is not as well documented as I thought it would be when I just checked the help file. You can match the answers in strs by calculating p using p=exp(-d/y). strs does this because of the left truncation that you are introducing by using the enter option. In your example, those patients diagnosed in 2006 do not begin being at risk at time 0. Therefore, the -n- that you are using doesn’t relate to those at risk right from the start of the interval. Instead, we use the risktime accumulated over the interval, -y-, and calculate the survival based on the mortality rate over the interval. Both approaches will usually be similar, but it makes more sense to use the hazard in the case of delayed entry.

    Some further details are given in this document relating to strs (hopefully added as a link).

    I would have a think about how you plan to stset depending on what analysis you intend to do here - it looks as though you are comparing short-term survival across years of diagnosis, but that may just be as an example.

    Hope this helps,

    Mark.
    http://pauldickman.com/rsmodel/stata_colon/standard_errors.pdf
    Last edited by Mark Rutherford; 07 Jul 2014, 07:34. Reason: http://pauldickman.com/rsmodel/stata_colon/standard_errors.pdf

    Comment


    • #3
      Hi Mark.

      This is really help. I applied p=exp(-d/y) instead p = 1-(d/n_prime) and I did get the results exactly as the results from strs.
      Thanks for your help Mark.

      Nancy

      Comment


      • #4
        Just one more thought: It may be better to use finer break points in the strs command when using this hazard transformed approach - for example, breaks(0(0.1)1). This approach assumes constant hazards within an interval, and it may therefore be better to split more finely because the hazard will be changing quite quickly early on in follow-up (usually). If you do use intervals different from 1 year, the appropriate formula for p will then be p=exp(-k*d/y) where k is the length of the interval (say 0.1 years). It is also possible to use finer breaks early on then move back to the yearly breaks for longer term follow-up.
        Last edited by Mark Rutherford; 08 Jul 2014, 01:48.

        Comment

        Working...
        X