Relative Survival

Nancy

Join Date: Jul 2014
Posts: 7

06 Jul 2014, 20:08

Hi all,

I am currently working on relative survival and using Paul Dickman methods. I ran stset and ran strs after that. Everything looks fine and I got the results. Now I am trying to dig in how I got to that results by doing step by step instead running "strs" i.e. work out all the varlist in strs such as n,_d, start, end , w , p , n_prime etc... However, I don't get the same as the results as I ran the strs. I compared what I got the the saved file when ran the strs, so far I got some varibles match to the saved file such as n,_d,start,end. I don't get the same value of p. I guess there must me something wrong with the way I calcuate the w (censored variable).

Here are all the code I produced the variables

local period "0/1"
local lastyearfollowup = 2011
local diagnosis_endyear = 2011
local diagnosis_startyear = 1982

stset dateofdeath, origin(dateofdiagnosis) enter(time mdy(1,1,2007)) exit(time mdy(12,31,2011)) fail(event==1) id(id) scale(365.25)

stsplit start,at("`period'") trim

keep if _st == 1

gen attage = int( age + start )

gen attyear = int( yeardiagnosis + start )

gen _year = attyear

gen _age = attage

sort _year sex _age

merge _year sex _age using QldLifeTable, nokeep nolabel keep(prob)

keep if _merge==3 /* Keep only if observations exists in both files */
drop _merge

gen end = .

forval starttime = `period' {

local endtime = `starttime' + `periodinterval'

di "start = `starttime' : end = `endtime'"

replace end = `endtime' if round(float(start),0.0001) == round(float(`starttime'),0.0001)

}

gen d = _d /* Indicator for dead in the interval */

/* Generate a censoring indicator (not the same as 'not dead') */
bysort id (start) : gen w = ( d[_N] == 0 & _n == _N & (_t-_t0) != ( end-start ) )

gen y = (_t - _t0) /* Person-time at risk in the interval */

gen d_star = -ln(prob)* y // Expected deaths (using exact times)

gen p_star = prob^(end-start) // interval-specific expected survival

bysort id (_t0) : replace p_star = prob^(end-_t0) if _n==_N //last row of each id

gen nu = ( d - d_star)/y // Estimated excess mortality rate, (d-d_star)/y

collapse (sum) d w y d_star (count) n = d (mean) p_star end, by(`bygroup' start)

gen n_prime = n - w/2 // number at risk

gen ns = n_prime - d // number surviving at end of interval

gen nu = ( d - d_star )/y // Estimated excess mortality rate, (d-d_star)/y

local w w

gen p = 1 - (d/n_prime)

/* Cumulative observed survival */
/* Updated v1.2.5 (16may2007) to correct bug when p=0 (thanks to John Condon and Arun Pokhrel) */
// p = 0 if everybody dies in the interval, therefore interval-specific survival = 0%, so ln(0) = ERROR

by sex : gen cp = exp( sum( ln(p) ) ) if p ~= 0
replace cp = 0 if p == 0

/* Cumulative expected survival */

by `bygroup' : gen cp_e2 = exp(sum(ln(p_star)))

/* Cumulative relative survival */

gen cr_e2=cp/cp_e2

gen RelSurv_`survivalperiod'_yr = cr_e2

This results obtained by running strs
start	end	n	d	d_star	y	p	se_p	lo_p	hi_p	p_star	r	se_r	lo_r	hi_r	cp	se_cp	lo_cp	hi_cp	cp_e2	cr_e2	se_cr_e2	lo_cr_e2	hi_cr_e2	yeardiagnosis	nu
0	1	18243	1521	177.5	8841.8	0.842	0.0037	0.8345	0.8491	0.98	0.8591	0.0038	0.8515	0.8664	0.842	0.0037	0.8345	0.8491	0.98	0.8591	0.0038	0.8515	0.8664	2006	0.151954
0	1	21375	4105	387.3	18806.9	0.8039	0.0027	0.7985	0.8092	0.9777	0.8223	0.0028	0.8167	0.8277	0.8039	0.0027	0.7985	0.8092	0.9777	0.8223	0.0028	0.8167	0.8277	2007	0.197677
0	1	22433	4210	410.9	19799.8	0.8085	0.0026	0.8032	0.8136	0.9775	0.8271	0.0027	0.8217	0.8323	0.8085	0.0026	0.8032	0.8136	0.9775	0.8271	0.0027	0.8217	0.8323	2008	0.191876
0	1	23036	4258	409.4	20368.4	0.8114	0.0026	0.8062	0.8164	0.9782	0.8295	0.0027	0.8242	0.8346	0.8114	0.0026	0.8062	0.8164	0.9782	0.8295	0.0027	0.8242	0.8346	2009	0.18895
0	1	23256	4311	411.3	20579.3	0.811	0.0026	0.8059	0.816	0.9782	0.8291	0.0026	0.8238	0.8342	0.811	0.0026	0.8059	0.816	0.9782	0.8291	0.0026	0.8238	0.8342	2010	0.189498
0	1	23551	2575	219.6	10623.3	0.7847	0.0037	0.7773	0.792	0.9781	0.8023	0.0038	0.7947	0.8097	0.7847	0.0037	0.7773	0.792	0.9781	0.8023	0.0038	0.7947	0.8097	2011	0.22172





This results obtained by work out each varlist (step by step)
start	d	w	y	d_star	n	p_star	end	n_prime	ns	nu	p	cp	cp_e2	cr_e2	RelSurv_1_yr	yeardiagnosis
0	1521	16722	8841.756	177.4596	18243	0.989358	1	9882	8361	0.151954	0.846084	0.846084	0.989358	0.855185	0.855185	2006
0	4105	0	18806.93	387.3008	21375	0.977664	1	21375	17270	0.197677	0.807953	0.807953	0.977664	0.826412	0.826412	2007
0	4210	0	19799.8	410.8893	22433	0.977459	1	22433	18223	0.191876	0.81233	0.81233	0.977459	0.831063	0.831063	2008
0	4258	0	20368.43	409.3769	23036	0.978157	1	23036	18778	0.18895	0.815159	0.815159	0.978157	0.833362	0.833362	2009
0	4311	27	20579.27	411.2643	23256	0.97819	1	23242.5	18931.5	0.189498	0.814521	0.814521	0.97819	0.832682	0.832682	2010
0	2575	20976	10623.32	219.5979	23551	0.978126	1	13063	10488	0.22172	0.802878	0.802878	0.978126	0.820833	0.820833	2011

Thanks for the help.

Tags: None

Mark Rutherford

Join Date: Jul 2014

Posts: 7
#2

07 Jul 2014, 07:33

Hi Nancy,

This shows that it's always a good idea to do some digging rather than to just accept what is given! I think that the difference in the estimates can be traced back to your stset command. If you use the enter option in stset, then strs will estimate the observed interval-specific survival (p) using a transformation of the hazard rather than from the actuarial equations you are using relating to the survival.strs does warn you about this - mentioning something relating to delayed entry, but this is not as well documented as I thought it would be when I just checked the help file. You can match the answers in strs by calculating p using p=exp(-d/y). strs does this because of the left truncation that you are introducing by using the enter option. In your example, those patients diagnosed in 2006 do not begin being at risk at time 0. Therefore, the -n- that you are using doesn’t relate to those at risk right from the start of the interval. Instead, we use the risktime accumulated over the interval, -y-, and calculate the survival based on the mortality rate over the interval. Both approaches will usually be similar, but it makes more sense to use the hazard in the case of delayed entry.

Some further details are given in this document relating to strs (hopefully added as a link).

I would have a think about how you plan to stset depending on what analysis you intend to do here - it looks as though you are comparing short-term survival across years of diagnosis, but that may just be as an example.

Hope this helps,

Mark.

http://pauldickman.com

http://pauldickman.com/rsmodel/stata_colon/standard_errors.pdf

Last edited by Mark Rutherford; 07 Jul 2014, 07:34. Reason: http://pauldickman.com/rsmodel/stata_colon/standard_errors.pdf
Comment
Nancy

Join Date: Jul 2014

Posts: 7
#3

07 Jul 2014, 14:37

Hi Mark.

This is really help. I applied p=exp(-d/y) instead p = 1-(d/n_prime) and I did get the results exactly as the results from strs.
Thanks for your help Mark.

Nancy
Comment
Mark Rutherford

Join Date: Jul 2014

Posts: 7
#4

08 Jul 2014, 01:46

Just one more thought: It may be better to use finer break points in the strs command when using this hazard transformed approach - for example, breaks(0(0.1)1). This approach assumes constant hazards within an interval, and it may therefore be better to split more finely because the hazard will be changing quite quickly early on in follow-up (usually). If you do use intervals different from 1 year, the appropriate formula for p will then be p=exp(-k*d/y) where k is the length of the interval (say 0.1 years). It is also possible to use finer breaks early on then move back to the yearly breaks for longer term follow-up.

Last edited by Mark Rutherford; 08 Jul 2014, 01:48.
Comment

Announcement

Relative Survival

Comment

Comment

Comment