In any survival analysis data set there are two ways of looking at time. One is time as we measure it on a calendar or watch. Let's call that calendar time. And the other is analysis time. Analysis time is calculated in -stset- and stored in variable _t. For each person, _t = 0 at the time when the person first comes under observation in the study. This time is specified in -stset-'s -origin()- option. It then runs at a rate determined by -stset-'s -scale()- option.
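To make the calendar-time versus analysis-time distinction concrete, here is a minimal sketch of what such an -stset- might look like. The variables endcontdate (contract end date) and terminated (failure indicator) are hypothetical names that I am assuming for illustration; only startcontdate and cont_id appear in your code.

```stata
* Hypothetical names: endcontdate (daily date the contract ended or was last seen)
* and terminated (1 = failure event, 0 = censored). Substitute your own variables.
* origin() sets _t = 0 at each contract's start date;
* scale(365.25) converts days of calendar time into years of analysis time.
stset endcontdate, failure(terminated) origin(time startcontdate) ///
    scale(365.25) id(cont_id)

* Inspect the analysis-time variables that -stset- created
list cont_id startcontdate endcontdate _t0 _t _d in 1/5
```

With these settings, a contract that has been running for 730.5 days has _t = 2.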
So, in your situation, calendar time has different starting points for each contract, and it runs in units of days. Analysis time starts at 0 on the start date of each contract and runs in units of years. The code for which you are asking explanations is used to enable us to tell -stsplit- what to do using analysis time, which is what -stsplit- understands most easily.
Now, the -stsplit- command looks like this: -stsplit era, at(`interval1' `interval2') after(_t = era1_start_t)-. So we have to understand what interval1, interval2 and era1_start_t are. Let's work backwards from right to left.
-gen era1_start_t = datediff_frac(startcontdate, td(16jun2010), "y")- creates a new variable, era1_start_t, defined as the time interval in years from the start date of the contract until 16 jun 2010--the latter being the day before the start of era 1. (The reason why I didn't use 17 jun 2010, the actual start date, will become clear shortly.) In other words, it is the value of _t, for the particular contract, on 16 jun 2010.
-local interval2 = datediff_frac(td(16jun2010), td(11jan2012), "y")- defines a local macro containing the time interval in years from 16jun2010 to 11jan2012, the latter being the start date of era 2.
-local interval1 = 1/365.25- defines local macro interval1 to be 1/365.25, which is the duration of 1 day measured in years.
So when we run -stsplit era, at(`interval1' `interval2') after(_t = era1_start_t)-, we are asking Stata to split each observation into eras, the first of which ends 1 day after 16jun2010, i.e. on 17jun2010. It might have been clearer to set `interval1' to 0 and have era1_start_t be 17jun2010 itself--but unfortunately, the -at()- option requires positive numbers.
The next era begins `interval2' years after era1_start_t. Since interval2 is defined as the number of years between 16jun2010 and 11jan2012, and era1_start_t is the number of years from the contract start date to 16jun2010, this means that the next era begins on 11jan2012.
So the net effect of this has been to provide -stsplit- with the times at which to split the observations in the analysis time metric.
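For reference, here are the pieces discussed above assembled in the order in which they have to run. This is just the code from this thread put together; it assumes the data have already been -stset- with analysis time in years and an -id()- variable.

```stata
* Value of _t on 16jun2010 for each contract (years since the contract started)
gen era1_start_t = datediff_frac(startcontdate, td(16jun2010), "y")

* One day, expressed in years
local interval1 = 1/365.25

* Years from 16jun2010 to 11jan2012 (start of era 2)
local interval2 = datediff_frac(td(16jun2010), td(11jan2012), "y")

* Split each contract at era1_start_t + interval1 (17jun2010)
* and at era1_start_t + interval2 (11jan2012)
stsplit era, at(`interval1' `interval2') after(_t = era1_start_t)
```

One thing to keep in mind: local macros exist only within the run in which they are defined, so the two -local- commands and the -stsplit- command need to be executed together, in one go, rather than line by line.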
Finally we have -by cont_id (era), sort: replace era = _n-1-. -stsplit- defines the variable era to be the analysis time values of the times at which it has defined the new eras. But what you need for your analysis is not a variable whose values differ from contract to contract and are counts and fractions of years, but a discrete variable that simply numbers the periods. This command makes that change: the observations are sorted in chronological order within cont_id, and then, because the expression is _n-1, the first has era replaced by 0 (the period before era 1), the second by 1, and the third by 2.
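If it helps with tracking down the problem, here is a rough sketch of how one might check the result of that last step. The variable n_episodes is a hypothetical helper that I am introducing only for this check. The reasoning is that -stsplit- creates rows only for time a contract was actually under observation, so a contract that does not span all three periods will have fewer than three rows, and for such a contract numbering by _n-1 may not line up with the eras. I do not know whether that is what is happening in your data; it is just one thing that may be worth looking at.

```stata
* Recode era to consecutive integers within each contract, in chronological order
by cont_id (era), sort: replace era = _n - 1

* n_episodes is a hypothetical helper: number of rows per contract after the split
bysort cont_id (_t0): generate n_episodes = _N

* How many contracts have 1, 2, or 3 rows?
tabulate n_episodes

* Look at the contracts that do not have all three periods
list cont_id startcontdate _t0 _t era if n_episodes < 3, sepby(cont_id)
```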
I must say I am puzzled by the results you are getting, and I do not understand where things are going wrong.