I am setting up weights for a survival analysis. The analysis is akin to an RDD estimation: I use a triangular kernel to compute weights around a cutoff date (I split the sample by whether the start date falls before or after this date). I am wondering whether my current approach calculates and uses the weights correctly, or whether I am messing up somewhere in the process. If I am, I would be grateful for pointers on how to improve, since I'm not well versed in survival analysis!
In particular, I am interested in when individuals leave a firm during years 6-8 of employment.
I first create a normalized time variable and generate a censoring indicator so that observations that survive past 8 years don't affect my estimates:
I then calculate the weights:
Then I declare my dataset to be survival-time data and estimate a Cox model using some covariates (e.g. female). I checked and confirmed that the covariates are non-missing for all observations. In an LPM I would have used the aweight (aw) option, but as far as I understand the documentation, stset does not support aw.
The weights w I calculated are between 0 and a maximum of about 4% (0.04). I have approximately 10k observations in total. The sum of all weights comes out to only 254.03, though (this is the figure Stata reports as the number of observations in the Cox model output), which makes me wonder whether I implemented the weights correctly.
Code:
gen normalized_time = duration - 6*12
assert duration >= 6*12 // true, I already excluded these observations when I created the dataset
gen failure_censored = (duration < 8*12)
replace normalized_time = min(normalized_time, 24)
Code:
local bwidth = 24
local cutoff = 0
tempvar h x_l u K_u w bandwidth
gen byte `bandwidth' = 1 if inrange(x-`cutoff',-`bwidth',+`bwidth')
gen float `h' = `bwidth'
gen float `x_l' = 0
gen float `u' = (x-`x_l')/`h'
gen float `K_u' = (1-abs(`u'))
gen float `w' = 0 if abs(`u')> 1
replace `w' = 1/`h' * `K_u' if abs(`u')<=1
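A quick way to check the scale of these weights is sketched below; it is not part of the original code, and w_scaled is a hypothetical variable name. With a triangular kernel scaled by 1/h and h = 24, no single weight can exceed 1/24 (about 0.0417), so a total far below the number of observations follows from the construction alone. Multiplying all weights by a constant should leave the Cox point estimates unchanged, but with iweights the reported number of observations, and as far as I understand the conventional standard errors too, depend on that constant, so treat the rescaling as an assumption to verify rather than a fix.
Code:
* Sketch only: inspect the kernel weights created above (tempvar `w')
summarize `w' if `w' > 0 & !missing(`w')
local n_pos = r(N)    // observations with a positive weight
local w_sum = r(sum)  // sum of those weights
display "sum of weights = `w_sum', observations with positive weight = `n_pos'"
* Optional: rescale so the positive weights sum to the number of observations that carry them
* (w_scaled is a hypothetical name, not a variable from the original code)
gen double w_scaled = `w' * `n_pos' / `w_sum' if `w' > 0 & !missing(`w')
summarize w_scaled    // the mean should now be 1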
Code:
stset normalized_time [iw=`w'], failure(failure_censored)
stcox female after
stcurve, cumhaz at(after=(0 1))
I was also wondering whether I can have the stcurve command create 95% C.I.s for the curves, or if there is another way to easily compute 95% C.I.s for the predictions.
Thank you all!