Dear Statalisters!
I'm conducting a model development study in which I'm trying to predict recurrence of endometrial cancer, with death (from any cause) prior to recurrence as a competing risk. I initially had over 80 candidate predictors (the data are follow-up from a previous study with a different aim). I performed events-per-parameter calculations using pmsampsize and, based on those (and subject-matter knowledge), I've selected a handful of candidate predictors.
Since I'm more interested in prediction than in etiology, I've opted for a Fine-and-Gray model (instead of cause-specific modelling), following https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5698744/:
"Lau et al suggest that the cause‐specific hazard model is more appropriate for addressing etiological questions, while the Fine‐Gray model is more appropriate for addressing questions around incidence and prognosis. Both we and Wolbers et al have echoed this assertion.4, 15 Thus, if the research objective is to derive a model for predicting the probability of the occurrence of outcomes over time, then a subdistribution hazard model would be appropriate.16, 17"
Since I have some continuous candidate predictors, I've also opted for the MFPA algorithm (https://mfp.imbi.uni-freiburg.de/mfp and https://pubmed.ncbi.nlm.nih.gov/29398977/) to obtain fractional polynomials and simultaneously do backward elimination of redundant candidate predictors:
"MFP is an approach to multivariable model-building which retains continuous predictors as continuous, finds non-linear functions if sufficiently supported by the data, and removes weakly influential predictors by backward elimination (BE)."
Now, here are my problems.
The MFPA algorithm is very computationally intensive. In my full dataset (1,429 obs.), one run takes about 7 minutes, which would be fine if it weren't for the need to assess internal validity later. For internal validation with the bootstrap, for example, the entire model development process must be repeated in each bootstrap sample, and the number of bootstrap samples should be high (possibly 100 to 1,000). So this part would take days! Even with fewer resampling datasets, say k-fold cross-validation, it would take hours. Only a single random split would be computationally efficient, but that is not recommended because of low power (and, as I described above, the number of candidate predictors was based on an EPP calculation which assumed that the full dataset would be available for model development).
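To be explicit about what the bootstrap loop has to contain, here is a bare-bones sketch of Harrell-style optimism correction in Python (Python purely for illustration; develop_model and performance are hypothetical stand-ins for the whole mfpa-plus-stcrreg pipeline and for whatever performance measure one chooses):

```python
import random

def optimism_corrected(data, develop_model, performance, B=200, seed=1):
    """Optimism-corrected bootstrap: the ENTIRE development process
    (selection + fitting, i.e. develop_model) is re-run in every
    bootstrap sample; optimism is the average of (performance in the
    bootstrap sample) minus (performance of that model on the
    original data)."""
    rng = random.Random(seed)
    apparent = performance(develop_model(data), data)
    optimism = 0.0
    for _ in range(B):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        model = develop_model(boot)              # full pipeline, not just a refit
        optimism += performance(model, boot) - performance(model, data)
    return apparent - optimism / B

# toy stand-ins: "development" = taking the mean, "performance" = negative MSE
fit = lambda d: sum(d) / len(d)
perf = lambda m, d: -sum((x - m) ** 2 for x in d) / len(d)
corrected = optimism_corrected([1.0, 2.0, 3.0, 4.0], fit, perf)
```

The point is only that develop_model must contain the variable selection and the FP search, not just the final fit; with mfpa at roughly 7 minutes per call, B = 200 calls is where the days come from.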
I've come across a work-around: stcrprep by Paul C. Lambert (https://pubmed.ncbi.nlm.nih.gov/30542252/, PMID 30542252). This command expands the dataset and assigns weights, and is based on crprep in R:
"Geskus (2011) has recently proposed an alternative way to estimate the cause-specific CIF that uses weighted versions of standard estimators. A corresponding R command, crprep, which is part of the mstate package, has been developed that restructures the data and calculates the appropriate weights (de Wreede et al. 2011). This paper describes a Stata command, stcrprep, that has similar functionality to crprep with some further extensions to allow parametric models for the cause-specific CIF to be fitted. After using stcrprep a number of standard Stata survival analysis commands can then be used for the analysis of competing risks. For example, sts graph, failure will give a plot of the cause-specific CIF and stcox will fit the Fine and Gray proportional subhazards model." - P. C. Lambert, 2017
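As I understand the Geskus weighting idea, subjects with a competing event stay in the (subdistribution) risk set after their event time with weight w_i(t) = G(t-)/G(T_i-), where G is the Kaplan-Meier estimate of the censoring distribution. A toy illustration of just that weight calculation (Python; this is my own sketch of the mechanics, not stcrprep's actual code, and ties are handled naively):

```python
def km_censoring(times, status):
    """Kaplan-Meier estimate of the censoring survival G(t):
    treat censoring (status == 0) as the 'event' of interest."""
    pairs = sorted(zip(times, status))
    surv, at_risk, curve = 1.0, len(pairs), []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = 0        # censorings at t
        removed = 0  # everyone leaving the risk set at t
        while i < len(pairs) and pairs[i][0] == t:
            d += pairs[i][1] == 0
            removed += 1
            i += 1
        surv *= 1 - d / at_risk
        at_risk -= removed
        curve.append((t, surv))
    return curve

def G(curve, t):
    """Left-continuous lookup G(t-): last value strictly before t."""
    s = 1.0
    for time, surv in curve:
        if time < t:
            s = surv
        else:
            break
    return s

def geskus_weight(curve, t_compete, t):
    """Weight for a competing-event subject kept in the risk set
    at analysis time t > t_compete (their competing-event time)."""
    return G(curve, t) / G(curve, t_compete)
```

Presumably this is the kind of quantity that ends up in the weight_c variable that stcrprep creates; with no censoring in the data, all weights are 1 and the expansion changes nothing.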
Here is an example dataset with 10% (143 obs.) of my original dataset, drawn without replacement:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int idnum double(age bmi colorscore vascularpattern_dicho) float(emborder echogenicity tumor_size extrauterine date_of_surgery primary_end_point date_of_primary_event esmo_pre_op_sub)
262 52 23.5 3 0 2 2 67 1 19463 1 20578 1 1329 71 25.4 4 1 2 1 47 1 20234 1 20269 1 551 55 34.89 1 0 2 1 14 1 20424 0 22233 0 458 72 59.49 4 0 1 2 17 1 18975 1 19479 0 1384 59 37.1 1 0 1 2 9 0 19129 0 21978 0 1400 75 33.2 5 0 3 2 57 0 19863 0 21937 1 1397 73 38.44 2 0 2 2 14 0 19500 1 19739 1 1495 56 31.9 1 0 2 2 15 0 20030 0 21875 0 439 69 26.67 3 0 2 2 26 0 19344 0 21804 0 1789 64 37.2 4 1 2 2 31 0 19988 1 20769 1 111 64 30.4 2 0 2 2 8 0 19150 0 21783 0 1036 58 34.22 2 0 2 2 9 0 19170 0 19237 1 1591 58 30.11 4 0 2 2 26 0 19280 0 21893 1 1594 59 23.43 3 0 2 1 21 0 19277 0 21851 0 258 54 32.42 5 0 3 2 4 0 19084 0 21795 0 892 61 26.44 1 0 2 2 6 0 19186 0 21857 0 782 68 29.5 1 0 2 2 16 0 20112 0 21826 2 1395 69 17.84 3 1 2 1 15 0 19493 0 21792 1 875 61 40.3 3 1 2 2 17 1 20473 0 21850 1 980 54 22.4 4 1 2 2 27 0 20274 0 21727 1 1256 67 25.4 1 0 1 1 5 0 19801 0 21854 0 1192 62 34.8 4 1 2 2 34 0 20156 2 21847 1 1168 75 35.55 2 1 2 2 31 0 19429 0 21711 1 1688 69 38.75 2 0 2 1 7 0 18899 2 20458 0 1182 85 24.2 4 1 2 2 33 0 20151 2 21537 2 1480 69 35.1 1 0 2 1 14 0 20057 0 21792 0 853 56 27.18 2 0 2 2 14 0 19373 0 21801 0 652 76 32.1 5 0 3 2 26 0 19781 0 21874 2 1453 68 27.44 2 0 1 1 7 0 18947 0 21792 0 318 43 30.44 3 1 1 2 9 0 18860 0 21808 0 455 78 25.31 4 1 2 1 16 0 18987 0 21792 1 197 66 23.43 2 0 2 2 10 0 19464 0 21566 0 953 51 28.6 4 1 2 2 34 1 19546 0 21696 2 929 70 32.46 1 0 2 1 10 0 18814 0 21787 0 1687 72 27.99 1 0 2 1 11 0 18884 2 21517 0 930 59 34.94 1 0 1 2 7 0 18814 2 21321 0 1124 67 24.4 4 0 2 1 18 0 20241 0 21792 2 365 80 30.44 1 0 2 2 24 0 19574 0 21810 1 1717 68 29.1 4 1 2 1 31 0 20346 0 21894 1 400 57 30.1 4 1 2 2 49 1 20304 1 21725 0 475 67 25.7 4 1 2 2 23 0 20424 0 21833 1 1379 61 31.62 1 0 1 1 8 1
18765 0 21787 0 718 60 24.2 4 1 2 2 24 0 20136 0 21811 1 1236 60 21.93 1 0 1 1 5 0 19057 0 21792 0 227 59 18.3 3 0 2 1 11 0 20206 0 21867 1 749 83 31.25 1 0 1 1 5 0 19374 0 21792 0 1092 62 28.22 2 0 2 2 16 0 19148 0 21792 0 904 83 29.7 3 0 2 2 26 0 19934 0 21993 1 707 67 30.5 5 0 3 2 47 0 20135 1 20452 1 548 58 44.4 2 1 2 2 9 0 20436 0 21647 0 335 62 40.89 2 0 1 2 14 0 19574 0 21864 0 945 85 32.05 3 1 1 2 30 0 19560 0 21864 1 858 68 25.5 2 0 1 1 3 0 20109 1 20888 0 260 47 40.4 1 0 1 1 18 0 19456 0 21864 0 1492 68 27.8 4 1 1 2 27 0 19687 1 21410 2 910 52 25 4 1 2 2 22 0 19928 0 21307 2 1499 62 21.8 4 0 1 2 16 0 20414 1 20760 0 536 70 31.1 3 0 2 2 27 0 20074 0 21929 1 319 59 38.67 3 1 2 1 23 0 18855 1 20640 1 888 83 24.97 3 1 2 2 63 0 18839 1 19240 2 581 74 28.3 4 1 1 2 23 0 19035 0 21847 1 1111 73 26.4 4 1 2 1 32 0 19899 0 22233 2 465 73 24.5 4 1 2 2 65 0 19337 1 19446 1 343 65 29.2 1 0 2 1 5 0 19599 0 21850 0 1186 40 47.7 3 0 2 1 18 0 20163 0 21792 0 790 57 20.9 3 1 2 1 12 0 19753 0 21840 1 97 61 32.3 2 0 2 1 8 0 20227 0 21810 0 451 50 22.7 1 0 2 2 12 0 20061 2 20089 0 1566 38 35.3 4 0 2 1 8 0 20368 0 21871 0 559 41 25.5 4 1 2 1 52 1 20086 0 21792 2 247 67 25.2 1 0 1 2 6 0 20201 0 21767 0 1730 84 35.15 3 0 1 2 11 0 19619 2 21832 0 1356 68 30.75 4 0 1 2 10 0 19139 0 21621 0 673 66 23.9 2 0 2 1 9 0 19416 0 21792 2 1526 55 27.1 1 0 1 1 22 0 20057 0 22237 0 468 56 45.7 3 0 1 2 25 0 19704 0 21376 1 1289 86 30.1 1 0 1 1 20 0 20211 0 21868 0 250 67 21.6 4 1 2 1 29 0 20184 1 21025 2 917 70 21.9 4 1 2 2 24 0 20299 0 21937 2 1216 59 27.5 3 0 2 2 23 0 20165 0 21924 1 991 61 30.5 3 1 2 2 32 0 20458 0 21792 1 1608 71 31.63 4 0 2 2 22 0 19659 0 21864 1 1310 72 22.9 5 0 3 2 0 0 19855 0 22233 2 1457 68 31.31 3 0 2 1 24 0 18946 0 21792 0 623 62 36.6 1 0 2 2 21 1 19774 1 20647 1 748 73 34.72 4 1 2 1 34 0 19367 0 21792 1 514 78 39.25 3 1 2 2 24 0 19329 0 20832 1 564 53 22.8 3 0 2 1 19 0 20071 0 21711 0 1273 54 20.2 4 1 2 2 18 1 20219 1 21440 0 1455 53 32.83 2 0 1 1 18 0 18944 0 21792 
0 824 69 26.61 3 1 2 2 20 0 19371 1 21188 1 709 55 44.8 2 0 2 1 17 0 20134 0 21792 0 1441 66 30.4 4 1 2 2 36 1 20422 1 20811 2 931 55 49.6 1 0 2 1 9 0 18828 0 21787 1 504 62 25.39 3 0 2 2 17 0 19340 0 21843 2 571 71 40.4 3 1 2 2 25 1 20485 1 21332 1 1790 76 28.4 1 0 2 1 1 0 19976 0 21804 0 868 57 23.3 3 1 2 1 17 0 20103 0 21792 0 1475 62 31.2 3 0 2 2 18 0 19668 0 21593 1 290 45 27.6 3 1 1 2 22 0 20205 0 21138 0 495 77 22.65 2 0 2 1 8 0 18979 2 19328 2 1744 51 26.6 1 0 1 1 3 0 20332 0 21850 0 1603 72 31.64 4 1 1 2 18 0 19296 0 22439 0 347 75 34 1 0 1 2 10 0 19961 0 21809 0 1306 62 29.13 3 0 2 2 25 1 19485 0 21811 1 528 83 25 2 0 2 1 16 0 20067 2 21368 0 679 61 45.26 2 0 2 2 27 0 19413 2 21220 0 1784 60 29.3 3 1 1 2 11 0 19990 0 21864 0 754 77 31.95 1 0 2 2 16 0 19373 0 21792 2 59 84 22.3 3 1 2 1 8 0 20144 2 21705 1 887 45 35.1 4 1 2 2 66 0 20288 0 21809 1 382 64 29.4 3 1 2 2 24 1 19960 0 22233 1 1658 61 43 1 0 2 1 28 0 19997 0 21792 0 521 66 30.5 3 1 2 2 34 0 20060 0 21854 2 1674 74 32.77 5 0 3 2 0 0 19245 1 19711 0 529 64 40.6 2 0 2 1 9 0 20074 0 21792 0 963 60 32.9 2 0 2 1 7 0 19912 1 20370 2 1240 40 34.2 2 0 2 2 15 0 19444 0 21864 0 170 74 24.2 3 0 2 2 16 0 20187 0 21448 0 1617 72 29.7 3 1 2 2 21 0 19633 0 21115 1 992 70 40.9 2 0 1 2 16 0 20260 0 21852 1 632 62 27.6 3 0 1 1 9 0 20144 0 21851 0 794 62 33.6 3 1 1 1 18 0 20100 0 21822 0 1106 70 27.1 3 1 2 2 21 0 19875 0 21792 0 1350 65 31.5 2 1 2 2 25 0 20239 0 22233 1 1229 57 34.66 4 0 2 2 23 1 19059 0 21850 1 672 70 32.87 3 0 1 2 26 0 19413 0 21792 1 440 65 26.66 1 0 2 2 4 0 19353 0 21801 0 810 59 29.29 3 1 2 2 19 0 18646 0 21850 0 1019 60 33.2 3 1 2 1 20 0 20248 0 21858 0 115 76 23.23 3 0 2 2 8 0 19508 0 21864 0 986 78 24.2 3 1 2 2 0 0 20276 2 20912 1 1408 71 33.7 3 0 2 2 19 0 20235 0 21946 0 1610 62 22.6 3 0 2 2 20 1 19652 0 21431 1 955 59 31.24 2 1 1 2 31 0 19569 0 21844 0 358 71 29.38 1 0 2 2 15 0 18853 1 19730 0 976 73 24.2 2 0 2 1 8 0 20282 0 21889 2 1693 56 43.43 1 0 1 2 7 0 19241 0 22000 0 682 76 43 1 0 2 
1 10 0 19417 2 21262 0 206 65 41.5 3 0 1 2 39 0 19835 0 21978 0 619 74 30.38 3 1 2 2 26 0 19416 0 21850 1 1171 50 21.6 3 0 1 2 9 0 19808 0 21792 0 582 62 44.44 2 0 2 2 42 0 19403 2 20747 2
end
format %tdMon_DD,_CCYY date_of_surgery
format %tdMon_DD,_CCYY date_of_primary_event
label values vascularpattern_dicho vasc
label def vasc 0 "All others", modify
label def vasc 1 "Multiple, multifocal", modify
label values emborder emborder
label def emborder 1 "Regular", modify
label def emborder 2 "Non-regular", modify
label def emborder 3 "Endometrium not visible", modify
label values echogenicity echo
label def echo 1 "Uniform", modify
label def echo 2 "Non-uniform", modify
label values extrauterine yn
label def yn 0 "No", modify
label def yn 1 "Yes", modify
label values primary_end_point prim
label def prim 0 "No event", modify
label def prim 1 "Primary event", modify
label def prim 2 "Competing event", modify
label values esmo_pre_op_sub preoplabel
label def preoplabel 0 "Low risk", modify
label def preoplabel 1 "Intermediate risk", modify
label def preoplabel 2 "High risk", modify
First, I try with native Stata commands, so that I will have "accurate" output to compare against later:

Code:
*St-set the data
stset date_of_primary_event, id(idnum) failure(primary_end_point == 1) origin(date_of_surgery) scale(365.24)

findit mfpa //st0425, unless you already have it installed.

*Run the MFPA-algorithm, fitting a Fine-and-Gray model. Force esmo_pre_op_sub into the model.
*The MFPA-algorithm is very slow; for the full dataset each run takes about 7 min.
mfpa, cycles(3) dfdefault(3) select(0.157, i.esmo_pre_op_sub:1) alpha(0.25) : ///
    stcrreg i.esmo_pre_op_sub c.age i.colorscore i.extrauterine ///
    i.vascularpattern_dicho i.emborder i.echogenicity c.bmi c.tumor_size ///
    , compete(primary_end_point==2)

*Draw the CIF, by esmo_pre_op_sub group
stcurve, cif at1(esmo_pre_op_sub=0) at2(esmo_pre_op_sub=1) at3(esmo_pre_op_sub=2)
and then with stcrprep and stcox:
Code:
* First, data must be st-set to later be rearranged, expanded and assigned weights.
* Both failcodes have to be included in the failure() option.
stset date_of_primary_event, id(idnum) failure(primary_end_point == 1 2) origin(date_of_surgery) scale(365.24)

findit stcrprep //st0471, unless you already have it installed

* Here I'm omitting the byg() option, since I want the same output as for stcrreg. See Lambert for details:
* "The original data is now reloaded and stcrprep is run again, but this time the censoring distribution
* used to generate the weights is calculated for the cohort as a whole, i.e. the byg() option is not used.
* This is because we want to fit the same model as that fitted by stcrreg, which does not calculate
* weights separately by subgroup."
stcrprep, events(primary_end_point) keep(esmo_pre_op_sub extrauterine age colorscore vascularpattern_dicho emborder echogenicity bmi tumor_size) trans(1 2)

gen event = primary_end_point == failcode

* "If one uses pweights rather than iweights, along with using stset and the vce(cluster patid) option
* for the stcox command, then the standard errors are the same as stcrreg to four decimal place" - quote from Lambert.
stset tstop [pw=weight_c], failure(event) enter(tstart) noshow

*After using stcrprep, stcox will do the same as stcrreg did before transforming the data.
mfpa, cycles(3) dfdefault(3) select(0.157, i.esmo_pre_op_sub:1) alpha(0.25) : ///
    stcox i.esmo_pre_op_sub c.age i.colorscore i.extrauterine ///
    i.vascularpattern_dicho i.emborder i.echogenicity c.bmi c.tumor_size if failcode==1 ///
    , vce(cluster idnum)
stcox, hr

sts graph if failcode==1, by(esmo_pre_op_sub) failure ///
    ytitle("Cumulative incidence") ///
    xtitle("Years since surgery") ///
    ylabel(0(0.1)0.5, angle(h) format(%3.1f)) ///
    legend(order(1 "Low Risk" 2 "Medium Risk" 3 "High Risk") ///
    cols(1) ring(0) pos(5)) ///
    scheme(sj) name(cif_relapse, replace)
We see that the SHRs, z-scores, p-values, and all the rest of the output are identical, so the stcrprep-->stcox route seems to be working. But the graphs don't look the same.
I e-mailed Paul C. Lambert, and he replied:
Your first do file uses stcurve, so it is not the same thing as using sts graph in the 2nd do file; you are comparing a model-based, covariate-specific estimate with the non-parametric Aalen-Johansen estimate. If you compared the sts graph estimates with stcompet you should get the same estimates.
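For reference, the non-parametric estimate Lambert refers to is the Aalen-Johansen CIF, CIF(t) = sum over event times tj <= t of S(tj-) * d_cause(tj) / n(tj), with S the all-cause Kaplan-Meier; per Lambert's paper, sts graph, failure on the stcrprep-expanded data draws this, while stcurve, cif draws the model-based CIF at fixed covariate values, so they need not coincide. A minimal sketch of the estimator (Python; status coded like primary_end_point: 0 = censored, 1 and 2 = event types):

```python
def aalen_johansen(times, status, cause=1):
    """Nonparametric Aalen-Johansen CIF for one cause:
    at each event time, add S(t-) * (events of this cause) / (at risk),
    where S is the all-cause Kaplan-Meier survival."""
    pairs = sorted(zip(times, status))
    n = len(pairs)
    surv, at_risk, cif, curve = 1.0, n, 0.0, []
    i = 0
    while i < n:
        t = pairs[i][0]
        d_cause = d_any = removed = 0
        while i < n and pairs[i][0] == t:
            s = pairs[i][1]
            d_cause += s == cause
            d_any += s != 0
            removed += 1
            i += 1
        cif += surv * d_cause / at_risk   # increment BEFORE updating survival
        surv *= 1 - d_any / at_risk
        at_risk -= removed
        curve.append((t, cif))
    return curve
```

A handy sanity check: with everyone followed to an event, the two cause-specific CIFs sum to 1 at the last event time, which the naive "1 - KM per cause" approach would violate.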
This was my first problem; we can call it "the stcrprep problem".
Here comes my second problem, which we can call "the internal validation problem".
How would you go about assessing internal validation of this model? I've seen so many different recommendations. van Geloven et al., in Validation of prediction models in the presence of competing risks: a guide through modern methods (https://www.bmj.com/content/bmj/377/...69249.full.pdf), present a calibration plot (Fig 1) drawn using pseudo-observations to compensate for censoring, and they provide R code on their GitHub page (https://github.com/survival-lumc/Val...readme-ov-file), but I have no idea how to do this in Stata (or in R, for that matter). Pseudo-values are also used (and maybe first introduced?) in Calibration plots for risk prediction models in the presence of competing risks by Gerds, Andersen and Kattan (https://pubmed.ncbi.nlm.nih.gov/24668611/). Unfortunately, this paper does not come with an appendix containing R or Stata code, and I cannot make use of the mathematical formulae presented in the paper. And then we have Graphical calibration curves and the integrated calibration index (ICI) for competing risk models by Austin, Putter, Giardiello and van Klaveren (https://diagnprognres.biomedcentral....12-021-00114-6). In the manuscript they refer to R code available in the appendix, but it is actually missing from the appendix. So, understand my frustration! The same goes for discrimination: there doesn't seem to be a straightforward way to assess that either (Gönen and Heller's C-statistic, Wolbers' method, Somers' D, etc.), nor any obvious way to do it practically in Stata, especially after first expanding the data using stcrprep!
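As far as I can tell, the pseudo-observation in these papers is just the jackknife quantity P_i(t0) = n*F(t0) - (n-1)*F_minus_i(t0), where F is the Aalen-Johansen CIF for the event of interest and F_minus_i is the same estimate with subject i left out; the P_i can then be plotted or regressed against the model's predicted risks at t0, since censoring is already accounted for. A self-contained toy sketch (Python; my own reading of the formula, not the authors' code). A reassuring property: with no censoring, the pseudo-value reduces to the plain event indicator I(T_i <= t0, cause_i = 1):

```python
def cif_at(times, status, t, cause=1, skip=None):
    """Aalen-Johansen CIF for `cause` at time t; status 0 = censored.
    `skip` optionally leaves one subject out (for the jackknife)."""
    data = sorted((x, s) for i, (x, s) in enumerate(zip(times, status)) if i != skip)
    at_risk, surv, cif = len(data), 1.0, 0.0
    j = 0
    while j < len(data):
        tj = data[j][0]
        if tj > t:
            break
        d_c = d_a = rem = 0
        while j < len(data) and data[j][0] == tj:
            d_c += data[j][1] == cause
            d_a += data[j][1] != 0
            rem += 1
            j += 1
        cif += surv * d_c / at_risk
        surv *= 1 - d_a / at_risk
        at_risk -= rem
    return cif

def pseudo_values(times, status, t, cause=1):
    """Jackknife pseudo-observations: P_i = n*F(t) - (n-1)*F_{-i}(t)."""
    n = len(times)
    full = cif_at(times, status, t, cause)
    return [n * full - (n - 1) * cif_at(times, status, t, cause, skip=i)
            for i in range(n)]
```

The mean of the pseudo-values equals the full-sample CIF at t0, which is what makes them usable as "observed" outcomes in a calibration plot.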
I suppose since the Fine-and-Gray model calculates the CIF, the most intuitive way to assess calibration would be to plot the CIF obtained from the model (with covariates set at their mean, or maybe mode, or maybe for a typical 'low risk case' and a typical 'high risk case') and overlay the actual observed cumulative incidence. But that is not what I've seen recommended (see the papers above), and I don't know why. Seems like I'm missing something here.
Another way to assess calibration I've seen is to make a prediction at, say, 3 years and draw a calibration plot as one normally would for logistic regression, that is, with predicted risk in decile bins on the x-axis and the mean observed risk for those cases on the y-axis. But to me it seems counterintuitive to first develop a model for the whole time-to-event course, only to then assess the model's performance at one or a few timepoints. Why not then fit a logistic model in the first place, with the outcome changed to event prior to 3 years, yes/no?
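For completeness, the binning step for such a fixed-time calibration plot is trivial once one has a predicted risk per subject and an "observed" outcome per subject (an event indicator, or better, a pseudo-observation so that censoring is handled). A sketch (Python, illustrative only):

```python
def calibration_bins(pred, obs, n_bins=10):
    """Group predicted risks into quantile bins and return one
    (mean predicted, mean observed) pair per bin. `obs` may hold 0/1
    event indicators or pseudo-observations at the landmark time."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size = len(pred) / n_bins
    bins = []
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if not idx:
            continue
        bins.append((sum(pred[i] for i in idx) / len(idx),
                     sum(obs[i] for i in idx) / len(idx)))
    return bins
```

For a well-calibrated model the pairs lie near the 45-degree line; repeating this at two or three clinically relevant landmark times partially answers the "only one timepoint" objection.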
So, in summary:
- Why am I getting different CIFs when doing stcrreg-->stcurve, cif compared to stcrprep-->stcox-->sts graph, failure? Or rather, which is the more appropriate way to present the CIF after fitting a Fine-and-Gray model?
- How would I go about assessing internal validation, with calibration and discrimination, of a Fine-and-Gray model after expanding the data using stcrprep?