Dear Statalist users,
I am trying to write my own bootstrap program that calculates the optimism of my (logistic regression) model, as a form of internal validation.
This is what I have now:
capture program drop optimism
program define optimism, rclass
    preserve
    bsample
    logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
    lroc, nograph
    return scalar area1 = r(area)
    local a1 = r(area)
    predict p
    roctab surv10 p /* calculate a ROC on the full data using model derived on bootstrap sample */
    return scalar area2 = r(area)
    local a2 = r(area)
    return scalar dif = `a1' - `a2'
    drop p
end
simulate area1=r(area1) area2=r(area2) dif=r(dif), reps(200) seed(12345): optimism
sum dif
sum area1
sum area2
These summarize commands give a difference between the original model and the bootstrapped models of almost 0; the AUCs are essentially equal. I am not sure I can believe the model performs that well. Does anyone know whether I am doing this correctly? What I would like is to calculate the AUC of the original prediction model and compare it with the mean AUC across all bootstrap samples.
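In case it clarifies what I am after: I wondered whether I should restore the original data before computing the second AUC, so that the model fitted on the bootstrap sample is evaluated on the full dataset rather than on the same bootstrap sample again. A minimal sketch of what I mean (assuming restore, preserve behaves the way I think it does; optimism2 is just an illustrative name):

capture program drop optimism2
program define optimism2, rclass
    preserve
    bsample
    logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
    lroc, nograph
    return scalar area1 = r(area)    // AUC of the bootstrap-sample model on the bootstrap sample
    local a1 = r(area)
    restore, preserve                // put the original data back in memory before predicting
    predict p                        // predictions on the original data from the bootstrap-sample model
    roctab surv10 p
    return scalar area2 = r(area)    // AUC of the bootstrap-sample model on the original data
    local a2 = r(area)
    return scalar dif = `a1' - `a2'  // optimism estimate for this replication
    drop p
end
simulate area1=r(area1) area2=r(area2) dif=r(dif), reps(200) seed(12345): optimism2

Is that the correct way to estimate the optimism, or is my original version already doing the same thing?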
Is there another way to calculate a bias-corrected C-statistic? I would like to report a summary measure that shows how well my model performs under internal validation.
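For the summary measure, what I had in mind (please correct me if the logic is wrong) is the usual optimism correction: the apparent AUC from the model fitted on the full data, minus the average optimism over the bootstrap replications. Something like this, where mydata is only a placeholder for my saved dataset:

* simulate has replaced the data in memory with the replication results
sum dif, meanonly
local opt = r(mean)                  // average optimism over the 200 replications

* reload the original data and refit the model once to get the apparent AUC
use mydata, clear                    // placeholder filename
logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
lroc, nograph
display "optimism-corrected AUC = " r(area) - `opt'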
My second question is: how can I perform exactly the same procedure on multiply imputed datasets?
Thanks a lot in advance.
Marissa van Maaren