Dear Statalist users,
I am trying to write my own bootstrap program that calculates the optimism of my (logistic regression) model, as a form of internal validation.
This is what I have now:
capture program drop optimism
program define optimism, rclass
    preserve
    bsample
    logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
    lroc, nograph
    return scalar area1 = r(area)
    local a1 = r(area)
    predict p
    roctab surv10 p /* calculate a ROC on the full data using model derived on bootstrap sample */
    return scalar area2 = r(area)
    local a2 = r(area)
    return scalar dif = `a1' - `a2'
    drop p
end
simulate area1=r(area1) area2=r(area2) dif=r(dif), reps(200) seed(12345): optimism
sum dif
sum area1
sum area2
These summarize commands give a difference between the original model and the bootstrapped models of almost 0; the AUCs are essentially equal. I am not sure I can believe the model performs that well. Does anyone know whether I am doing this correctly? What I would like is to calculate the AUC of the original prediction model and compare it with the mean AUC across all bootstrap samples.
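In case it clarifies what I am after: I wondered whether I should restore the original data before computing the second AUC, so that the model fitted on the bootstrap sample is evaluated on the full dataset rather than on the same bootstrap sample again. A minimal sketch of what I mean (assuming restore, preserve behaves the way I think it does; optimism2 is just an illustrative name):

capture program drop optimism2
program define optimism2, rclass
    preserve
    bsample
    logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
    lroc, nograph
    return scalar area1 = r(area)    // AUC of the bootstrap-sample model on the bootstrap sample
    local a1 = r(area)
    restore, preserve                // put the original data back in memory before predicting
    predict p                        // predictions on the original data from the bootstrap-sample model
    roctab surv10 p
    return scalar area2 = r(area)    // AUC of the bootstrap-sample model on the original data
    local a2 = r(area)
    return scalar dif = `a1' - `a2'  // optimism estimate for this replication
    drop p
end
simulate area1=r(area1) area2=r(area2) dif=r(dif), reps(200) seed(12345): optimism2

Is that the correct way to estimate the optimism, or is my original version already doing the same thing?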
Is there another way to calculate a bias-corrected C-statistic? I would like to report a summary measure that shows how well my model performs under internal validation.
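For the summary measure, what I had in mind (please correct me if the logic is wrong) is the usual optimism correction: the apparent AUC from the model fitted on the full data, minus the average optimism over the bootstrap replications. Something like this, where mydata is only a placeholder for my saved dataset:

* simulate has replaced the data in memory with the replication results
sum dif, meanonly
local opt = r(mean)                  // average optimism over the 200 replications

* reload the original data and refit the model once to get the apparent AUC
use mydata, clear                    // placeholder filename
logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
lroc, nograph
display "optimism-corrected AUC = " r(area) - `opt'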
My second question is: how can I perform exactly the same procedure on multiply imputed datasets?
Thanks a lot in advance.
Marissa van Maaren