
  • How to calculate a bias-corrected C-statistic after bootstrapping? And how to calculate this on imputed data?

    Dear statalist users,

    I am trying to write my own bootstrap program that calculates the optimism of my (logistic regression) model, as a form of internal validation.
    This is what I have now:


    capture program drop optimism
    program define optimism, rclass
    preserve
    bsample
    logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi i.adjuvant i.neo_beh i.target
    lroc, nograph
    return scalar area1 = r(area)
    local a1 = r(area)
    predict p
    roctab surv10 p /*calculate a ROC on the full data using model derived on bootstrap sample */
    return scalar area2 = r(area)
    local a2 = r(area)
    return scalar dif = `a1' - `a2'
    drop p
    end

    simulate area1=r(area1) area2=r(area2) dif=r(dif), reps(200) seed(12345): optimism
    sum dif
    summ area1
    sum area2


    The latter commands give a difference between the original model and the bootstrapped models of almost 0. The AUCs are equal. I am not sure I can believe the model performs that well. Does anyone know whether I am doing the right thing? I would like to calculate the AUC of the original prediction model and compare it with the mean AUC of all the bootstrap samples.


    Is there any other way to calculate a bias-corrected C-statistic? I would like to report a summary measure that shows how well my model performs in an internal validation.


    My second question is: how can I perform exactly the same analysis on imputed datasets?

    Thanks a lot in advance.

    Marissa van Maaren


  • #2
    I hope there is someone who can help me. Any brainstorming on this topic is appreciated!


    • #3
      The latter commands give a difference between the original model and the bootstrapped models of almost 0.
      I don't think so. Your -optimism- program calculates the same thing twice. First it does -lroc- on the results of the logistic regression, and then it uses -roctab- on the output of -predict-. Those are just two different ways of calculating the same thing. If there is any difference at all it would be attributable to rounding errors in -predict-!

      If what you want to do is compare the results of the -bootstrap- ROC areas with the original one, you need to run the model before you -bsample- and get that ROC area, and then re-run the model after -bsample-, get that ROC area and then compare the results.

      Actually, the pre-bootstrap model will always be the same thing, so there is no point in putting that inside the program. I would probably do something like this:

      Code:
      capture program drop optimism
      program define optimism, rclass
          preserve
          bsample    // draw a bootstrap sample; -preserve- restores the data on exit
          logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi ///
              i.adjuvant i.neo_beh i.target
          lroc, nograph
          return scalar area_bootstrap = r(area)    // AUC of the model in its own bootstrap sample
      end

      * apparent AUC: the model fit once to the original data
      logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi ///
          i.adjuvant i.neo_beh i.target
      lroc, nograph
      local base_ROC = r(area)
      tempfile sim_results
      simulate area = r(area_bootstrap), reps(200) seed(12345) saving(`sim_results'): optimism

      use `sim_results', clear
      gen diff = area - `base_ROC'    // per-replication difference from the apparent AUC
      summ diff
      Note: No sample data provided, so code is not tested. Beware of typos or other errors. The code gives the gist of it.
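      If you then want a single bias-corrected figure, one possibility (equally untested, and taking the mean of diff as the optimism estimate under this setup) is to subtract that mean from the apparent AUC:

      Code:
      summ diff
      display "apparent AUC       = " `base_ROC'
      display "mean optimism      = " r(mean)
      display "bias-corrected AUC = " `base_ROC' - r(mean)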

      In the future, when posting code, please place it between code delimiters (see FAQ #12 for instructions) so that indentation is preserved and readability is enhanced.


      • #4
        Hi Marissa

        Regarding your question about using MI with the bootstrap, you might find this paper by Schomaker and Heumann useful. It outlines the two broad approaches you might take: impute-then-bootstrap or bootstrap-then-impute. You can get Stata to do either without a huge effort.

        However, they are interested in confidence intervals based on the non-parametric bootstrap, and I'm not aware of any work that has looked at how well combining MI and the bootstrap works for your kind of bias-correction problem.
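
        For concreteness, here is a rough, untested sketch of the impute-then-bootstrap ordering, assuming your data are already -mi set- with 10 imputations, that the -optimism- program from #3 is defined, and pooling by simply averaging the optimism estimates across imputations (not Rubin's rules):

        Code:
        * sketch only: assumes -mi set- data with 10 imputations and -optimism- from #3
        set seed 12345
        local total = 0
        forvalues m = 1/10 {
            preserve
            mi extract `m', clear    // pull out completed dataset m
            logit surv10 i.agegr2 i.subloc i.tdiffgr i.HR i.okh_type i.okd i.radi ///
                i.adjuvant i.neo_beh i.target
            lroc, nograph
            local base_ROC = r(area)    // apparent AUC in imputation m
            simulate area = r(area_bootstrap), reps(200): optimism
            gen diff = area - `base_ROC'
            summ diff
            local total = `total' + r(mean)    // optimism estimate for imputation m
            restore
        }
        display "optimism averaged over imputations = " `total'/10

        The bootstrap-then-impute ordering would instead call -bsample- on the incomplete data first and run -mi impute- on each resample inside the program, which tracks the sampling process more closely but is much slower.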

        Tim


        • #5
          Tim Morris: Thank you for this paper, it's a good start. It will definitely help me get into the topic.

          Clyde Schechter: Thanks for the reminder; I will use the code delimiters next time! About your code: I had to change some things, but it works now. I now understand that my code did the same thing twice. After using your suggestion I get a difference in AUC of only 0.0018. It seems really unlikely to me that the model fits that well. Do you (or anyone else) have any idea whether this is even possible? In the literature I have never found such a small difference.


          • #6
            Prof. Schechter: Would you please tell us how to do this analysis on a dataset with 10 multiple imputations? Many thanks!
