  • Power calculation (comparing AUC-ROC)

    Dear Statalisters,
    I am wondering whether it is possible in Stata 18 (and how) to calculate the power for an expected difference in areas under the curve, given a fixed number of patients and a type I error fixed at 0.05.

    Thanks in advance



  • #2
    Your question is unclear: are you looking for the power of a test to compare two AUCs, the power of a test to estimate an AUC with a given CI width, the power to obtain a certain AUC value, or something else?



    • #3
      Dear Rich Goldstein ,

      Thanks for your reply.
      I am looking for the power of a test to compare 2 (or more) AUCs.



      • #4
        If the 2 (or more) AUCs are from nested models, then the p-value for the additional predictor/covariate is what you want, and that power depends on exactly what model you are estimating (e.g., logistic regression). If the different AUCs are from non-nested models, then I would do a simulation, though someone else may know of something more direct.

        Added: I have previously supplied citations for what I say in the first part of my answer; there are articles by, IIRC, Begg and by Pepe that prove this, and you can probably find them by searching the archive (I am out of my office and thus can't get the citations myself).
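
        For the nested case, one way to put a number on that power is a small simulation of the model you actually plan to estimate. The sketch below is only illustrative: the program name simNested, the logistic model, its coefficients, and the sample size are placeholders you would replace with your own assumptions.
        Code:
        version 18.0
        
        clear *
        set seed 12345
        
        program define simNested, rclass
            version 18.0
            syntax , [n(integer 200)]
        
            drop _all
            set obs `n'
            // illustrative data-generating model; coefficients are made up
            generate double x1 = rnormal()
            generate double x2 = rnormal()
            generate byte y = runiform() < invlogit(-0.5 + 0.8*x1 + 0.5*x2)
        
            quietly logit y x1              // reduced model (without x2)
            estimates store reduced
            quietly logit y x1 x2           // full model (with x2)
            quietly lrtest reduced .        // LR test for the added covariate
            return scalar p = r(p)
        end
        
        quietly simulate p = r(p), reps(500): simNested , n(200)
        generate byte pos = p < 0.05 if !missing(p)
        summarize pos, meanonly
        display as text "Power for the added covariate (N = 200): " as result %04.2f r(mean)
        
        exit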



        • #5
          They are indeed from non-nested models; do you have any guide for the simulation?



          • #6
            Originally posted by Carolina Hincapie View Post
            . . . do you have any guide for the simulation?
            You can do something along the following lines.
            Code:
            version 18.0
            
            clear *
            
            // seedem
            set seed 1452686763
                
            program define simEm, rclass
                version 18.0
                syntax , [n(integer 100)]
            
                drop _all
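                // binormal setup: latent y (dichotomized at 0 below) correlates 0.65 with x1 and 0.45 with x2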
                drawnorm y x1 x2, double corr(1 0.65 0.45 \ 0.65 1 0.45 \ 0.45 0.45 1) n(`n')
                replace y = y > 0
            
                tempname auc1 auc2
                roctab y x1
                scalar define `auc1' = r(area)
                roctab y x2
                scalar define `auc2' = r(area)
                roccomp y x1 x2
                return scalar p = r(p)
                forvalues i = 1/2 {
                    return scalar auc`i' = `auc`i''
                }
            end
            
            forvalues n = 200(100)400 {
                quietly simulate p = r(p) auc1 = r(auc1) auc2 = r(auc2), reps(400): simEm , n(`n')
                generate byte pos = p < 0.05 if !mi(p)
                display in smcl as text _newline(1) "N = `n'"
                summarize auc1, meanonly
                display in smcl as text "AUC 1: " as result %04.2f r(mean) " (target 0.8)"
                summarize auc2, meanonly
                display in smcl as text "AUC 2: " as result %04.2f r(mean) " (target 0.7)"
                summarize pos, meanonly
                display in smcl as text "Power: " as result %04.2f r(mean)
            }
            
            exit
            I've chosen ROC AUCs of 0.7 versus 0.8 in a within-patient comparison. Output looks like:

            N = 200
            AUC 1: 0.80 (target 0.8)
            AUC 2: 0.70 (target 0.7)
            Power: 0.66

            N = 300
            AUC 1: 0.80 (target 0.8)
            AUC 2: 0.71 (target 0.7)
            Power: 0.82

            N = 400
            AUC 1: 0.80 (target 0.8)
            AUC 2: 0.71 (target 0.7)
            Power: 0.94

            You can tweak the correlation coefficients fed to drawnorm (in the corr() option above) in order to choose an ROC-AUC pair that's more to your liking.
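
            If it helps with that tuning, a quick (admittedly brute-force) way to see what ROC AUC a given correlation implies under this binormal setup is to take one large draw and run roctab on it; the 0.65 below is just the first correlation from the simulation above.
            Code:
            version 18.0
            clear *
            set seed 12345
            // one large draw: the displayed AUC is roughly what corr = 0.65 implies
            drawnorm y x, double corr(1 0.65 \ 0.65 1) n(100000)
            replace y = y > 0
            roctab y x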

            But before all that, you might want to check out one of the literature methods cited in this thread, which you seem to be already aware of. (I'm not familiar with the user-written command mentioned there.)



            • #7
              Dear Joseph Coveney,
              Thanks a lot, it worked!

              I also need to do something extra that you might know how to deal with.
              The previous step was a comparison of different models in the same cohort (same N); now I also need to compare the same model in different cohorts (the N will change depending on the sub-cohorts, based on the region of the individuals, so I will have a different N for every AUC-ROC).

              e.g:

              North
              Obs: 510
              ROC area: 0.70

              South
              Obs: 250
              ROC area: 0.80


              I am not sure whether I can do the same as in the previous step, taking the smallest N value (N = 250), or whether I should do it in a different way.


              Thanks in advance.



              • #8
                Originally posted by Carolina Hincapie View Post
                . . . now I also need to compare same model in different cohorts (the N will change depending on the sub-cohorts, based on the region of the individuals, so I will have a different N for every AUC-ROC).
                Inasmuch as you have determined sample sizes, you seem not to be asking for how to conduct a power analysis but rather for a method to compare two independent ROC-AUCs. The easiest path for you is probably some kind of online calculator; maybe take a look at this one?



                • #9
                  Dear Joseph Coveney,
                  Thanks for your kind reply
                  I already did that with the roccomp command.

                  And thanks a lot for your post #6, it helped a lot!



                  • #10
                    Originally posted by Carolina Hincapie View Post
                    I already made that with the roccomp command.
                    How did you manage that? You have independent samples with different sample sizes and almost certainly unequal prevalence of the disease.



                    • #11
                      Originally posted by Joseph Coveney View Post
                      How did you manage that?
                      Code:
                      roccomp outcome model1, by(region) graph  summary
                      Code:
                      roccomp outcome model1 model2 model3, graph  summary
                      Following the manual:

                      Equality of AUCs for rating v1 of true state true between samples defined by catvar
                      Code:
                      roccomp true v1, by(catvar)
                      Equality of AUCs for ratings v1 and v2 for the same sample
                      Code:
                       roccomp true v1 v2

                      But now I see also in the manual:
                      roccomp reports summary statistics and provides a test for the equality of the area under the curves, using an algorithm suggested by DeLong, DeLong, and Clarke-Pearson (1988).
                      That method is for paired data, and I think I should use Hanley and McNeil (1982, 1983) or Venkatraman (2000) for unpaired data. (I can't find these in Stata.)










                      • #12
                        Originally posted by Carolina Hincapie View Post
                        But now I see also in the manual . . . That method is for paired data, and I think I should use Hanley and McNeil (1982, 1983) or Venkatraman (2000) for unpaired data. (I can't find these in Stata.)
                        I wasn't aware of (had overlooked) roccomp's syntax for independent samples; thanks.

                        The online calculator whose URL I gave above in #8 does cite Hanley and McNeil (1982) for its method, but it gives roughly similar results to roccomp, by(), at least in the example below.
                        Code:
                        version 18.0
                        
                        clear *
                        
                        // seedem
                        set seed 573624329
                        
                        quietly drawnorm y x, double corr(1 0.45 \ 0.45 1) n(750)
                        generate byte loc = _n > 250
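                        // unequal disease prevalence by location: about 30% when loc == 0, 70% when loc == 1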
                        quietly replace y = y > cond(loc, invnorm(0.3), invnorm(0.7))
                        
                        roccomp y x, by(loc)
                        
                        // For copy-paste to webform
                        tabulate y loc
                        
                        quietly roctab y x if !loc
                        return list
                        quietly roctab y x if loc
                        return list
                        
                        exit
                        roccomp gives P = 0.275 for this dataset, and the online calculator gives P = 0.278 (cropped screenshot pasted below).

                        [Screenshot: vassarstats.png — the online calculator's output]
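
                        For what it's worth, a rough do-it-yourself version of that independent-samples comparison can be assembled from roctab's stored results; the sketch below is just a normal approximation using the standard errors roctab reports (not Hanley and McNeil's exact method), and it assumes region is a string variable coded "North"/"South" as in #7.
                        Code:
                        // z-test for two independent ROC AUCs from roctab's r(area) and r(se)
                        tempname a1 se1 a2 se2 z
                        quietly roctab outcome model1 if region == "North"
                        scalar define `a1'  = r(area)
                        scalar define `se1' = r(se)
                        quietly roctab outcome model1 if region == "South"
                        scalar define `a2'  = r(area)
                        scalar define `se2' = r(se)
                        scalar define `z' = (`a1' - `a2') / sqrt(`se1'^2 + `se2'^2)
                        display as text "z = " as result %6.3f `z' ///
                            as text "   two-sided P = " as result %5.3f 2*normal(-abs(`z'))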

