  • bootstrap confidence interval for differences in median between two groups

    Hello, I want to get a 95% bootstrap confidence interval for the difference in medians between two groups, to go with a rank-sum (Mann-Whitney U) test. The code below doesn't work, and I am not sure how to recall the stored median values for each group (i.e. median1 and median2) after running the tabstat command. Any help is welcome.

    Code:
    ranksum cost_5year_incQ6_new_adj, by(CA_binary)
    
    // Calculate group medians
    tabstat cost_5year_incQ6_new_adj, by(CA_binary) statistics(median)
    
    bootstrap (median1 - median2), reps(1000) seed(12345): ///
        tabstat cost_5year_incQ6_new_adj, by(CA_binary) statistics(median)
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(cost_5year_incQ6_new_adj CA_binary)
      3924.18 0
     162.6696 0
     4928.931 0
    12978.354 1
      984.852 0
       7820.9 0
      905.237 0
     4101.727 0
     10151.54 0
    3420.6704 0
      147.312 1
     676.0845 0
      96.2562 0
     465.0858 0
     45124.98 0
     354.5962 0
     761.8788 0
     196.5998 0
      5965.98 0
            0 0
     16783.17 0
     113.6655 0
     131.7492 0
      2626.91 0
     2039.158 0
            0 0
    4187.7344 0
     9091.484 0
      344.324 0
    2027.5046 0
     5022.307 0
     65420.32 0
     135.3106 0
            0 0
       303.45 0
     7685.859 0
     12454.23 1
      624.994 0
     122.8616 0
     1212.978 1
     499.7064 0
     45853.01 0
       24.948 0
      22.1598 0
      660.852 0
            0 0
      2538.18 0
    541.66724 0
     14546.33 0
      411.312 0
    end
    Thank you! BW Kim

  • #2
    This requires a custom program.
    Code:
    cap program drop meddiff
    program define meddiff, rclass
        // median of the outcome in group 0
        sum cost_5year_incQ6_new_adj if CA_binary == 0, det
        local med1 = r(p50)
        // median of the outcome in group 1
        sum cost_5year_incQ6_new_adj if CA_binary == 1, det
        local med2 = r(p50)
        // return the difference (group 0 minus group 1) for -bootstrap-
        return scalar meddiff = `med1' - `med2'
    end
    
    
    bootstrap r(meddiff), seed(123): meddiff
    Best wishes

    (Stata 16.1 MP)


    • #3
      Your command isn't working because -tabstat- does not return anything for -bootstrap- to work with: it just sends its results to the screen.
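
      One way to check this yourself is -return list-, which shows whatever the previous command left behind in r(). The sketch below assumes the example data from #1 is in memory; the comments describe what is expected, not verified output.

      ```stata
      * Assumes the -dataex- example from #1 has been loaded.
      tabstat cost_5year_incQ6_new_adj, by(CA_binary) statistics(median)
      return list    // expected: no per-group medians to pick up

      summarize cost_5year_incQ6_new_adj if CA_binary == 1, detail
      return list    // expected: includes r(p50), the median
      ```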

      So you need to calculate the medians with a command that returns something, and you also need to calculate the difference between them in a program that returns that difference, and then use that program under -bootstrap:-. Also, I do not know exactly what you are trying to calculate or how you will interpret the results, but in situations like yours people often want the bootstrap sampling to be stratified by the grouping variable (CA_binary).

      Code:
      capture program drop median_diff
      program define median_diff, rclass sortpreserve
          syntax varname(numeric), by(varname)
          levelsof `by', local(by_values)
          capture assert `:word count `by_values'' == 2
          if c(rc) != 0 {
              display as error "Variable `by' must take on exactly 2 values."
              exit 9
          }
          forvalues i = 1/2 {
              centile `varlist' if `by' == `:word `i' of `by_values''
              local median`i' =  r(c_1)
          }
          return scalar median_diff = `median2' - `median1'
          exit
      end
          
          
      bootstrap r(median_diff), strata(CA_binary) reps(1000) seed(12345): ///
          median_diff cost_5year_incQ6_new_adj, by(CA_binary)
      If you don't want stratified bootstrap sampling, remove the -strata(CA_binary)- option.

      Added: Crossed with #2. The solution there is specific to the variables mentioned in your post. The code here is more general, and you can use it with any numeric variable in place of cost_5year_incQ6_new_adj, and any dichotomous grouping variable in place of CA_binary. In principle, however, the two solutions are the same. Well, almost: #2 does not call for stratified bootstrap sampling.
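
      To get from the bootstrap run to the 95% interval asked for in #1, -estat bootstrap- can be run afterwards. This is a minimal sketch that assumes the median_diff program above has been defined and the data from #1 is in memory; the name diff for the statistic is only illustrative.

      ```stata
      * Assumes -median_diff- is defined and the #1 data is loaded.
      bootstrap diff = r(median_diff), strata(CA_binary) reps(1000) seed(12345): ///
          median_diff cost_5year_incQ6_new_adj, by(CA_binary)

      * Show percentile and bias-corrected confidence intervals
      estat bootstrap, percentile bc
      ```

      The default output of -bootstrap- reports a normal-approximation interval, so the percentile or bias-corrected versions may be preferable for a skewed outcome such as costs.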
      Last edited by Clyde Schechter; 28 Feb 2025, 09:45.


      • #4
        Thanks to Clyde for providing this excellent and highly reusable program. One additional difference: my version uses summarize, while Clyde's uses centile to compute the median. Both give the same result, but I have the impression that summarize is faster for this application. However, unless you have a very large dataset or need a very large number of resamples, this should not make a big difference.
        Best wishes

        (Stata 16.1 MP)
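
        A rough way to check that timing impression is Stata's -timer- commands. The sketch below assumes the data from #1 is in memory; the 1000 repetitions and timer numbers are arbitrary choices, and results will vary by machine and dataset.

        ```stata
        * Assumes the -dataex- example from #1 has been loaded.
        timer clear

        timer on 1
        forvalues i = 1/1000 {
            quietly summarize cost_5year_incQ6_new_adj if CA_binary == 0, detail
        }
        timer off 1

        timer on 2
        forvalues i = 1/1000 {
            quietly centile cost_5year_incQ6_new_adj if CA_binary == 0
        }
        timer off 2

        * Timer 1 = summarize, timer 2 = centile (elapsed seconds)
        timer list
        ```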


        • #5
          Dear Clyde and Felix,

          Thank you very much indeed for your help. It is great! To answer Clyde's question, I need to estimate whether the costs of the treatment group are significantly different from the costs of the control group. Because the distribution of costs is skewed, I was considering using the median difference rather than the mean difference. Thank you again! BW Kim


          • #6
            For this aim, a quantile regression also works:

            Code:
            qreg depvar i.group
            Best wishes

            (Stata 16.1 MP)
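
            Applied to the variables from #1, this might look like the sketch below. The -bootstrap- prefix and the -strata()- option are assumptions carried over from #3, not something this post specifies. The coefficient on 1.CA_binary estimates the difference in medians (group 1 minus group 0); it may differ slightly from the -summarize- or -centile- based difference, since -qreg- does not necessarily average the two middle values in a group with an even number of observations.

            ```stata
            * Assumes the -dataex- example from #1 has been loaded.
            qreg cost_5year_incQ6_new_adj i.CA_binary

            * Bootstrap the coefficient on the group indicator
            bootstrap _b[1.CA_binary], strata(CA_binary) reps(1000) seed(12345): ///
                qreg cost_5year_incQ6_new_adj i.CA_binary
            estat bootstrap, percentile
            ```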
