  • Statistical significance between two MEDIAN values

    Hi Stata Community,

    Very often I work with variables that take only whole-number values, e.g. 1 2 3 4 5 (and not 1.1, 2.2 or 4.6, etc.), such as the Glasgow Coma Scale, the Visual Analogue Scale (VAS) for pain, or the SOFA score.

    The best measure of central tendency for such variables is MEDIAN. However, I am not aware of any statistical test in Stata that tells me whether the difference between the medians of two groups of people from the same population is statistically significant.


    For example:

    If the median VAS score with and without a painkiller is 5 and 8, respectively, how do I assess whether the difference between the two medians is statistically significant?


    Best Regards
    Pavan

  • #2
    Pavan:
    quantile regression at the median might be an option:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . qreg price i.foreign
    Iteration  1:  WLS sum of weighted deviations =  74892.779
    
    Iteration  1: sum of abs. weighted deviations =    75241.5
    note: alternate solutions exist.
    Iteration  2: sum of abs. weighted deviations =    70307.5
    note: alternate solutions exist.
    Iteration  3: sum of abs. weighted deviations =    69547.5
    
    Median regression                                   Number of obs =         74
      Raw sum of deviations  71102.5 (about 4934)
      Min sum of deviations  69547.5                    Pseudo R2     =     0.0219
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |        983   597.6291     1.64   0.104    -208.3519    2174.352
           _cons |       4816   325.8571    14.78   0.000     4166.416    5465.584
    ------------------------------------------------------------------------------
    
    .
    Code:
    . qreg rep78 i.foreign
    Iteration  1:  WLS sum of weighted deviations =  18.845463
    
    Iteration  1: sum of abs. weighted deviations =       18.5
    note: alternate solutions exist.
    Iteration  2: sum of abs. weighted deviations =       18.5
    Iteration  3: sum of abs. weighted deviations =       18.5
    
    Median regression                                   Number of obs =         69
      Raw sum of deviations       26 (about 3)
      Min sum of deviations     18.5                    Pseudo R2     =     0.2885
    
    ------------------------------------------------------------------------------
           rep78 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |          1   .0840398    11.90   0.000     .8322559    1.167744
           _cons |          3   .0463628    64.71   0.000     2.907459    3.092541
    ------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      The best measure of central tendency for such variables is MEDIAN

      Saying that commits you to saying that the best summary of 1 1 2 2 2 and of 2 2 2 3 5 is the same, namely 2.

      Hands up if you think that's discarding information that should not be discarded. Caution is always advisable, but I would not object to midmeans of 1.67 and 2.33, or even means of 1.6 and 2.8, as alternative summaries here.
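
To make the arithmetic above concrete, here is a minimal Stata sketch using Nick's two made-up samples. The variable names x and y are hypothetical, and "midmean" is taken here as the mean of the middle three of the five sorted values, which is the reading that reproduces the 1.67 and 2.33 quoted above.

```stata
* Nick Cox's two toy samples, entered already sorted (x and y are hypothetical names)
clear
input x y
1 2
1 2
2 2
2 3
2 5
end

* same median (2) in both samples, but different means (1.6 vs 2.8)
summarize x, detail
display "median = " r(p50) "   mean = " r(mean)
summarize y, detail
display "median = " r(p50) "   mean = " r(mean)

* midmean, assumed here to be the mean of the middle three of five sorted
* values (observations 2 to 4): 5/3 = 1.67 for x and 7/3 = 2.33 for y
summarize x in 2/4
summarize y in 2/4
```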

      A classic test case is grade-point averages, which are based on taking means of ordered grades, despite the many books and papers that tell you not to do that.
      Last edited by Nick Cox; 28 Jun 2022, 04:44.

      • #4
        Dear Pavan,

        This paper may offer you useful advice on comparing median values between groups:
        Conroy, R. M. (2012). What Hypotheses do “Nonparametric” Two-Group Tests Actually Test? The Stata Journal, 12(2), 182–190.
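
For comparison, a minimal sketch of two official Stata two-group tests on the same auto data used elsewhere in this thread: ranksum (Wilcoxon-Mann-Whitney, here with the porder option) and median (a Pearson chi-squared test of equal medians). Which hypothesis each of them actually tests is the question Conroy's paper takes up, so treat this as illustrative rather than as a direct test of "median 5 vs median 8".

```stata
sysuse auto, clear

* Wilcoxon-Mann-Whitney rank-sum test; porder also reports the estimated
* probability that price in the first group exceeds price in the second
ranksum price, by(foreign) porder

* Pearson chi-squared test that both groups share the same median;
* with discrete scores many values tie with the pooled median, and the
* medianties() option controls how those ties are counted
median price, by(foreign)
```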

        Best,
        Eric
        http://publicationslist.org/eric.melse

        • #5
          For the mechanics of testing the difference in medians across two groups, go with Carlo's helpful solution. I do wonder about the statistical properties of the resulting asymptotic t statistic. Typically, the underlying random variable is assumed to be continuous, or at least continuous in a neighborhood of the median. I'm not sure what the latest results are for discrete outcomes.

          • #6
            Medians are difficult, and the "definitive" solution is the bootstrap. I am out of town and can't give full references, but Efron and Tibshirani's book on the bootstrap has quite a bit on testing medians. Other than that, Carlo's solution above is good, and I also agree with Jeff's point.
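
A sketch of one way to combine Carlo's median regression with the bootstrap suggestion: Stata's bsqreg fits the same median regression as qreg but estimates the standard errors by bootstrapping. The seed and the 999 replications are arbitrary choices, and whether the bootstrap behaves well for a heavily tied outcome such as rep78 (Jeff's concern in #5) is not guaranteed either.

```stata
sysuse auto, clear
set seed 123                      // make the bootstrap replications reproducible

* median regression (quantile(.5) is the default) with bootstrap standard errors;
* foreign is already coded 0/1, so it enters directly
bsqreg price foreign, reps(999)

* the discrete outcome from #2, for comparison with qreg's asymptotic t
bsqreg rep78 foreign, reps(999)
```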

            • #7
              Thanks, Jeff!
              You really made my day!
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Joao Santos Silva: Do you have any suggestions? (see https://www.jstor.org/stable/27590667)

                • #9
                  My choice here would be Somers' D (-ssc describe somersd-). It's not a measure of median differences, but I'd presume it to be closely related to that. In the current context it would measure the probability that a randomly chosen member of group 1 has a higher score than a randomly chosen member of group 2, minus the probability that it has a lower score. As Roger Newson has explained in his articles cited in the documentation for -somersd-, this statistic has some interesting relations to various "nonparametric" statistics. For better or worse, it's a genuinely ordinal measure, as it only recognizes whether one individual's value is higher (lower) than another's, not the size of the difference.
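
Richard's suggestion can be sketched on the auto data as follows. somersd, and cendif from the same package, are community-contributed commands from SSC, not official Stata; the options used here (transf(z), tdist) are illustrative choices. Note that cendif estimates a Hodges-Lehmann-type median difference, i.e. the median of all pairwise between-group differences, which is not the same thing as the difference between the two group medians.

```stata
* community-contributed package from SSC
ssc install somersd, replace
sysuse auto, clear

* Somers' D of price with respect to foreign: the probability that a randomly
* chosen foreign car costs more than a randomly chosen domestic car, minus the
* probability that it costs less; transf(z) builds the CI on Fisher's z scale
somersd foreign price, transf(z) tdist

* Hodges-Lehmann-type median difference in price between the two groups
cendif price, by(foreign)
```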






                  • #10
                    Possibly, the community-contributed Stata module robstat by Ben Jann, which estimates robust univariate statistics, is useful as well.
                    See also the presentations by Ben Jann and Vincenzo Verardi at the 2017 London Stata Users Group meeting.
                    Besides robstat, we can also use coefplot, another community-contributed Stata module by Ben Jann, to plot regression coefficients and other results.
                    The following code applies both to the same auto data that Carlo used in #2.
                    Code:
                    * Setup
                    ssc install robstat , replace
                    ssc install coefplot, replace
                    set scheme sj
                    
                    * code for price
                    . robstat price, statistics(median) over(foreign) total cformat(%9,0fc)
                    
                    Robust Statistics                           Number of obs = 74
                    
                                0: foreign = Domestic
                                1: foreign = Foreign
                    
                    --------------------------------------------------------------
                          median | Coefficient  Std. err.     [95% conf. interval]
                    -------------+------------------------------------------------
                               0 |      4.783        141         4.501       5.064
                               1 |      5.759        163         5.435       6.083
                           total |      5.007        240         4.529       5.484
                    --------------------------------------------------------------
                    
                    
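                    * note: cformat(%9,0fc) uses a decimal comma, so 4.783 above means 4,783 (dollars)
                    . est sto MED // store the estimation results under the name MED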
                    . coefplot MED , coeflabels(0 = "local cars" 1 = "foreign cars" total = "all cars") xtitle("price of cars", m(t+1 b-2)) graphreg(m(l-3))
                    Which produces the following plot:
                    [Plot: Example_robstat_cars_median_price.png]


                    Likewise for the repair record:
                    Code:
                    . robstat rep78 , statistics(median) over(foreign) total cformat(%9,2fc)
                    
                    Robust Statistics                           Number of obs = 69
                    
                                0: foreign = Domestic
                                1: foreign = Foreign
                    
                    --------------------------------------------------------------
                          median | Coefficient  Std. err.     [95% conf. interval]
                    -------------+------------------------------------------------
                               0 |       3,00       0,00          3,00        3,00
                               1 |       4,00       0,04          3,91        4,09
                           total |       3,00       0,04          2,93        3,07
                    --------------------------------------------------------------
                    
                    est sto REP
                    coefplot REP , coeflabels(0 = "local cars" 1 = "foreign cars" total = "all cars") xtitle("repair record of cars", m(t+1 b-2)) ytitle("median-values", m(r+1 l-1)) graphreg(m(l-1))
                    Which produces the following plot:
                    [Plot: Example_robstat_cars_median_rep78.png]


                    http://publicationslist.org/eric.melse

                    • #11
                      Just to add to Rich Goldstein's comment: I wonder whether a permutation test might also be feasible. It can be especially interesting if you want a p-value. See my example code below.


                      Code:
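                      clear all
                      capture program drop med
                      * med: returns r(meddiff) = median of varlist when by()==0
                      *      minus its median when by()==1 (by() must be coded 0/1)
                      program define med, rclass
                          syntax varlist(max=1), BY(varname)
                          quietly summarize `varlist' if `by' == 0, detail
                          local med0 = r(p50)
                          quietly summarize `varlist' if `by' == 1, detail
                          local med1 = r(p50)
                          return scalar meddiff = `med0' - `med1'
                      end
                      
                      sysuse auto
                      * shuffle the foreign labels 999 times, recomputing r(meddiff) each time
                      permute foreign r(meddiff), reps(999) seed(123) nodots: med price, by(foreign)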
                      Best wishes

                      (Stata 16.1 MP)

                      • #12
                        Pavan:
                        exploiting Felix's neat code without any shame on my part, you can also use -bootstrap- (BTW: I do recommend Felix's textbook on this topic: https://www.degruyter.com/document/d...0693348/html):
                        Code:
                        . sysuse auto
                        (1978 automobile data)
                        
                        . bootstrap foreign r(meddiff), reps(999) seed(123) nodots: med price, by(foreign)
                        
                        warning: med does not set e(sample), so no observations will be excluded from the resampling because of missing values or other reasons.
                                 To exclude observations, press Break, save the data, drop any observations that are to be excluded, and rerun bootstrap.
                        
                        Bootstrap results                                          Number of obs =  74
                                                                                   Replications  = 999
                        
                              Command: med price, by(foreign)
                                _bs_1: foreign
                                _bs_2: r(meddiff)
                        
                        ------------------------------------------------------------------------------
                                     |   Observed   Bootstrap                         Normal-based
                                     | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                               _bs_1 |          0   .4522512     0.00   1.000    -.8863961    .8863961
                               _bs_2 |     -976.5   584.3429    -1.67   0.095    -2121.791    168.7911
                        ------------------------------------------------------------------------------
                        
                        
                        . estat bootstrap, all
                        
                        Bootstrap results                               Number of obs     =         74
                                                                        Replications      =        999
                        
                              Command: med price, by(foreign)
                                _bs_1: foreign
                                _bs_2: r(meddiff)
                        
                        ------------------------------------------------------------------------------
                                     |    Observed               Bootstrap
                                     | coefficient       Bias    std. err.  [95% conf. interval]
                        -------------+----------------------------------------------------------------
                               _bs_1 |           0   .2862863   .45225124   -.8863961   .8863961   (N)
                                     |                                              0          1   (P)
                                     |                                              0          1  (BC)
                               _bs_2 |      -976.5    169.964   584.34292   -2121.791   168.7911   (N)
                                     |                                        -1849.5        407   (P)
                                     |                                          -2203        193  (BC)
                        ------------------------------------------------------------------------------
                        Key:  N: Normal
                              P: Percentile
                             BC: Bias-corrected
                        
                        . 
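
In the bootstrap call above, foreign is carried over from Felix's permute call, where it names the variable to shuffle. In bootstrap's expression list it is instead evaluated as an expression (in effect, the value of foreign in the first observation of each resample), so the _bs_1 rows carry no information. A minimal sketch that bootstraps r(meddiff) alone; the strata(foreign) option, which resamples within each group so that the group sizes (52 domestic, 22 foreign cars) stay fixed, is an added suggestion rather than part of the original posts.

```stata
sysuse auto, clear

* Felix's helper from #11: median in group 0 minus median in group 1
capture program drop med
program define med, rclass
    syntax varlist(max=1), BY(varname)
    quietly summarize `varlist' if `by' == 0, detail
    local med0 = r(p50)
    quietly summarize `varlist' if `by' == 1, detail
    local med1 = r(p50)
    return scalar meddiff = `med0' - `med1'
end

* bootstrap only the median difference, resampling within each group
bootstrap r(meddiff), reps(999) seed(123) strata(foreign) nodots: med price, by(foreign)

* percentile and bias-corrected confidence intervals
estat bootstrap, percentile bc
```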
                        Kind regards,
                        Carlo
                        (Stata 19.0)

                        • #13
                          Thanks, John Mullahy, for drawing my attention to this. The method proposed in the paper you mention in #8 may work (not all regularity conditions are met, but it may be OK in most cases), but we cannot just use the t-test reported by qcount; we would have to use the procedure described at the end of Section 3.3. The t-test would be valid under the assumption that the two distributions, not just their medians, are the same.
                          Last edited by Joao Santos Silva; 29 Jun 2022, 02:07.
