Statistical significance between two MEDIAN values

pavan pandey

Join Date: Apr 2019

Posts: 75
#1


28 Jun 2022, 03:53

Hi Stata Community,

Very often I work with variables that take only fixed integer values, e.g. 1, 2, 3, 4, 5 (and not 1.1, 2.2, or 4.6), such as the Glasgow Coma Scale, the Visual Analogue Scale (VAS) pain score, or the SOFA score.

The best measure of central tendency for such variables is MEDIAN. However, I am unaware of any statistical test in Stata that tells me whether the difference in medians between two groups of people from the same population is statistically significant.

For example

If the median VAS score with and without a painkiller is 5 and 8, how do I assess if the difference in the median value is statistically significant?

Best Regards
Pavan

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673
#2

28 Jun 2022, 03:57

Pavan:
quantile regression at the median might be an option:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
(1978 automobile data)

. qreg price i.foreign
Iteration  1:  WLS sum of weighted deviations =  74892.779

Iteration  1: sum of abs. weighted deviations =    75241.5
note: alternate solutions exist.
Iteration  2: sum of abs. weighted deviations =    70307.5
note: alternate solutions exist.
Iteration  3: sum of abs. weighted deviations =    69547.5

Median regression                                   Number of obs =         74
  Raw sum of deviations  71102.5 (about 4934)
  Min sum of deviations  69547.5                    Pseudo R2     =     0.0219

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |        983   597.6291     1.64   0.104    -208.3519    2174.352
       _cons |       4816   325.8571    14.78   0.000     4166.416    5465.584
------------------------------------------------------------------------------

.

Code:

. qreg rep78 i.foreign
Iteration  1:  WLS sum of weighted deviations =  18.845463

Iteration  1: sum of abs. weighted deviations =       18.5
note: alternate solutions exist.
Iteration  2: sum of abs. weighted deviations =       18.5
Iteration  3: sum of abs. weighted deviations =       18.5

Median regression                                   Number of obs =         69
  Raw sum of deviations       26 (about 3)
  Min sum of deviations     18.5                    Pseudo R2     =     0.2885

------------------------------------------------------------------------------
       rep78 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |          1   .0840398    11.90   0.000     .8322559    1.167744
       _cons |          3   .0463628    64.71   0.000     2.907459    3.092541
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)


Nick Cox

Join Date: Mar 2014

Posts: 35433
#3

28 Jun 2022, 04:09

"The best measure of central tendency for such variables is MEDIAN"

Saying that commits you to saying that the best summary of 1 1 2 2 2 and of 2 2 2 3 5 is the same, namely 2.

Hands up if you think that's discarding information that should not be discarded. Caution is always advisable but I would not object to midmeans of 1.67 or 2.33 or even means of 1.6 or 2.8 as two other summaries here.

A classic test case is grade-point averages, which are based on taking means of ordered grades, despite the many books and papers that tell you not to do that.
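The two toy samples above can be checked directly. This is a minimal sketch: the variable names a and b are made up for illustration, and the midmean is computed here as the mean of the middle three values, which is one way to reproduce the 1.67 and 2.33 quoted above.

```stata
clear
* hypothetical variable names: a holds 1 1 2 2 2, b holds 2 2 2 3 5
input byte(a b)
1 2
1 2
2 2
2 3
2 5
end

* same median for both samples, different means
tabstat a b, statistics(median mean)

* mean of the middle three values of each sample
sort a
summarize a in 2/4
sort b
summarize b in 2/4
```

Both medians come out as 2, while the means are 1.6 and 2.8, so the median alone cannot separate the two samples.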

Last edited by Nick Cox; 28 Jun 2022, 04:44.
ericmelse

Join Date: May 2014

Posts: 425
#4

28 Jun 2022, 04:13

Dear Pavan,

Possibly, this paper offers you good advice to compare and examine median values between group categories:
Conroy, R. M. (2012). What Hypotheses do “Nonparametric” Two-Group Tests Actually Test? The Stata Journal, 12(2), 182–190.
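For orientation, official Stata has two two-group "nonparametric" tests of the kind the paper's title refers to. The sketch below runs both on the auto data used elsewhere in this thread; using rep78 and foreign is my choice for illustration, not something taken from the paper.

```stata
sysuse auto, clear

* nonparametric test on the equality of medians (Pearson chi-squared)
median rep78, by(foreign)

* Wilcoxon rank-sum (Mann-Whitney) test
ranksum rep78, by(foreign)
```

The two commands need not agree, which is one reason to read up on what each of them actually tests before reporting a p-value for "the difference in medians".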

Best,
Eric

http://publicationslist.org/eric.melse
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

28 Jun 2022, 07:24

For the mechanics of how to test the difference in medians across two groups, go with Carlo's helpful solution. I do wonder about the statistical properties of the resulting asymptotic t statistic. Typically the underlying random variable is assumed to be continuous, or at least continuous in a neighborhood of the median. I'm not sure what the latest results are for discrete outcomes.
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#6

28 Jun 2022, 08:08

Medians are difficult, and the "definitive" solution is the bootstrap. I am out of town and can't give full references, but Efron and Tibshirani's book on the bootstrap has quite a bit on testing medians. Other than that, the solution above from Carlo is good, and I agree with Jeff's point as well.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#7

28 Jun 2022, 08:08

Thanks, Jeff!
You really made my day!

Kind regards,
Carlo
(Stata 19.0)
John Mullahy

Join Date: Dec 2016

Posts: 742
#8

28 Jun 2022, 08:41

Joao Santos Silva : Do you have any suggestions? (see https://www.jstor.org/stable/27590667)
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#9

28 Jun 2022, 09:17

My choice here would be Somers' D (-ssc describe somersd-). It's not a measure of median differences, but I'd presume it to be closely related to that. In the current context it would measure, more or less, the probability that a randomly chosen member of group 1 has a higher (lower) score than a member of group 2. As Roger Newson has explained in his articles cited in the documentation for -somersd-, this statistic has some interesting relations to various "nonparametric" statistics. For better or worse, it's a genuinely ordinal measure, as it only recognizes whether one individual's value is higher (lower) than another's, not the size of the difference.
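A rough sketch of what this looks like on the auto data, under some assumptions: somersd is community-contributed and is assumed to be installed already (ssc install somersd), and the options transf(z) and tdist are one common choice rather than the only sensible one. The official ranksum command with the porder option gives a closely related estimate of the probability that a value from the first group exceeds a value from the second.

```stata
sysuse auto, clear

* official Stata: porder adds an estimate of P{rep78(Domestic) > rep78(Foreign)}
ranksum rep78, by(foreign) porder

* community-contributed: Somers' D of rep78 with respect to foreign,
* with a confidence interval based on the z-transform and the t distribution
somersd foreign rep78, transf(z) tdist
```

Neither command says anything about how far apart the medians are; both only use the ordering of the values.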

ericmelse

Join Date: May 2014
Posts: 425

#10

29 Jun 2022, 00:20

Possibly the community-contributed Stata module robstat by Ben Jann, which estimates robust univariate statistics, is useful as well.
See the presentations by Ben Jann and Vincenzo Verardi at the 2017 London Stata Users Group meeting.
Besides robstat, we can also use another community-contributed module by Ben Jann: coefplot, for plotting regression coefficients and other results.
The following code reruns Carlo's example from #2 with robstat.

Code:

* Setup
ssc install robstat , replace
ssc install coefplot, replace
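set scheme sj
sysuse auto, clear

* code for price
* note: %9,0fc prints a period as the thousands separator, so 4.783 below is 4783
. robstat price, statistics(median) over(foreign) total cformat(%9,0fc)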

Robust Statistics                           Number of obs = 74

            0: foreign = Domestic
            1: foreign = Foreign

--------------------------------------------------------------
      median | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
           0 |      4.783        141         4.501       5.064
           1 |      5.759        163         5.435       6.083
       total |      5.007        240         4.529       5.484
--------------------------------------------------------------


. est sto MED // store the estimation results under the name MED
. coefplot MED , coeflabels(0 = "local cars" 1 = "foreign cars" total = "all cars") xtitle("price of cars", m(t+1 b-2)) graphreg(m(l-3))

Which produces the following plot:

[Figure: Example_robstat_cars_median_price.png (coefficient plot of the median price for local, foreign, and all cars)]

Likewise for the repair record:

Code:

. robstat rep78 , statistics(median) over(foreign) total cformat(%9,2fc)

Robust Statistics                           Number of obs = 69

            0: foreign = Domestic
            1: foreign = Foreign

--------------------------------------------------------------
      median | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
           0 |       3,00       0,00          3,00        3,00
           1 |       4,00       0,04          3,91        4,09
       total |       3,00       0,04          2,93        3,07
--------------------------------------------------------------

est sto REP
coefplot REP , coeflabels(0 = "local cars" 1 = "foreign cars" total = "all cars") xtitle("repair record of cars", m(t+1 b-2)) ytitle("median-values", m(r+1 l-1)) graphreg(m(l-1))

Which produces the following plot:

[Figure: Example_robstat_cars_median_rep78.png (coefficient plot of the median repair record for local, foreign, and all cars)]

http://publicationslist.org/eric.melse


Felix Bittmann

Join Date: Aug 2018
Posts: 662

#11

29 Jun 2022, 00:47

Just to add to Rich Goldstein's comment, I wonder whether a permutation test might also be feasible. Especially if you want a p-value this can be interesting. See my dummy code below.

Code:

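clear all
cap program drop med
* r-class helper: returns the median of varlist where by == 0 minus the median where by == 1
program define med, rclass
    syntax varlist(max=1), BY(varname)
    quietly summarize `varlist' if `by' == 0, detail
    local med0 = r(p50)
    quietly summarize `varlist' if `by' == 1, detail
    local med1 = r(p50)
    return scalar meddiff = `med0' - `med1'
end

sysuse auto
* permutation test: shuffle foreign 999 times and recompute the median difference each time
permute foreign r(meddiff), reps(999) seed(123) nodots: med price, by(foreign)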

Best wishes

(Stata 16.1 MP)


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

#12

29 Jun 2022, 01:54

Pavan:
shamelessly exploiting Felix's neat code, you can also go -bootstrap- (BTW: I do recommend Felix's textbook on this topic: https://www.degruyter.com/document/d...0693348/html):

Code:

. sysuse auto
(1978 automobile data)

. bootstrap foreign r(meddiff), reps(999) seed(123) nodots: med price, by(foreign)

warning: med does not set e(sample), so no observations will be excluded from the resampling because of missing values or other reasons.
         To exclude observations, press Break, save the data, drop any observations that are to be excluded, and rerun bootstrap.

Bootstrap results                                          Number of obs =  74
                                                           Replications  = 999

      Command: med price, by(foreign)
        _bs_1: foreign
        _bs_2: r(meddiff)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |          0   .4522512     0.00   1.000    -.8863961    .8863961
       _bs_2 |     -976.5   584.3429    -1.67   0.095    -2121.791    168.7911
------------------------------------------------------------------------------


. estat bootstrap, all

Bootstrap results                               Number of obs     =         74
                                                Replications      =        999

      Command: med price, by(foreign)
        _bs_1: foreign
        _bs_2: r(meddiff)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             | coefficient       Bias    std. err.  [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |           0   .2862863   .45225124   -.8863961   .8863961   (N)
             |                                              0          1   (P)
             |                                              0          1  (BC)
       _bs_2 |      -976.5    169.964   584.34292   -2121.791   168.7911   (N)
             |                                        -1849.5        407   (P)
             |                                          -2203        193  (BC)
------------------------------------------------------------------------------
Key:  N: Normal
      P: Percentile
     BC: Bias-corrected

. 

Kind regards,
Carlo
(Stata 19.0)
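In the bootstrap call above, the first expression, foreign, is just the value of foreign in the first observation of each resample, which is why the _bs_1 row carries no information. A possible variant that keeps only the median difference is sketched below. The name meddiff and the strata(foreign) option are my additions (an assumption, so that both group sizes stay fixed in every resample), not part of the posted code; the med program is repeated so that the block runs on its own.

```stata
sysuse auto, clear

capture program drop med
* same helper as in #11: median where by == 0 minus median where by == 1
program define med, rclass
    syntax varlist(max=1), BY(varname)
    quietly summarize `varlist' if `by' == 0, detail
    local med0 = r(p50)
    quietly summarize `varlist' if `by' == 1, detail
    local med1 = r(p50)
    return scalar meddiff = `med0' - `med1'
end

* bootstrap only the statistic of interest, resampling within each group
bootstrap meddiff = r(meddiff), reps(999) seed(123) strata(foreign) nodots: med price, by(foreign)

* percentile and bias-corrected intervals instead of the normal-based one
estat bootstrap, percentile bc
```

With a discrete outcome the bootstrap distribution of a median difference can be very lumpy, so the percentile and bias-corrected intervals are probably more informative than the normal-based z test.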


Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#13

29 Jun 2022, 02:03

Thanks John Mullahy, for drawing my attention to this. The method proposed in the paper you mention in #8 may work (not all regularity conditions are met, but it may be OK in most cases) but we cannot just use the t-test reported by qcount; we would have to use the procedure described at the end of 3.3. The t-test would be valid under the assumption that the two distributions are the same, not just the medians.

Last edited by Joao Santos Silva; 29 Jun 2022, 02:07.
