Testing the difference between medians

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#1

20 Mar 2020, 05:37

Hi,

I have two groups. One group watches a median of 4 hours of TV per day. The other group watches a median of 3 hours of TV per day. How do I formally test for a difference?

Thanks

Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#2

20 Mar 2020, 05:40

Dear Joe Tuckles

You can run a quantile regression of time watching TV on a constant and on a dummy indicating the group and test the significance of the coefficient associated with the dummy.

Best wishes,

Joao

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#3

20 Mar 2020, 06:26

Hi, I'm not too clear on this. Would that just be:

Code:

qreg group tv

?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673
#4

20 Mar 2020, 06:44

Joe:
it should be something along the lines of the following toy example:

Code:

. use https://www.stata-press.com/data/r16/auto2
. qreg mpg i.foreign
Iteration  1:  WLS sum of weighted deviations =  147.52217

Iteration  1: sum of abs. weighted deviations =        149
note:  alternate solutions exist
Iteration  2: sum of abs. weighted deviations =        145

Median regression                                   Number of obs =         74
  Raw sum of deviations      164 (about 20)
  Min sum of deviations      145                    Pseudo R2     =     0.1159

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |          6   1.537162     3.90   0.000     2.935724    9.064276
       _cons |         19    .838137    22.67   0.000      17.3292     20.6708
------------------------------------------------------------------------------

.

In your case:

Code:

qreg tv i.group

Kind regards,
Carlo
(Stata 19.0)


Joe Tuckles

Join Date: Jul 2018
Posts: 180
#5

20 Mar 2020, 06:48

Thanks. I thought the results were odd. I have rerun with that code and now get the following. Do my findings mean there is a significant difference between the groups in the median number of hours?

Code:

. qreg tv i.group
Iteration  1:  WLS sum of weighted deviations =  1021.5727

Iteration  1: sum of abs. weighted deviations =       1028
note:  alternate solutions exist
Iteration  2: sum of abs. weighted deviations =       1005
note:  alternate solutions exist
Iteration  3: sum of abs. weighted deviations =       1005

Median regression                                   Number of obs =        968
  Raw sum of deviations     1011 (about 3)
  Min sum of deviations     1005                    Pseudo R2     =     0.0059

------------------------------------------------------------------------------
          tv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.group |          1    .219167     4.56   0.000     .5699016    1.430098
       _cons |          3   .0896592    33.46   0.000     2.824051    3.175949
------------------------------------------------------------------------------

.


Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#6

20 Mar 2020, 07:00

I think it is the other way around...

Joe Tuckles

Join Date: Jul 2018
Posts: 180
#7

20 Mar 2020, 07:06

The other way round produces this:

Code:

. qreg group tv
Iteration  1:  WLS sum of weighted deviations =   93.52788

Iteration  1: sum of abs. weighted deviations =         81
Iteration  2: sum of abs. weighted deviations =         81
note:  alternate solutions exist
Iteration  3: sum of abs. weighted deviations =         81
note:  alternate solutions exist
Iteration  4: sum of abs. weighted deviations =         81

Median regression                                   Number of obs =        968
  Raw sum of deviations       81 (about 0)
  Min sum of deviations       81                    Pseudo R2     =     0.0000

------------------------------------------------------------------------------
       group |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          tv |          0  (omitted)
       _cons |          0  (omitted)
------------------------------------------------------------------------------


Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#8

20 Mar 2020, 15:29

I was replying to #3.

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#9

21 Mar 2020, 02:31

The problem is that those standard errors and p-values are justified only when your y variable is roughly continuous. It looks like your variable tv is discrete.

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#10

22 Mar 2020, 05:38

OK, is there a way to test the difference between medians for two groups when the tv variable is discrete?

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#11

22 Mar 2020, 05:40

I thought that, as the question is about time (how many hours do you spend watching TV), it would be continuous.

Nick Cox

Join Date: Mar 2014

Posts: 35436
#12

22 Mar 2020, 05:56

Clearly time is continuous, at least for this purpose, but what counts is the resolution of your measurements, which affects the variability of the medians. For example, if reported times are integers, then the possible medians are either integers or half-integers and the sampling distribution is likely to be (extremely) spiky. Bootstrapping or permutation tests may help, if only by underlining the problems.
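
As a hedged illustration of the bootstrapping and permutation ideas mentioned here, the following sketch assumes the variable names tv and group used earlier in the thread, with group coded 0/1. The replication count, the seed, and the file name bs_diff are arbitrary choices made for illustration, not recommendations.

```stata
* Sketch only: assumes tv (hours watched) and group (coded 0/1) as in this thread;
* the replication count, seed, and file name bs_diff are arbitrary illustrative choices.

* Permutation test: shuffle group labels and re-estimate the median difference.
permute group diff=_b[1.group], reps(1000) seed(1): qreg tv i.group

* Bootstrap the same difference, saving the replicates to inspect their distribution.
bootstrap diff=_b[1.group], reps(1000) seed(1) saving(bs_diff, replace): qreg tv i.group

* Tabulate the replicates; with integer-valued tv, expect only a few distinct values.
preserve
use bs_diff, clear
tabulate diff
restore
```

If the tabulation shows the replicates piled up on a handful of values, that is the spiky sampling distribution described above, and normal-based standard errors should be read with corresponding caution.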

Joe Tuckles

Join Date: Jul 2018
Posts: 180

#13

22 Mar 2020, 06:24

Okay thank you. Does this highlight any problems?

Code:

. bootstrap, reps(100) seed(1): qreg tv i.group
(running qreg on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

Median regression                                   Number of obs =        968
  Raw sum of deviations     1011 (about 3)
  Min sum of deviations     1005                    Pseudo R2     =     0.0059

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
          tv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.group |          1   .4578165     2.18   0.029     .1026961    1.897304
       _cons |          3   .1969464    15.23   0.000     2.613992    3.386008
------------------------------------------------------------------------------

.


Nick Cox

Join Date: Mar 2014

Posts: 35436
#14

22 Mar 2020, 06:56

Note that your figures of merit have changed, and to my prejudiced eye that underlines that even P < 0.0005 is fallible. On the evidence in front of us the groups really are different, but I'd rather see two histograms!

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#15

22 Mar 2020, 07:30

Please find attached two histograms. Please note: group 1 n=825, group 2 n=165. This is a nested case-control dataset.

Group 1: [histogram attachment not shown]

Group 2: [histogram attachment not shown]
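
For readers without the attachments, histograms of this kind could be drawn along the following lines. This is a sketch that assumes the variable names tv and group from earlier in the thread; it is not necessarily the command used to produce the attached figures.

```stata
* Sketch only: assumes tv and group as named earlier in the thread.
* discrete gives each distinct value of tv its own bar; percent puts the two
* groups on a comparable scale despite their different sizes.
histogram tv, discrete percent by(group)

* Group sizes and medians alongside the plots.
tabstat tv, by(group) statistics(n median)
```

Plotting percentages rather than frequencies is a design choice here: with groups of very different sizes, raw counts would make the smaller group hard to read.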