  • Testing the difference between medians

    Hi,

    I have two groups. One group watches a median of 4 hours of TV per day. The other group watches a median of 3 hours of TV per day. How do I formally test for a difference?

    Thanks

  • #2
    Dear Joe Tuckles

    You can run a quantile regression of time watching TV on a constant and on a dummy indicating the group and test the significance of the coefficient associated with the dummy.

    Best wishes,

    Joao


    • #3
      Hi, I'm not too clear on this. Would that just be:

      Code:
       qreg group tv
      ?


      • #4
        Joe:
        it should be something along the lines of the following toy example:
        Code:
        use https://www.stata-press.com/data/r16/auto2
        . qreg mpg i.foreign
        Iteration  1:  WLS sum of weighted deviations =  147.52217
        
        Iteration  1: sum of abs. weighted deviations =        149
        note:  alternate solutions exist
        Iteration  2: sum of abs. weighted deviations =        145
        
        Median regression                                   Number of obs =         74
          Raw sum of deviations      164 (about 20)
          Min sum of deviations      145                    Pseudo R2     =     0.1159
        
        ------------------------------------------------------------------------------
                 mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             foreign |
            Foreign  |          6   1.537162     3.90   0.000     2.935724    9.064276
               _cons |         19    .838137    22.67   0.000      17.3292     20.6708
        ------------------------------------------------------------------------------
        
        .
        In your case:
        Code:
        qreg tv i.group
        Kind regards,
        Carlo
        (Stata 19.0)
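With a single group dummy, the constant in the median regression is the median of the base group and the dummy coefficient is the difference between the two group medians. A minimal sketch for checking this against the raw sample medians, assuming the `auto` data shipped with Stata as a stand-in for the toy example (with your data, `mpg` and `foreign` would be `tv` and `group`):

```stata
* Toy check on the auto data shipped with Stata (stand-in for tv and group)
sysuse auto, clear
tabstat mpg, by(foreign) statistics(p50 n)

* Same medians via summarize, stored so the difference can be displayed
quietly summarize mpg if foreign == 0, detail
scalar med0 = r(p50)
quietly summarize mpg if foreign == 1, detail
scalar med1 = r(p50)
display "difference in sample medians = " med1 - med0
```

When a group has an even number of observations, the sample median is not unique, so `qreg` may settle on a neighbouring value rather than the midpoint that `summarize` reports. That is what the note "alternate solutions exist" is flagging.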


        • #5
          Thanks, I thought the results were odd. I have rerun with that code and now get the following. Do my findings mean there is a significant difference between the groups in the median number of hours?

          Code:
          . qreg tv i.group
          Iteration  1:  WLS sum of weighted deviations =  1021.5727
          
          Iteration  1: sum of abs. weighted deviations =       1028
          note:  alternate solutions exist
          Iteration  2: sum of abs. weighted deviations =       1005
          note:  alternate solutions exist
          Iteration  3: sum of abs. weighted deviations =       1005
          
          Median regression                                   Number of obs =        968
            Raw sum of deviations     1011 (about 3)
            Min sum of deviations     1005                    Pseudo R2     =     0.0059
          
          ------------------------------------------------------------------------------
                    tv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               1.group |          1    .219167     4.56   0.000     .5699016    1.430098
                 _cons |          3   .0896592    33.46   0.000     2.824051    3.175949
          ------------------------------------------------------------------------------
          
          .
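For reading the table, here is a short sketch of how the t statistic, p-value and confidence interval for `1.group` follow from the reported coefficient and standard error. The numbers are copied from the output above. The degrees of freedom (observations minus the two estimated coefficients) are an assumption about how `qreg` computes them, and the whole calculation takes the reported standard error at face value.

```stata
* Numbers copied from the qreg table above
scalar b  = 1
scalar se = .219167
scalar df = 968 - 2        // assumed: observations minus 2 estimated coefficients

display "t  = " b/se
display "p  = " 2*ttail(df, abs(b/se))
display "lb = " b - invttail(df, .025)*se
display "ub = " b + invttail(df, .025)*se
```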


          • #6
            I think it is the other way around...


            • #7
              The other way round produces this:

              Code:
              . qreg group tv
              Iteration  1:  WLS sum of weighted deviations =   93.52788
              
              Iteration  1: sum of abs. weighted deviations =         81
              Iteration  2: sum of abs. weighted deviations =         81
              note:  alternate solutions exist
              Iteration  3: sum of abs. weighted deviations =         81
              note:  alternate solutions exist
              Iteration  4: sum of abs. weighted deviations =         81
              
              Median regression                                   Number of obs =        968
                Raw sum of deviations       81 (about 0)
                Min sum of deviations       81                    Pseudo R2     =     0.0000
              
              ------------------------------------------------------------------------------
                     group |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                        tv |          0  (omitted)
                     _cons |          0  (omitted)
              ------------------------------------------------------------------------------


              • #8
                I was replying to #3


                • #9
                  The problem is that those standard errors and p-values are justified only when your y variable is roughly continuous. It looks like your variable tv is discrete.


                  • #10
                    OK, is there a way to test the difference between medians for two groups when tv is discrete?


                    • #11
                      I thought that, as the question is about time (how many hours do you spend watching TV), it would be continuous.


                      • #12
                        Clearly time is continuous, at least for this purpose, but what counts is the resolution of your measurements, which affects the variability of the medians. For example, if reported times are integers, then the possible medians are either integers or half-integers and the sampling distribution is likely to be (extremely) spiky. Bootstrapping or permutation tests may help, if only by underlining the problems.
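A minimal sketch of the permutation test mentioned above, assuming the `auto` data shipped with Stata as a stand-in (with your data, `mpg` and `foreign` would be `tv` and `group`). The program name `meddiff` is hypothetical and introduced only for illustration.

```stata
* Sketch only: auto data as a stand-in for tv and group
sysuse auto, clear

* Hypothetical helper: returns the difference in sample medians between the two groups
capture program drop meddiff
program define meddiff, rclass
    quietly summarize mpg if foreign == 1, detail
    local m1 = r(p50)
    quietly summarize mpg if foreign == 0, detail
    return scalar diff = `m1' - r(p50)
end

* Shuffle the group labels 1000 times and compare the observed difference
* with the differences obtained under shuffling
permute foreign diff = r(diff), reps(1000) seed(1) nodots: meddiff
```

With integer-valued data, the shuffled differences can take only a handful of distinct values, so the permutation distribution will show directly how spiky the sampling distribution of the median difference is.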


                        • #13
                          Okay, thank you. Does this highlight any problems?

                          Code:
                           . bootstrap, reps(100) seed(1): qreg tv i.group
                          (running qreg on estimation sample)
                          
                          Bootstrap replications (100)
                          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
                          ..................................................    50
                          ..................................................   100
                          
                          Median regression                                   Number of obs =        968
                            Raw sum of deviations     1011 (about 3)
                            Min sum of deviations     1005                    Pseudo R2     =     0.0059
                          
                          ------------------------------------------------------------------------------
                                       |   Observed   Bootstrap                         Normal-based
                                    tv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                               1.group |          1   .4578165     2.18   0.029     .1026961    1.897304
                                 _cons |          3   .1969464    15.23   0.000     2.613992    3.386008
                          ------------------------------------------------------------------------------
                          
                          .
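The normal-based interval above assumes the bootstrap distribution of the coefficient is roughly normal, which is doubtful when the medians can take only a few values. A hedged sketch of an alternative, again assuming the `auto` data as a stand-in for `tv` and `group`: use more replications and ask for percentile intervals, which are read straight off the bootstrap distribution.

```stata
* Sketch only: auto data as a stand-in for tv and group
sysuse auto, clear

* More replications than the 100 used above; 1000 is an illustrative choice
bootstrap, reps(1000) seed(1) nodots: qreg mpg i.foreign

* Percentile-based confidence intervals from the bootstrap replicates
estat bootstrap, percentile
```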


                          • #14
                            Note that your figures of merit have changed, and to my prejudiced eye that underlines that even P < 0.0005 is fallible. On the evidence in front of us the groups really are different, but I'd rather see two histograms!
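One way to draw the two histograms side by side, sketched on the `auto` data as a stand-in (with your data, `mpg` and `foreign` would be `tv` and `group`). The `discrete` option assumes the variable takes a limited set of values such as whole hours.

```stata
* Sketch only: auto data as a stand-in for tv and group
sysuse auto, clear

* One panel per group, one bar per distinct value, heights as percentages
histogram mpg, by(foreign) discrete percent

* The same information as a table, with column percentages within each group
tabulate mpg foreign, column
```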


                            • #15
                              Please find attached two histograms. Please note: group 1 n=825, group 2 n=165. This is a nested case-control dataset.

                              Group 1:

                              [Attached image: Graph0.png]


                              Group 2:

                              [Attached image: Graph1.png]
