Paired t-test & Wilcoxon signed rank test with samples with unequal sample sizes

Mathilde Nothomb

Join Date: Aug 2017

Posts: 3
#1

01 Aug 2017, 03:57

Dear all,

I have two samples of different sizes and would like to test the mean difference using a paired t-test and the median difference using a Wilcoxon signed-rank test.

My problem is that I cannot find any command in Stata that lets me run a paired t-test or a Wilcoxon signed-rank test on samples of unequal size.

Alternatively, I see that I can use an independent-samples t-test and a Wilcoxon rank-sum test with unequal sample sizes, but those assume that my samples are independent.

My questions are:

- Is there an option to perform mean/median difference tests for paired samples with unequal sample sizes?

- Do different sample sizes count as an argument for considering my samples independent?

Thank you for your help !
Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

01 Aug 2017, 04:59

I find it difficult to see how samples could be paired but unequal in size. If we have data on, say, married couples, but data are missing for some spouses, then observations with a missing value on either variable are ignored and the usable data are equal in length.

Perhaps you can be more specific about your situation.

Marcos Almeida

Join Date: Apr 2014
Posts: 4047
#3

01 Aug 2017, 07:37

Mathilde Nothomb: you can check the "burden" of unequal pairs, which is ultimately due to missing data, as Nick remarked in #2.

Below, a toy example:

Code:

.  webuse fuel

. list

     +-------------+
     | mpg1   mpg2 |
     |-------------|
  1. |   20     24 |
  2. |   23     25 |
  3. |   21     21 |
  4. |   25     22 |
  5. |   18     23 |
     |-------------|
  6. |   17     18 |
  7. |   18     17 |
  8. |   24     28 |
  9. |   20     24 |
 10. |   24     27 |
     |-------------|
 11. |   23     21 |
 12. |   19     23 |
     +-------------+

.  signrank mpg1 = mpg2

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        3        13.5        38.5
    negative |        8        63.5        38.5
        zero |        1           1           1
-------------+---------------------------------
         all |       12          78          78

unadjusted variance      162.50
adjustment for ties       -1.63
adjustment for zeros      -0.25
                     ----------
adjusted variance        160.63

Ho: mpg1 = mpg2
             z =  -1.973
    Prob > |z| =   0.0485

. replace mpg2 = . in 10
(1 real change made, 1 to missing)

.  signrank mpg1 = mpg2

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        3          13        32.5
    negative |        7          52        32.5
        zero |        1           1           1
-------------+---------------------------------
         all |       11          66          66

unadjusted variance      126.50
adjustment for ties       -1.50
adjustment for zeros      -0.25
                     ----------
adjusted variance        124.75

Ho: mpg1 = mpg2
             z =  -1.746
    Prob > |z| =   0.0808

. replace mpg1 = . in 2
(1 real change made, 1 to missing)

.  signrank mpg1 = mpg2

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        3        11.5          27
    negative |        6        42.5          27
        zero |        1           1           1
-------------+---------------------------------
         all |       10          55          55

unadjusted variance       96.25
adjustment for ties       -1.38
adjustment for zeros      -0.25
                     ----------
adjusted variance         94.63

Ho: mpg1 = mpg2
             z =  -1.593
    Prob > |z| =   0.1111

On account of this, here are answers to your two questions:

- Is there an option to perform mean/median difference tests for paired samples with unequal sample sizes?

Actually, you don't need one: Stata computes the signed-rank test on the "extant" (complete) pairs only.

- Do different sample sizes count as an argument for considering my samples independent?

No, this is just missing data. Provided you have "paired" variables, a missing value on either one has an unsurprising consequence: the whole pair is dropped from the estimation, and power decreases.

Finally, note what happened to the p-values in the toy example: they rose from 0.0485 to 0.0808 to 0.1111 as pairs were dropped.
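The paired t-test handles incomplete pairs the same way as the signed-rank test. Here is a minimal sketch, assuming the same fuel dataset and the same two deletions as in the log above. The count line is added here only to show how many complete pairs remain; it should agree with the 10 observations reported by signrank in the last run.

```stata
* Reload the toy data and blank out the same two values as in the log above
webuse fuel, clear
replace mpg2 = . in 10
replace mpg1 = . in 2

* Number of pairs with both values observed
count if !missing(mpg1, mpg2)

* Paired t-test: only the complete pairs enter the calculation
ttest mpg1 == mpg2
```

No special option is needed: ttest, like signrank, simply drops any pair with a missing member.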

Hopefully that helps.

Last edited by Marcos Almeida; 01 Aug 2017, 07:41.

Best regards,

Marcos


Mathilde Nothomb

Join Date: Aug 2017

Posts: 3
#4

03 Aug 2017, 10:28

Dear Nick and Marcos, thank you very much for your responses !

To give a clearer example of my samples and why I believe they are paired while still having different sample sizes:

I have a group of companies (e.g. n = 50) and I used a threshold to split it into two groups.

Let's say I decide that all companies with a market capitalisation greater than or equal to some specific number are in group 1, and all companies with a market capitalisation below that same number are in group 2. This gives me, for example, one sample of 30 companies and one sample of 20 companies.

I believe those samples are not independent because they come from the same primary sample, but they also don't have the same size.

I might be mistaken, though, and this may not fall under the paired-samples assumptions.

Thank you, Marcos, for your example; I will try it out with my data. My only worry is that I am not sure it is empirically correct for me to do this, since I don't really have any missing data in the smaller sample; I just don't have a matching number of observations.

Thank you in advance for your comments !

Mathilde
Nick Cox

Join Date: Mar 2014

Posts: 35697
#5

03 Aug 2017, 10:35

As I understand it, your samples are not paired at all. It wouldn't make sense to compare the groups on the variable on which they were split, just as asking whether taller people are taller than shorter people is not a helpful idea.

That said, reducing a single continuous variable to two categories is a recipe for throwing away information.
Mathilde Nothomb

Join Date: Aug 2017

Posts: 3
#6

04 Aug 2017, 00:55

I basically split companies by size and then test whether there is a significant mean difference in another variable, for example whether there are more or fewer directors on the board in my sample of small companies than in my sample of large companies.

But I'll just look further into the independence assumption or whether I can test my full sample in another way.

Best,

Mathilde
daniel klein

Join Date: Mar 2014

Posts: 3849
#7

04 Aug 2017, 01:11

Nick is right. Paired data have little to do with an intuitive understanding of "dependence". Paired data arise when the same company is observed repeatedly. So you definitely do not have paired data.
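Since the two size groups are not paired, the independent-samples commands mentioned in #1 apply, and they accept unequal group sizes. Here is a minimal sketch on simulated data. The variable names mktcap, directors and large, the cutoff of 400, and the simulated distributions are all hypothetical stand-ins, not the poster's data.

```stata
* Simulate 50 hypothetical firms (illustration only)
clear
set seed 12345
set obs 50
generate mktcap = exp(rnormal(6, 1))
generate directors = rpoisson(8)

* Hypothetical size cutoff defining the two groups
generate byte large = mktcap >= 400

* Two-sample t-test allowing unequal variances; group sizes may differ
ttest directors, by(large) unequal

* Wilcoxon rank-sum (Mann-Whitney) test on the same two groups
ranksum directors, by(large)
```

Neither command requires the groups to be the same size. Both assume that the observations are independent across firms.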

Best
Daniel
Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

04 Aug 2017, 08:27

Why not just look for relationships with firm size, i.e. treat it as a continuous predictor?
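One way to follow this suggestion is sketched below on simulated data. The variable names mktcap and directors are hypothetical. The log transform is an assumption made here because market capitalisation is typically right-skewed; it is not part of the suggestion above.

```stata
* Simulate 50 hypothetical firms (illustration only)
clear
set seed 12345
set obs 50
generate mktcap = exp(rnormal(6, 1))
generate directors = rpoisson(8)

* Keep firm size continuous instead of cutting it at a threshold
generate ln_mktcap = ln(mktcap)
regress directors ln_mktcap

* Rank-based alternative that makes no linearity assumption
spearman directors mktcap
```

Both commands use all 50 firms at once, so no information is lost to the small/large split.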