
  • Paired t-test & Wilcoxon signed rank test with samples with unequal sample sizes

    Dear all,

    I have two samples of different sizes. I would like to test the difference in means with a paired t-test and the difference in medians with a Wilcoxon signed-rank test.

    My problem is that I cannot find any command in Stata that runs a paired t-test or a Wilcoxon signed-rank test on samples of unequal size.

    Alternatively, I could use an independent-samples t-test and a Wilcoxon rank-sum test, which allow unequal sample sizes, but those assume that my samples are independent.

    My question is

    - Would there be an option to perform mean/median difference tests for paired samples with unequal sample sizes?

    - Do different sample sizes work as an argument to consider my samples independent?

    Thank you for your help !

  • #2
    I find it difficult to see how samples could be paired but unequal in size. If we have data on, say, married couples, but data are missing for some spouses, then observations with a missing value on either variable are ignored and the usable data are equal in length.

    Perhaps you can be more specific about your situation.



    • #3
      Mathilde Nothomb: you can check the "burden" of incomplete pairs, which ultimately arise from missing data, as Nick remarked in #2.

      Below, a toy example:

      Code:
      .  webuse fuel
      
      . list
      
           +-------------+
           | mpg1   mpg2 |
           |-------------|
        1. |   20     24 |
        2. |   23     25 |
        3. |   21     21 |
        4. |   25     22 |
        5. |   18     23 |
           |-------------|
        6. |   17     18 |
        7. |   18     17 |
        8. |   24     28 |
        9. |   20     24 |
       10. |   24     27 |
           |-------------|
       11. |   23     21 |
       12. |   19     23 |
           +-------------+
      
      .  signrank mpg1 = mpg2
      
      Wilcoxon signed-rank test
      
              sign |      obs   sum ranks    expected
      -------------+---------------------------------
          positive |        3        13.5        38.5
          negative |        8        63.5        38.5
              zero |        1           1           1
      -------------+---------------------------------
               all |       12          78          78
      
      unadjusted variance      162.50
      adjustment for ties       -1.63
      adjustment for zeros      -0.25
                           ----------
      adjusted variance        160.63
      
      Ho: mpg1 = mpg2
                   z =  -1.973
          Prob > |z| =   0.0485
      
      . replace mpg2 = . in 10
      (1 real change made, 1 to missing)
      
      .  signrank mpg1 = mpg2
      
      Wilcoxon signed-rank test
      
              sign |      obs   sum ranks    expected
      -------------+---------------------------------
          positive |        3          13        32.5
          negative |        7          52        32.5
              zero |        1           1           1
      -------------+---------------------------------
               all |       11          66          66
      
      unadjusted variance      126.50
      adjustment for ties       -1.50
      adjustment for zeros      -0.25
                           ----------
      adjusted variance        124.75
      
      Ho: mpg1 = mpg2
                   z =  -1.746
          Prob > |z| =   0.0808
      
      . replace mpg1 = . in 2
      (1 real change made, 1 to missing)
      
      .  signrank mpg1 = mpg2
      
      Wilcoxon signed-rank test
      
              sign |      obs   sum ranks    expected
      -------------+---------------------------------
          positive |        3        11.5          27
          negative |        6        42.5          27
              zero |        1           1           1
      -------------+---------------------------------
               all |       10          55          55
      
      unadjusted variance       96.25
      adjustment for ties       -1.38
      adjustment for zeros      -0.25
                           ----------
      adjusted variance         94.63
      
      Ho: mpg1 = mpg2
                   z =  -1.593
          Prob > |z| =   0.1111

      Given this output, the answers to your questions are as follows:

      My question is

      - Would there be an option to perform mean/median difference tests for paired samples with unequal sample sizes?
      Actually, you don't need one: Stata computes the signed-rank test using only the complete ("extant") pairs.

      My question is

      - Do different sample sizes work as an argument to consider my samples independent?
      No, they are just missing data. With "paired" variables, a missing value in either variable drops the whole pair from the estimation, which unsurprisingly reduces power.
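
      As a minimal sketch of that point (not part of the session shown above), the same steps can be repeated with a paired t test alongside the signed-rank test. It assumes the same fuel data and the same two replace steps as the toy example; count is added only to show how many complete pairs remain.

      ```stata
      * Reload the example data and blank out one value in each variable,
      * as in the toy example
      webuse fuel, clear
      replace mpg2 = . in 10
      replace mpg1 = . in 2

      * Number of complete pairs (both values nonmissing)
      count if !missing(mpg1, mpg2)

      * Paired t test: only the complete pairs enter the calculation
      ttest mpg1 == mpg2

      * Wilcoxon signed-rank test on the same complete pairs
      signrank mpg1 = mpg2
      ```

      Both tests should report the same number of observations as count, so no special option for "unequal" paired samples is needed.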

      Finally, look at what happened to the p-values in the toy example: they rose from 0.0485 to 0.0808 to 0.1111 as pairs were dropped.

      Hopefully that helps.
      Last edited by Marcos Almeida; 01 Aug 2017, 07:41.
      Best regards,

      Marcos



      • #4
        Dear Nick and Marcos, thank you very much for your responses !


        To give a clearer example of my samples and why I believe they are paired despite having different sample sizes:

        I have a group of companies (e.g. n = 50) and I used a threshold to split it into two groups. Let's say all companies with a market capitalisation greater than or equal to ...specific number... are in group 1, and all companies with a market capitalisation below ...same specific number... are in group 2. This gives me, for example, one sample of 30 companies and one of 20.


        I believe those samples are not independent because they come from the same primary sample, but they also don't have the same size.

        I might be mistaken, though, and this may not fall under the paired-samples assumptions.



        Thank you, Marcos, for your example. I will try it out with my data. My only worry is that I am not sure it is empirically correct for me to do this, since I don't really have any missing data in the smaller sample; I just don't have a matching number of observations.


        Thank you in advance for your comments !

        Mathilde



        • #5
          As I understand it, your samples are not paired at all. It wouldn't make sense to compare the groups on the variable on which they were split, just as asking whether taller people are taller than shorter people is not a helpful idea.

          That said, reducing a single continuous variable to two categories is a recipe for throwing away information.



          • #6
            I basically split companies by size and then test whether there is a significant mean difference in another variable, for example whether there are more or fewer directors on the board in my sample of small companies than in my sample of large companies.

            But I'll just look further into the independence assumption or whether I can test my full sample in another way.

            Best,

            Mathilde



            • #7
              Nick is right. Paired data have little to do with an intuitive understanding of "dependence". Paired data occur when the same unit, here the same company, is observed repeatedly. So you definitely do not have paired data.
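
              For data split into two groups by a threshold, the unpaired two-sample commands are the ones that apply. A minimal sketch, using the auto data shipped with Stata purely as a stand-in: weight plays the role of firm size, mpg the role of the outcome (e.g. directors on the board), and the cutoff of 3000 and the variable name large are illustrative assumptions, not anything from this thread.

              ```stata
              * Stand-in data (assumption): weight ~ firm size, mpg ~ outcome
              sysuse auto, clear

              * Split on a hypothetical threshold; group sizes need not be equal
              generate byte large = weight >= 3000 if !missing(weight)
              tabulate large

              * Two-sample t test of the outcome across the two groups,
              * allowing unequal variances
              ttest mpg, by(large) unequal

              * Wilcoxon rank-sum (Mann-Whitney) test across the same groups
              ranksum mpg, by(large)
              ```

              Neither command requires the groups to be the same size. As noted in #5, the outcome compared should be a different variable from the one used for the split.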

              Best
              Daniel



              • #8
                Why not just look at the relationship with firm size, i.e. treat it as a continuous predictor?
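
                One way this could look, as a rough sketch only: again the auto data stand in for the company data, with weight as the hypothetical size measure and mpg as the hypothetical outcome. Which model suits the real outcome (a count of directors, say) is a separate question that this sketch does not settle.

                ```stata
                * Stand-in data (assumption): weight ~ firm size, mpg ~ outcome
                sysuse auto, clear

                * Rank-based association, using all observations without any split
                spearman mpg weight

                * Linear regression on the continuous size measure,
                * with robust standard errors
                regress mpg weight, vce(robust)
                ```

                Both commands use every observation and keep the full information in the size variable instead of reducing it to two categories.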
