
  • signrank includes pairs with zero difference in the test statistic. Why?

    Hi,

    Pairs with zero difference are usually excluded when the Wilcoxon matched-pairs signed-ranks test statistic is calculated, but the Stata implementation of this test (signrank) includes them and adjusts the variance of the statistic instead (STB reference from 1995). These approaches will in general not lead to identical P-values if zeros are present. Which method is preferable and why?

    I noticed the problem when a colleague of mine tried to replicate, in SPSS version 23, an analysis I had performed in Stata 14.1.

    The number of pairs in the analysis was 31 and only 11 of the differences were non-zero.

    The P-value in SPSS was .119 compared to .278 in Stata.

    Stata also gives a P-value of .119 if the pairs with zero difference are dropped before running signrank.

    Comments on that?

    Pär-Ola Bendahl
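    For readers who want to see the two conventions side by side outside Stata and SPSS: scipy's wilcoxon function exposes both through its zero_method argument ("wilcox" drops zero differences, as in the SPSS/R textbook version; "pratt" keeps them in the ranking, in the same spirit as Stata's signrank, though not with Stata's exact variance adjustment). A minimal sketch with made-up data, not the original dataset:

```python
# Sketch: contrast the two zero-difference conventions in scipy.
# zero_method="wilcox" drops zero differences (textbook/SPSS/R convention);
# zero_method="pratt" keeps them in the ranking, similar in spirit to
# Stata's signrank (Stata's variance adjustment is not identical).
from scipy.stats import wilcoxon

# Hypothetical paired scores with five zero differences (illustrative only).
before = [3, 5, 2, 4, 4, 1, 2, 3, 5, 2]
after = [3, 5, 2, 4, 4, 2, 4, 6, 1, 7]

stat_w, p_wilcox = wilcoxon(before, after, zero_method="wilcox")
stat_p, p_pratt = wilcoxon(before, after, zero_method="pratt")

print(f"drop zeros (wilcox): p = {p_wilcox:.3f}")
print(f"keep zeros (pratt):  p = {p_pratt:.3f}")
```

With zeros present, the two p-values generally differ, which is exactly the discrepancy described above.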



  • #2
    Consider the limiting case in which all differences are zero.

    Would you drop them all and say that the data contain no information that is relevant?

    Comment


    • #3
      Thanks Nick,

      Good argument to consider the limiting case.

      I agree: the Stata version of the Wilcoxon matched-pairs test makes better sense than the standard test described in textbooks and implemented in, for example, SPSS and R.

      The tests answer slightly different questions: the standard implementation conditions on a nonzero difference, whereas the Stata implementation does not.

      It is slightly problematic, though, that there is no consensus regarding the definition of the Wilcoxon matched-pairs test. With access to the original data and the statistics section of a paper, all results should be possible to reproduce. Hence, it is not sufficient to state that the Wilcoxon matched-pairs signed-ranks test was used to test for differences before and after treatment.

      /Pär-Ola

      Comment


      • #4
        You can take a look at the Methods and formulas section of the user's manual entry for signrank for an explanation of why Stata handles ties as it does and why that is valid. And you can always check it empirically if you're not satisfied; see below.

        The test size is a wash: 0.049 versus 0.048 at a nominal 0.05.

        The number of ties on average is somewhat lower than what you have in your particular dataset (one-third versus two-thirds tied), but the proportion of ties doesn't make any difference (see LOWESS plot).

        . version 14.1

        . 
        . clear *

        . set more off

        . set seed `=date("2016-02-02", "YMD")'

        . 
        . program define testem, rclass
          1.         version 14.1
          2.         syntax
          3. 
        .         drop _all
          4.         quietly set obs 31
          5.         ranint left right, a(1) b(3)
          6. 
        .         signrank left = right
          7.         tempname z_tie ties
          8.         scalar define `z_tie' = r(z)
          9.         scalar define `ties' = r(N_tie)
         10. 
        .         drop if left == right
         11.         signrank left = right
         12.         return scalar z_untie = r(z)
         13.         return scalar z_tie = `z_tie'
         14.         return scalar ties = `ties'
         15. end

        . 
        . simulate z_untie = r(z_untie) z_tie = r(z_tie) ties = r(ties), reps(10000) nodots: testem

              command:  testem
              z_untie:  r(z_untie)
                z_tie:  r(z_tie)
                 ties:  r(ties)


        . 
        . foreach method in tie untie {
          2.         generate byte pos_`method' = 2 * normal(-abs(z_`method')) < 0.05
          3. }

        . format pos_* %05.3f

        . summarize pos_* ties, format

            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
             pos_tie |     10,000       0.049       0.217      0.000      1.000
           pos_untie |     10,000       0.048       0.213      0.000      1.000
                ties |     10,000     10.3372    2.621409          1         21

        . 
        . graph twoway scatter z_untie z_tie, msize(small) mcolor(black) || ///
        >         line z_tie z_tie, sort lpattern(dash) lcolor(white) ///
        >         ylabel( , angle(horizontal) nogrid) legend(off)

        . quietly graph export scatter.png

        . 
        . generate double delta = z_tie - z_untie

        . generate double avg = (z_tie + z_untie) / 2

        . summarize delta, meanonly

        . graph twoway scatter delta avg, mcolor(black) msize(small) ///
        >         yline(`=r(mean)', lcolor(black) lpattern(dash)) ylabel( , angle(horizontal) nogrid)

        . quietly graph export ba.png

        . 
        . lowess delta ties, mcolor(black) msize(small) lineopts(lcolor(black) lpattern(dash)) ///
        >         ylabel(, angle(horizontal) nogrid) ytitle(tied z - untied z) xtitle(Number of Ties)

        . quietly graph export zties.png

        . 
        . exit

        end of do-file

        [Attached graphs: zties.png, scatter.png, ba.png]
        Attached Files

        Comment


        • #5
          Thanks Joseph,

          Your simulations indicate, contrary to what I expected, that the Z-statistics, with and without the zero-difference pairs excluded, have the same distribution under the null hypothesis. Furthermore, the difference seems to be independent of the number of ties observed. So, as far as I understand, these simulations indicate that the way of handling tied observations might matter in a particular case, but one method does not give systematically lower P-values than the other. Is that a correct interpretation?

          Thanks for digging into this and for sharing your code from which I learned a lot.

          /Pär-Ola

          Comment


          • #6
            Once, this was also a concern of mine. I believe the theory (whether or not to use ties in the estimation) is the crucial issue here, not the statistical package itself. Naturally, differences may arise on account of default options. However, we still get quite similar results in the three packages (Stata, SPSS and R), provided the options are equivalent. We may change the default in SPSS, for example, and request an "exact" estimation. In R, even if we don't select the "exact" option, then according to R's help files, "by default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used". We may get similar results between R and Stata under default options. This notwithstanding, we may also tell R not to correct for ties, and the results will be similar to the ones found in SPSS under default estimation.

            Hopefully that helps.

            Best,

            Marcos

            Comment


            • #7
              Yes, those would be my interpretations, as well.

              As far as using the exact distribution of the test statistic for sample sizes under 50 goes, it seems that with as few as 31 observations the test size is well maintained even with the normal approximation (0.049 at a nominal 0.05).
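              The closeness of the exact and normal-approximation p-values at this sample size can be sketched in Python's scipy (assuming scipy >= 1.9 for the method keyword; the seed and effect size here are arbitrary illustrations):

```python
# Sketch: exact vs. normal-approximation p-values for the signed-rank test
# on a modest sample, using scipy (the `method` keyword needs scipy >= 1.9).
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(20160202)  # arbitrary seed
d = rng.normal(0.3, 1.0, size=31)      # 31 continuous, hence untied, differences

_, p_exact = wilcoxon(d, method="exact")
_, p_approx = wilcoxon(d, method="approx", correction=False)

print(f"exact p = {p_exact:.4f}, normal approximation p = {p_approx:.4f}")
```

For n = 31 with no ties or zeros, the two p-values are typically close, consistent with the well-maintained test size above.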

              Comment


              • #8
                Thanks Nick, Joseph and Marcos for your helpful comments.

                Comment


                • #9
                  No systematic difference between the Z-statistics with and without ties was seen under the null hypothesis.

                  By adding 1 to all the left values in Joseph's code above (post #4) and re-running it, considerable differences between the two Z-statistics were observed under the alternative hypothesis of a one-unit score difference:

                  [Attached graph: combo.png]


                  This leaves me with some worry.

                  Comment


                  • #10
                    Stata's method of retaining tied values does seem to have very slightly lower power than SPSS's / R's method of dropping them. Power for both methods drops with increased proportion of ties, Stata's method perhaps very slightly more so.

                    The do-file and results are too long to display in the body of the post and so are attached.
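                    For anyone without Stata at hand, the null-hypothesis part of the simulation in #4 can be roughed out in Python with scipy, using zero_method="pratt" as a stand-in for Stata's zero handling (the variance adjustment is not identical) and "wilcox" for the drop-the-zeros convention; both empirical sizes should land near the nominal 0.05:

```python
# Rough Python analogue of the null simulation in #4: empirical rejection
# rates at nominal 0.05 for the two zero-handling conventions.
# zero_method="pratt" is only a stand-in for Stata's signrank adjustment.
import warnings
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(20160202)  # arbitrary seed
reps, n, alpha = 1000, 31, 0.05
reject = {"wilcox": 0, "pratt": 0}

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence small-sample approximation warnings
    for _ in range(reps):
        left = rng.integers(1, 4, size=n)   # scores on {1, 2, 3}, as in #4
        right = rng.integers(1, 4, size=n)
        for zm in reject:
            _, p = wilcoxon(left, right, zero_method=zm)
            reject[zm] += p < alpha

for zm, k in reject.items():
    print(f"{zm}: empirical size = {k / reps:.3f}")
```

This mirrors only the test-size check; the power comparison under the alternative would need a shift added to one variable, as in post #9.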
                    Attached Files

                    Comment
