test for paired sample with unequal variances

Yue YY

Join Date: May 2018

Posts: 41
#1

16 Jan 2019, 09:08

Dear Statalists,

I have two samples (348 cases and 371 controls) matched by sex and age. Now I would like to analyze their finger length ratio (a numeric variable). When I ran -sdtest-, I found that the SDs differ between the two samples. In this case, should I use -ttest- to check the mean difference? (I found the option [, unequal] for the -ttest- command, but my samples are paired, so I'm not sure whether I should use that option or find another way.)
Thank you!

Yue

Last edited by Yue YY; 16 Jan 2019, 09:11.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

16 Jan 2019, 09:45

Since these are matched pair data you must use the paired t-test; the unpaired ttest is simply not valid for matched pairs. And for the paired ttest, the variances in the two samples are irrelevant, so no need to see if they are equal.

That said, it isn't possible to have matched pair data with unequal sample sizes. So what is actually going on here?
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

16 Jan 2019, 09:51

Assuming you can explain and remove the 23 singletons, a scatter plot with case and control as variables would be just as helpful here. That should make clear whether the differing SDs are a side-effect of outliers, or even whether they are to be considered a big deal. Often whether the means differ isn't the most important fact to emphasize.
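
A minimal sketch of such a plot, on simulated data rather than the poster's. It assumes one observation per matched pair, with the two measurements held in the hypothetical variables ratio_case and ratio_ctrl; every number below is made up for illustration.

Code:

clear
set seed 2019
set obs 50

// Hypothetical pair-level effect shared by the case and its control
generate double pair_u = rnormal(0, 0.05)

// Illustrative values only; the case SD is deliberately larger
generate double ratio_ctrl = 0.94 + pair_u + rnormal(0, 0.03)
generate double ratio_case = 0.97 + pair_u + rnormal(0, 0.06)

// Compare the two SDs numerically
summarize ratio_case ratio_ctrl

// Scatter plot with the line of equality y = x as a reference
twoway (scatter ratio_case ratio_ctrl) ///
    (function y = x, range(ratio_ctrl)), ///
    ytitle("Case") xtitle("Control") legend(off)

Points far from the reference line, or a few extreme points in one direction, would suggest that the differing SDs are driven by outliers.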

Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#4

16 Jan 2019, 16:15

Try something like the following. Begin at the "Begin here" comment.

Code:

version 15.1

clear *

set seed `=strreverse("1479093")'

// Five age-sex matching strata, each with its own random effect
quietly set obs 5
generate byte agesex_grp = _n
generate double agesex_u = rnormal(0, 2)
tempfile us
quietly save `us'

// 371 simulated observations with snp_grp = 0 (mean 0.94, SD 0.35)
drop _all
quietly set obs 371

generate byte agesex_grp = runiformint(1, 5)
generate byte snp_grp = 0
generate double twodeefourdee = rnormal(0.94, 0.35)

tempfile controls
quietly save `controls'

// 348 simulated observations with snp_grp = 1 (mean 0.97, larger SD of 0.65)
drop _all
quietly set obs 348
generate byte agesex_grp = runiformint(1, 5)
generate byte snp_grp = 1
generate double twodeefourdee = rnormal(0.97, 0.65)

append using `controls'

// Add the stratum-level effect to every observation
merge m:1 agesex_grp using `us', assert(match) nogenerate noreport
quietly replace twodeefourdee = twodeefourdee + agesex_u

*
* Begin here
*
// Random intercept per stratum; residual variance allowed to differ by snp_grp
mixed twodeefourdee i.snp_grp || agesex_grp: , ///
    residuals(independent, by(snp_grp)) ///
    nolrtest nolog

// Compare: stratum fixed effects with a single residual variance
xtreg twodeefourdee i.snp_grp, i(agesex_grp) fe

exit

I'm guessing that, as Clyde and Nick have already pointed out, you're placing too much emphasis on within-group variances.


Yue YY

Join Date: May 2018

Posts: 41
#5

17 Jan 2019, 09:52

Thank you, Clyde. The sample sizes are indeed unequal even though the samples are matched. I assume this is because the two samples are matched by age and sex, so some cases are matched with more than one control. I read the guide again: because I want to compare a single variable between two samples, I can't use the paired t-test syntax -ttest var1 == var2- directly. I tried to separate the target variable into two new variables, but it didn't work.

Last edited by Yue YY; 17 Jan 2019, 10:00.
Yue YY

Join Date: May 2018

Posts: 41
#6

17 Jan 2019, 09:57

That's a great idea. Thank you very much, Nick.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#7

17 Jan 2019, 12:02

Re #5. I believe that if I saw your data, I could show you how to create the two separate variables for -ttest var1 == var2- using either the -reshape- or the -separate- command.

Be that as it may, if your data contains a single variable, let's call it var, and an indicator for which group (1 or 2) each observation belongs to (call this variable group), and another variable containing the ID of the matched case for each control (and the ID of the case itself for each case), call that one tuple, then you can emulate the paired ttest by running:

Code:

regress var i.group i.tuple

The test for the variable group will be identical to what you would get from changing your data layout to do the paired t-test.

My point is that it is not legitimate to use an unpaired t-test for this kind of data: the results are simply invalid, and they aren't even predictably wrong in one direction or the other. They're just wrong and useless. You must use an analysis that respects the matching. One can question whether any kind of t-test is appropriate here: if you believe it is, then you need the paired version, and the code above enables you to do it. Exploring other options is good, too, but whatever you do, the analysis absolutely must respect the matching.
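
A minimal sketch of the equivalence described above, on simulated data with exactly one case and one control per tuple. The variable names var, group and tuple follow the post; the sample size, the means and the helper variable u are assumptions made up for illustration.

Code:

clear
set seed 12345
set obs 20

// One row per matched pair, with a pair-level effect shared by both members
generate int tuple = _n
generate double u = rnormal()

// Two observations per tuple: group 1 and group 2
expand 2
bysort tuple: generate byte group = _n
generate double var = 0.95 + 0.03*(group == 2) + u + rnormal(0, 0.1)

// Long layout: regression with tuple indicators
regress var i.group i.tuple
scalar t_reg = _b[2.group] / _se[2.group]

// Wide layout: conventional paired t-test
drop u
reshape wide var, i(tuple) j(group)
ttest var1 == var2

// Same t statistic up to sign (ttest reports var1 - var2)
assert reldif(abs(t_reg), abs(r(t))) < 1e-6

The -reshape wide- step only works when each tuple has exactly one observation per group. With several controls per case, the -regress- form still runs, whereas -ttest var1 == var2- has no direct counterpart.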

Yue YY

Join Date: May 2018

Posts: 41
#8

17 Jan 2019, 13:45

Thank you very much Joseph!
