
  • test for paired sample with unequal variances

    Dear Statalists,

    I have two samples (348 cases and 371 controls) matched by sex and age, and I would like to compare their finger length ratio (a numeric variable). When I ran -sdtest-, I found that the SDs of the two samples are unequal. In this case, should I use -ttest- to check the mean difference? (I found the option [, une] for the -ttest- command, but my samples are paired, so I'm not sure whether I should use that option or find another way.)
    Thank you!

    Yue
    Last edited by Yue YY; 16 Jan 2019, 09:11.

  • #2
    Since these are matched pair data you must use the paired t-test; the unpaired ttest is simply not valid for matched pairs. And for the paired ttest, the variances in the two samples are irrelevant, so no need to see if they are equal.

    That said, it isn't possible to have matched pair data with unequal sample sizes. So what is actually going on here?
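To illustrate why the per-sample variances do not enter the paired test, here is a minimal sketch on simulated data. The variable names (pair_id, ratio_case, ratio_control) and all numbers are made up for illustration and are not taken from the original poster's data.

```stata
* Hypothetical simulated data: 100 matched pairs in wide layout
clear
set seed 12345
set obs 100
generate pair_id = _n
generate double pair_u = rnormal(0, 0.05)                         // shared pair effect
generate double ratio_control = 0.96 + pair_u + rnormal(0, 0.02)
generate double ratio_case    = 0.97 + pair_u + rnormal(0, 0.06)  // larger SD on purpose

* The two SDs may differ, but the paired test uses only within-pair differences
sdtest ratio_case == ratio_control
ttest  ratio_case == ratio_control
local t_paired = r(t)

* Equivalent: one-sample t-test of the differences against zero
generate double diff = ratio_case - ratio_control
ttest diff == 0
```

The paired test and the one-sample test on diff should report the same t statistic, which shows that only the spread of the within-pair differences matters.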



    • #3
      Assuming you can explain and remove the 23 singletons, then a scatter plot would be as helpful here with case and control as variables. That should make clear whether the differing SDs are a side-effect of outliers or even whether they are to be considered a big deal. Often whether the means differ isn't the most important fact for emphasis.
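A rough sketch of such a plot, assuming the data are in long layout with one row per person. The names pair_id, is_case and ratio are hypothetical, and the data are simulated.

```stata
* Hypothetical simulated pairs in long layout: one row per person
clear
set seed 2019
set obs 100
generate pair_id = _n
generate double pair_u = rnormal(0, 0.05)
expand 2
bysort pair_id: generate byte is_case = (_n == 2)
generate double ratio = 0.96 + pair_u + rnormal(0, 0.03)

* Reshape so that case and control values sit side by side
drop pair_u
reshape wide ratio, i(pair_id) j(is_case)
rename (ratio0 ratio1) (ratio_control ratio_case)

* Scatter plot with a line of equality for reference
twoway (scatter ratio_case ratio_control)          ///
       (function y = x, range(ratio_control)),     ///
       legend(off) ytitle("Case") xtitle("Control")
```

Points far from the line of equality are pairs with large differences, and a few extreme points on one axis would suggest that outliers drive the difference in SDs.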



      • #4
        Try something like the following. Begin at the "Begin here" comment.
        Code:
        version 15.1
        
        clear *
        
        set seed `=strreverse("1479093")'
        
        quietly set obs 5
        generate byte agesex_grp = _n
        generate double agesex_u = rnormal(0, 2)
        tempfile us
        quietly save `us'
        
        drop _all
        quietly set obs 371
        
        generate byte agesex_grp = runiformint(1, 5)
        generate byte snp_grp = 0
        generate double twodeefourdee = rnormal(0.94, 0.35)
        
        tempfile controls
        quietly save `controls'
        
        drop _all
        quietly set obs 348
        generate byte agesex_grp = runiformint(1, 5)
        generate byte snp_grp = 1
        generate double twodeefourdee = rnormal(0.97, 0.65)
        
        append using `controls'
        
        merge m:1 agesex_grp using `us', assert(match) nogenerate noreport
        quietly replace twodeefourdee = twodeefourdee + agesex_u
        
        *
        * Begin here
        *
        mixed twodeefourdee i.snp_grp || agesex_grp: , ///
            residuals(independent, by(snp_grp)) ///
            nolrtest nolog
        
        // Compare:
        xtreg twodeefourdee i.snp_grp, i(agesex_grp) fe
        
        exit
        I'm guessing that, as Clyde and Nick have already pointed out, you're placing too much emphasis on within-group variances.



        • #5
          Originally posted by Clyde Schechter
          Since these are matched pair data you must use the paired t-test; the unpaired ttest is simply not valid for matched pairs. And for the paired ttest, the variances in the two samples are irrelevant, so no need to see if they are equal.

          That said, it isn't possible to have matched pair data with unequal sample sizes. So what is actually going on here?
          Thank you, Clyde. The sample sizes are indeed unequal even though the samples are matched. I assume this is because the two samples are matched by age and sex, so some cases are matched with more than one control ... I read the guide again: because I only have one variable to compare between the two samples, I can't use the paired t-test command -ttest var1==var2- directly. I tried to separate the target variable into two new variables, but it didn't work.
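One way to check how many controls each case actually has is to count within matched sets. This is only a sketch on a tiny made-up dataset. It assumes a matched-set identifier exists (called set_id here, a hypothetical name), which the thread does not confirm.

```stata
* Hypothetical layout: one row per person, with a matched-set identifier
clear
input int set_id byte is_case double ratio
1 1 0.97
1 0 0.95
2 1 0.99
2 0 0.94
2 0 0.96
3 1 0.93
3 0 0.98
end

* Count cases and controls within each matched set
bysort set_id: egen byte n_cases    = total(is_case)
bysort set_id: egen byte n_controls = total(!is_case)

* One row per set, to see how many sets are 1:1, 1:2, and so on
egen byte first_in_set = tag(set_id)
tabulate n_cases n_controls if first_in_set
```

In this toy example, set 2 has one case and two controls, which is the kind of set that makes the overall sample sizes unequal.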
          Last edited by Yue YY; 17 Jan 2019, 10:00.



          • #6
            Originally posted by Nick Cox
            Assuming you can explain and remove the 23 singletons, then a scatter plot would be as helpful here with case and control as variables. That should make clear whether the differing SDs are a side-effect of outliers or even whether they are to be considered a big deal. Often whether the means differ isn't the most important fact for emphasis.
            That's a great idea. Thank you very much, Nick.



            • #7
              Re #5. I believe that if I saw your data, I could show you how to create the two separate variables for -ttest var1 == var2- using either the -reshape- or the -separate- command.

              Be that as it may, if your data contains a single variable, let's call it var, and an indicator for which group (1 or 2) each observation belongs to (call this variable group), and another variable containing the ID of the matched case for each control (and the ID of the case itself for each case), call that one tuple, then you can emulate the paired ttest by running:
              Code:
              regress var i.group i.tuple
              The test for the variable group will be identical to what you would get from changing your data layout to do the paired t-test.
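A small check of that equivalence on simulated data, assuming strictly one-to-one pairs. The variable names follow the ones above (var, group, tuple). The helper variable tuple_u and all numbers are made up for illustration.

```stata
* Hypothetical simulated data in long layout: 50 one-to-one matched pairs
clear
set seed 4711
set obs 50
generate tuple = _n
generate double tuple_u = rnormal(0, 1)        // shared effect within a pair
expand 2
bysort tuple: generate byte group = _n         // 1 = control, 2 = case
generate double var = 0.3 * (group == 2) + tuple_u + rnormal(0, 0.5)

* Regression with one indicator per matched pair
regress var i.group i.tuple
local t_reg = _b[2.group] / _se[2.group]

* The same comparison via the paired t-test on a wide layout
drop tuple_u
reshape wide var, i(tuple) j(group)
ttest var2 == var1
display "regress t = " %9.4f `t_reg' "   paired t = " %9.4f r(t)
```

With one case and one control per tuple, the two t statistics should agree up to rounding, and both tests have 49 degrees of freedom here.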

              My point is that it is not legitimate to use an unpaired t-test for this kind of data: the results are simply invalid, and they aren't even predictably wrong in one direction or the other. They're just wrong and useless. You must use an analysis that respects the matching. One can question whether any kind of t-test is appropriate here: if you believe it is, then you need the paired version, and the code above enables you to do it. Exploring other options is good, too, but whatever you do, the analysis absolutely must respect the matching.



              • #8
                Originally posted by Joseph Coveney
                Thank you very much Joseph!
