Multiple One-Sample-T-Tests?

Tim Sandmann

Join Date: Oct 2015

Posts: 14
#1


15 Aug 2016, 08:43

Dear Statalist members,

I want to test whether there is a significant difference between the mean of a sample and the means of five subgroups of that sample, and I want to perform these tests for over 30 variables. What is the most efficient way?

Background info: I want to characterize the five groups, and for that characterization I just want to know which subgroup means are significantly below or above the average of the sample.

Thanks!
Tim Sandmann

Join Date: Oct 2015

Posts: 14
#2

15 Aug 2016, 12:43

I watched a few tutorials, and I think this will work:

foreach var of varlist hinwei warm streng {
di "'var'"
sum 'var'
local mtypo = r(mean)
foreach num of numlist 2 4 5 6 8 {
di 'num'
ttest 'var'=='mtypo' if typo=='num'
}
}

hinwei, warm and streng are the first three variables I want to analyze.
typo is the subgroup number. There are eight subgroups, but I only want to analyze 2, 4, 5, 6 and 8.
The di commands show me the variable and the subgroup currently being processed in the loop.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#3

15 Aug 2016, 13:37

Well, that code will not run because you have used the wrong quotation marks to attempt to reference your macros. The opening quote has to be `, not '. But even when you fix that, the results will be wrong.

For two reasons.

1. By using the mean of `var' as a fixed constant against which to test the subset mean, you are ignoring the sampling error inherent in that mean calculation. So your pooled standard errors will be too small, your t-statistics too large, and your p-values falsely low. This problem might be overlooked if the sample size is sufficiently large that sampling error of the overall mean is, for practical purposes, zero, or if your sample is, in fact, a census.

2. But even if #1 is forgivable, this is not: the data from which you calculate the subset statistics are not independent of the data from which the overall mean was calculated. Consequently your pooled standard errors will be misleading, and the direction of the error is not predictable. You simply can't use a t-test to compare data that are not independent of each other and get interpretable results.
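Both points can be made concrete with a little algebra. In the sketch below, n_g and x̄_g are the size and mean of one subgroup, n_c and x̄_c those of all remaining (complementary) observations, and n = n_g + n_c; this notation is introduced here for illustration. It treats group sizes as fixed and observations as independent. The overall mean contains the subgroup mean (the dependence in point 2), and the variance of the difference being tested is not the variance of the subgroup mean that the one-sample test uses:

```latex
\begin{aligned}
\bar{x} &= \frac{n_g\,\bar{x}_g + n_c\,\bar{x}_c}{n}
  \quad\Longrightarrow\quad
  \bar{x}_g - \bar{x} = \frac{n_c}{n}\bigl(\bar{x}_g - \bar{x}_c\bigr), \\
\operatorname{Var}\bigl(\bar{x}_g - \bar{x}\bigr)
  &= \Bigl(\frac{n_c}{n}\Bigr)^{2}
     \Bigl[\operatorname{Var}(\bar{x}_g) + \operatorname{Var}(\bar{x}_c)\Bigr]
  \quad\text{versus}\quad
  \operatorname{Var}(\bar{x}_g) = \frac{\sigma_g^{2}}{n_g}\ \text{(one-sample test)}.
\end{aligned}
```

Depending on the group sizes and on Var(x̄_c), the correct variance can be smaller or larger than Var(x̄_g), which is why the error has no predictable direction.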

Here's an alternative approach. Logically, the population mean of a subtype will equal the overall population mean if and only if the population mean of the subtype is equal to the population mean of the complementary subtype. So you can test that: the data in a subset and the complementary subset are going to be independent, so problem #2 is overcome, and none of the statistics are being calculated ignoring sampling error, so #1 is also overcome.
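The "if and only if" step follows from writing the overall population mean as a weighted average. Here p (a symbol introduced for this sketch) is the population share of the subtype, assumed to satisfy 0 < p < 1, and μ_g, μ_c are the population means of the subtype and of its complement:

```latex
\mu = p\,\mu_g + (1-p)\,\mu_c
\;\Longrightarrow\;
\mu_g - \mu = (1-p)\,(\mu_g - \mu_c),
\qquad\text{so}\qquad
\mu_g = \mu \iff \mu_g = \mu_c \quad (\text{since } 1-p > 0).
```

Because the factor 1 - p is strictly positive, the subtype mean is above (below) the overall mean exactly when it is above (below) the complement's mean, so the subgroup-versus-rest test also answers the "above or below average" question.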

Code:

assert !missing(typo)   // stop if any value of typo is missing
gen byte indicator = .
foreach v of varlist hinwei warm streng {
    display `"`v'"'
    foreach n of numlist 2 4 5 6 8 {
        display `n'
        // 1 for subgroup `n', 0 for all other observations
        replace indicator = (typo == `n')
        ttest `v', by(indicator)
    }
}

Now, there is still one problem: you are doing multiple hypothesis tests here. You may want to perform some kind of multiple comparison adjustment to your p-values, or you need to warn your audience that the p-values have not been adjusted for multiple comparisons.
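As one illustration of such an adjustment (other procedures exist), the Bonferroni correction multiplies each of the m p-values by m, and Holm's step-down procedure controls the same family-wise error rate while flagging at least every test Bonferroni flags. Here m is the number of tests: 3 variables × 5 subgroups = 15 for the three example variables, and 150 for 30 variables. With the unadjusted p-values sorted as p_(1) ≤ … ≤ p_(m):

```latex
\begin{aligned}
\text{Bonferroni:}\quad & \tilde{p}_i = \min\bigl(1,\; m\,p_i\bigr), \\
\text{Holm:}\quad & \tilde{p}_{(j)} = \min\Bigl(1,\; \max_{i \le j}\,(m - i + 1)\,p_{(i)}\Bigr).
\end{aligned}
```

A test is flagged at level α when its adjusted p-value is at most α. For α = 0.05, the equivalent Bonferroni cut-off on the unadjusted p-values is 0.05/15 ≈ 0.0033 with m = 15 and 0.05/150 ≈ 0.00033 with m = 150.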
Tim Sandmann

Join Date: Oct 2015

Posts: 14
#4

16 Aug 2016, 05:09

Thank you for your answer!

Wrong Marks: The quotation marks are right in my code. This was just a copy and paste problem.

Problems with standard errors: I thought I could take the mean of my sample as an estimator of the population mean and compare it with the means of my subsamples via one-sample t-tests. I understand your point; this is not the right strategy. I think line 1 of your code has an error. What is the function of this line? I already excluded all missing values.

Multiple testing: I will use the results of the t-tests only as a heuristic to color some of the subgroup means in my tables for easier interpretation. I have already included a warning for the audience.

Last edited by Tim Sandmann; 16 Aug 2016, 05:19.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#5

16 Aug 2016, 09:13

I think line 1 of your code has an error. What is the function of this line? I already excluded all missings.

The first line serves to verify that typo is never missing. I had no way to know whether you had already excluded missing values of typo, and I was concerned that my code would break if there were any. So I put that in as a check.
Tim Sandmann

Join Date: Oct 2015

Posts: 14
#6

16 Aug 2016, 09:50

Thanks again. This was very helpful. Okay, now I understand the "error" message I get when I run line 1 before excluding the missing values. So this code should work too (WITH missing values on typo):

Code:

gen group = .
foreach var of varlist hinwei warm streng {
    display "`var'"
    foreach num of numlist 2 4 5 6 8 {
        display `num'
        replace group = 1 if `num'==typo
        replace group = 0 if `num'!=typo & typo!=.
        ttest `var' if v68==1, by(group)
    }
}

v68==1 -> people who completed the questionnaire.

Last edited by Tim Sandmann; 16 Aug 2016, 09:53.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#7

16 Aug 2016, 13:11

Yes.
