Bonferroni correction for multiple t-test

Delphina Gomes

#1

11 Jul 2015, 11:58

Hello everyone,

I want to see whether body weight differs between boys and girls within age groups. My data have 10 age groups. To test whether the difference in mean weight between boys and girls is different from 0 in each group, I want to use t-tests.

Since I have multiple groups, I need to take into consideration the issue of multiple testing.

Is there a single command or option I can add to my ttest code below?

Code:

by description2, sort : ttest weight, by(gender)

I can of course do it manually: 0.05/10 = 0.005. However, I also want to see whether mean weight differs according to other categorical variables such as BMI, country, etc. Code would be really helpful and save me time.

I will need a different Bonferroni correction for each set of t-tests, depending on the number of tests. Correct?

PS: My data are highly correlated, since the children were followed up yearly (if this makes a difference).
Tags: Bonferroni, ttest

Steve Samuels


11 Jul 2015, 21:18

You can estimate the differences between the means of two groups with a regress statement. The estimated differences are the coefficients of the factor-variable interaction terms. Following regress, a test statement with the mtest() option will correct for multiple comparisons. Below, I show how to build up the test statement for an arbitrary number of group levels. I use a 0-1 definition of the groups that are to be compared (here defined by the variable "foreign" in the auto dataset). I also use the mtest(sidak) correction, as it is slightly more powerful than Bonferroni.

You plan corrections not only for age but for other variables as well, so you would not be making a simultaneous correction for all the tests you plan. I suggest that you show a histogram or dotplot of all the p-values and simply state that, in the absence of any real differences, you would expect 0.05 x (number of tests) of them to have p<0.05.
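As a rough sketch of the p-value display suggested above, one way to collect the p-values is with postfile. This is only an illustration on the auto data: the handle name "ph" and the variable names "level" and "p" are assumptions, and in real use the loop would run over every test you plan, not just three groups.

```stata
* Sketch only: collect one p-value per subgroup t-test, then plot them.
* The names "ph", "level" and "p" are illustrative, not from the thread.
sysuse auto, clear
gen agegp = rep78
recode agegp 4=1 5=2 .=1

tempname ph
tempfile pvals
postfile `ph' level p using `pvals'

levelsof agegp, local(levels)
foreach i of local levels {
    quietly ttest headroom if agegp == `i', by(foreign) unequal
    * r(p) is the two-sided p-value stored by ttest
    post `ph' (`i') (r(p))
}
postclose `ph'

* Load the collected p-values (one observation per test)
use `pvals', clear

* Expected number of p < 0.05 if there are no real differences
display "Tests: " _N "   expected with p<0.05 by chance: " 0.05*_N

histogram p, width(0.05) start(0) frequency
```

With only a handful of tests the histogram is not very informative; dotplot p may read better in that case.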

Code:

sysuse auto, clear

/* Create a 3-category age-group variable from rep78 */
gen agegp = rep78
recode agegp 4=1 5=2 .=1

/* Store the name of the grouping variable in local macro "cat" */
local cat agegp

/* t-tests within each level of the grouping variable */
bys `cat': ttest headroom, by(foreign) unequal

/* regress: one mean per group plus group-by-foreign interactions */
reg headroom ibn.`cat' ibn.`cat'#ibn.foreign , nocons vce(robust)
/* Interaction coefficients = differences between means */

/* Get the levels of your categorical variable */
levelsof `cat', local(levels)

/* Construct the arguments for the -test- command */
foreach i of local levels {
    local testcmd =  "`testcmd'"  + "`i'."  + "`cat'#0.foreign "
}

/* Whole Test command  will be */
di "test " `"`testcmd'"' ", mtest(sidak)"

/* Do the tests */
test `testcmd', mtest(sidak)

The last part of the results is:

Code:

 /* Whole Test command  will be */
. di "test " `"`testcmd'"' ", mtest(sidak)"
test 1.agegp#0.foreign 2.agegp#0.foreign 3.agegp#0.foreign , mtest(sidak)

.
. /* Do the tests */
. test `testcmd', mtest(sidak)

 ( 1)  1bn.agegp#0bn.foreign = 0
 ( 2)  2.agegp#0bn.foreign = 0
 ( 3)  3.agegp#0bn.foreign = 0

---------------------------------------
       |    F(df,68)     df       p
-------+-------------------------------
  (1)  |        1.86      1     0.4430 #
  (2)  |        1.87      1     0.4395 #
  (3)  |        6.85      1     0.0323 #
-------+-------------------------------
  all  |        3.53      3     0.0193
---------------------------------------
              # Sidak-adjusted p-values
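
For readers who want to see how the two adjustments differ, the standard formulas are min(1, m*p) for Bonferroni and 1 - (1 - p)^m for Sidak, where p is the unadjusted p-value and m the number of tests. A minimal sketch, using made-up values for p and m (the scalar names are illustrative, not taken from the thread):

```stata
* Sketch: compare Bonferroni and Sidak adjustments for one unadjusted p-value.
* p_raw and n_tests are illustrative values, not results from the thread.
scalar p_raw   = 0.011
scalar n_tests = 3

* Bonferroni: multiply by the number of tests, capped at 1
scalar p_bonf  = min(1, scalar(n_tests) * scalar(p_raw))

* Sidak: 1 - (1 - p)^m
scalar p_sidak = 1 - (1 - scalar(p_raw))^scalar(n_tests)

display "Bonferroni-adjusted p: " %6.4f scalar(p_bonf)
display "Sidak-adjusted p:      " %6.4f scalar(p_sidak)
```

The Sidak value is never larger than the Bonferroni value, which is the sense in which it is slightly more powerful. The scalar() wrapper is used so that a scalar name cannot be mistaken for an abbreviated variable name.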

Last edited by Steve Samuels; 11 Jul 2015, 21:26.

Steve Samuels
Statistical Consulting

Stata 14.2


Steve Samuels

#3

12 Jul 2015, 07:11

Although I've given a solution, I ask why you need tests and multiple-comparison corrections at all. The corrections control the probability of falsely declaring significance when all of the null hypotheses are true, but most hypotheses of no gender difference in body weight will be false. I think you are better off asking "how different?" and answering with confidence intervals. See Gelman et al. (2012).

Reference:
Gelman, Andrew, Jennifer Hill, and Masanao Yajima. 2012. Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5, no. 2: 189-211.

http://www.stat.columbia.edu/~gelman...multiple2f.pdf
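
A minimal sketch of the "how different?" approach, reusing the auto-data setup from the earlier post: after the same regress, lincom reports each within-group difference with its 95% confidence interval. This is only one possible way to do it, and the loop is an illustration, not part of the original solution.

```stata
* Sketch: report each within-group difference with a 95% confidence interval.
sysuse auto, clear
gen agegp = rep78
recode agegp 4=1 5=2 .=1

regress headroom ibn.agegp ibn.agegp#ibn.foreign, noconstant vce(robust)

levelsof agegp, local(levels)
foreach i of local levels {
    * Coefficient = mean(foreign==0) - mean(foreign==1) within group `i'
    lincom `i'.agegp#0.foreign
}
```

The same intervals already appear in the regress coefficient table; the loop just lists them one difference at a time.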

Last edited by Steve Samuels; 12 Jul 2015, 07:16.

Itse Ajuyah

#4

18 Dec 2019, 14:52

Hi! I’m examining how the change in right ventricle systolic function between persons with kidney disease and persons with systemic hypertension is associated with age and other clinical/biochemistry/EKG/Echocardiogram parameters/variables.

I’m going about this task with a series of essentially bivariate models, with no more than 3 or 4 explanatory variables (including interaction and spline terms derived principally from two variables) in any one model.

The question I’m asking requires combining coefficients via the lincom command.

I’d appreciate help in correcting for multiple testing of the p-values from these combinations of coefficients, to improve the robustness of the analysis (i.e., how to apply mtest(sidak/holm/bonferroni) to the p-values of lincom-combined coefficients). Many thanks.
Itse Ajuyah

#5

18 Dec 2019, 15:03

I use Stata 12