  • Bonferroni correction for multiple t-test

    Hello everyone,

    I want to test whether body weight differs between boys and girls within each age group. My data have 10 age groups, so to see whether the difference in mean weight between boys and girls differs from 0 in each group, I want to run a t test per group.

    Since I have multiple groups, I need to take into consideration the issue of multiple testing.

    Is there a single option I can add to my ttest code below?
    Code:
    by description2, sort : ttest weight, by(gender)
    I can of course do it manually: 0.05/10 = 0.005. However, I also want to see whether mean weight differs according to other categorical variables such as BMI category, country, etc. Code for this would be really helpful and would save me time.

    I will need a different Bonferroni correction for each set of t tests, depending on the number of tests. Correct?

    PS: My data are highly correlated, since the children were followed up yearly (in case this makes a difference).

  • #2
    You can estimate the difference between the means of two groups with regress: the estimated differences are the coefficients on the interaction factor variables. After regress, a test command with the mtest() option corrects for multiple comparisons. Below, I show how to build up the test command for an arbitrary number of group levels. The two groups being compared are coded 0/1 (here, the variable "foreign" in the auto dataset). I also use mtest(sidak), as the Sidak correction is slightly more powerful than Bonferroni.
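
    For reference, the two corrections can be written out as follows. This is a sketch in notation that is not in the original post: m is the number of tests, alpha the overall error rate, and p an unadjusted p-value.

```latex
% Per-test significance thresholds for m tests at overall level \alpha
\[
  \alpha_{\mathrm{Bonferroni}} = \frac{\alpha}{m},
  \qquad
  \alpha_{\mathrm{Sidak}} = 1 - (1 - \alpha)^{1/m}
\]
% Equivalent adjusted p-values
\[
  \tilde{p}_{\mathrm{Bonferroni}} = \min(1,\; m\,p),
  \qquad
  \tilde{p}_{\mathrm{Sidak}} = 1 - (1 - p)^{m}
\]
```

    With alpha = 0.05 and m = 10, the Bonferroni threshold is 0.005 and the Sidak threshold is about 0.00512. Likewise, for p = 0.01 and m = 3, the Bonferroni-adjusted value is 0.03 and the Sidak-adjusted value is 0.0297, which is why Sidak is only slightly more powerful.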

    You plan corrections not only for age but for other variables as well, so you would not be making a single simultaneous correction across all the tests you plan. I suggest instead that you show a histogram or dotplot of all the p-values and simply state that, in the absence of any real differences, you would expect about 0.05 x (number of tests) of them to have p < 0.05.

    Code:
    sysuse auto, clear
    
    /* Create a 3-category "age group" variable from rep78 (4 -> 1, 5 -> 2, missing -> 1) */
    gen agegp = rep78
    recode agegp 4=1 5=2 .=1
    
    /* Store the name of the grouping variable in local macro "cat" */
    local cat agegp
    
    /* t-tests */
    bys `cat': ttest headroom, by(foreign) unequal
    
    /* Regression with no constant: cell means plus within-group contrasts */
    reg headroom ibn.`cat' ibn.`cat'#ibn.foreign , nocons vce(robust)
    /* Each k.`cat'#0.foreign coefficient = difference between the two group means within level k */
    
    /*Get number of levels for your categorical variable */
    levelsof `cat', local(levels)
    
    /* Construct the arguments for the -test- command.
       Plain macro appending avoids string-expression length limits in older Stata versions. */
    local testcmd
    foreach i of local levels {
        local testcmd `testcmd' `i'.`cat'#0.foreign
    }
    
    /* Whole Test command  will be */
    di "test " `"`testcmd'"' ", mtest(sidak)"
    
    /* Do the tests */
    test `testcmd', mtest(sidak)
    The last part of the results is:
    Code:
     /* Whole Test command  will be */
    . di "test " `"`testcmd'"' ", mtest(sidak)"
    test 1.agegp#0.foreign 2.agegp#0.foreign 3.agegp#0.foreign , mtest(sidak)
    
    .
    . /* Do the tests */
    . test `testcmd', mtest(sidak)
    
     ( 1)  1bn.agegp#0bn.foreign = 0
     ( 2)  2.agegp#0bn.foreign = 0
     ( 3)  3.agegp#0bn.foreign = 0
    
    ---------------------------------------
           |    F(df,68)     df       p
    -------+-------------------------------
      (1)  |        1.86      1     0.4430 #
      (2)  |        1.87      1     0.4395 #
      (3)  |        6.85      1     0.0323 #
    -------+-------------------------------
      all  |        3.53      3     0.0193
    ---------------------------------------
                  # Sidak-adjusted p-values
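
    The suggestion above to look at all the p-values together could be sketched roughly as follows. This is only an illustration on the same auto-data example, not part of the original answer: the posted variable names "group" and "p" are my own choices, and with only three groups the histogram is of course not informative. The unadjusted two-sided p-value is taken from r(p), which ttest leaves behind.

```stata
sysuse auto, clear
gen agegp = rep78
recode agegp 4=1 5=2 .=1

* Collect one unadjusted p-value per group in a temporary dataset
tempname memhold
tempfile pvals
postfile `memhold' group p using `pvals'

levelsof agegp, local(levels)
foreach i of local levels {
    quietly ttest headroom if agegp == `i', by(foreign) unequal
    post `memhold' (`i') (r(p))
}
postclose `memhold'

* Plot the p-values and count how many fall below 0.05
use `pvals', clear
histogram p, width(0.05) start(0) frequency
count if p < 0.05
```

    The count can then be compared with the 0.05 x (number of tests) expected if there were no real differences.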
    Last edited by Steve Samuels; 11 Jul 2015, 21:26.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Although I've given a solution, I ask why you need tests and multiple corrections at all. The corrections control the probability of falsely declaring significance if all of the null hypotheses are true, but most hypotheses of no gender difference in body weight will be false. I think you are better off asking "how different?" and answering with confidence intervals. See Gelman et al., 2012.

      Reference:
      Gelman, Andrew, Jennifer Hill, and Masanao Yajima. 2012. Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5, no. 2: 189-211.

      http://www.stat.columbia.edu/~gelman...multiple2f.pdf
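
      One way to answer "how different?" with confidence intervals is sketched below on the same auto-data example. It is an assumption on my part that margins is a convenient route here; the coefficient table from the regression in #2 reports the same kind of intervals. Note that the sign convention differs: here each estimate is foreign minus domestic.

```stata
sysuse auto, clear
gen agegp = rep78
recode agegp 4=1 5=2 .=1

* Full factorial model; robust SEs allow unequal variances across cells
regress headroom i.agegp##i.foreign, vce(robust)

* Difference in mean headroom (foreign minus domestic) within each
* age group, with 95% confidence intervals
margins agegp, dydx(foreign)

* Optional plot of the differences and their intervals
marginsplot, yline(0)
```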
      Last edited by Steve Samuels; 12 Jul 2015, 07:16.
      Steve Samuels


      • #4
        Hi! I’m examining how the difference in change in right ventricular systolic function between persons with kidney disease and persons with systemic hypertension is associated with age and other clinical, biochemistry, EKG, and echocardiogram variables.

        I’m going about this task with a series of models that are essentially bivariate, with no more than 3 or 4 explanatory variables each (including interaction and spline terms built principally from two variables).

        The question I’m asking requires combining coefficients with the lincom command.

        I’d appreciate help with correcting for multiple testing of the p-values from these combinations of coefficients, to improve the robustness of the analysis (i.e., how to apply mtest(sidak), mtest(holm), or mtest(bonferroni) to p-values from lincom). Many thanks.



        • #5
          Originally posted by Itse Ajuyah (see #4 above)

          I use Stata 12.
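
          One possible approach, offered as a sketch rather than a tested answer for these data: lincom itself has no mtest() option, but test accepts the same linear combinations, each in its own parentheses, and mtest() then adjusts the p-value of each combination. The model and combinations below are purely illustrative, using the auto data rather than the clinical variables described.

```stata
sysuse auto, clear
regress price c.weight##i.foreign mpg

* Two linear combinations, as one would request with lincom
lincom weight + 1.foreign#c.weight              // weight slope for foreign cars
lincom 1.foreign + 3000*1.foreign#c.weight      // foreign vs domestic at weight 3000

* Same combinations as separate hypotheses, with Holm-adjusted p-values;
* mtest(sidak) and mtest(bonferroni) work the same way
test (weight + 1.foreign#c.weight = 0) ///
     (1.foreign + 3000*1.foreign#c.weight = 0), mtest(holm)
```

          This adjusts only within a single fitted model. Combinations from separately fitted models would need their p-values collected and adjusted by hand.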
