Multiple hypothesis testing commands

Tharcisio Leone

Join Date: Sep 2019
Posts: 37

29 Mar 2022, 12:53

Dear Statalisters,

My dataset contains 457 teachers (IDprof) in 41 schools (IDescola), observed in a panel with T=5 (year). The panel is strongly unbalanced: 550 records for 457 unique values of IDprof.

For each teacher, I have a set of ordered categorical dependent variables related to teaching practices (q010 q012 q013 q014 q015) and a set of explanatory variables describing school and student characteristics ($ControlVar).

Below is part of the dataset, in case you want to try it yourself.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(q010 q012 q013 q014 q015 e029 e025 e024 e027 e026 q036 q038 q037 q039 q044 q045 q046) float(DiD time treated) byte grade int year long IDescola
 5 4  2  2 4  4 1 1 3 1  1  1  1  1  1  1  1 0 0 1 3 2007 35913005
 4 4  3  4 3  2 3 1 1 1  1  1  1 .a  2  1 .a 0 0 0 3 2006 35283685
 2 4  4  4 3  2 3 1 1 1  1  1  2  2  3  2  2 0 0 0 3 2007 35283685
 4 5  5  5 5  4 3 1 3 1  1  1  1  1  2  1  1 0 0 0 1 2005 35088614
 3 4  3  3 4  4 3 1 3 1  1  1  1  2  2 .a  2 0 0 0 1 2005 35088614
 5 5  4  5 5  4 3 2 3 1  1  1  1  1  2  1  1 0 0 0 2 2006 35071122
 5 5  5  5 5  4 3 2 3 1  1  1  1  1  1  2  1 0 0 0 1 2005 35071122
 3 4  3  3 3  1 3 1 3 1  3  1  3  2  2  2  2 0 0 0 1 2005 35083860
 5 5  4  5 5  2 3 1 1 1  2  2  3  2  2  3  3 0 1 0 4 2008 35283685
 5 4  5  4 5  3 3 1 3 1 .a .a .a .a .a .a .a 0 0 1 3 2007 35018399
 5 5  4  3 5 .a 3 1 3 1  2  3  3  3  2  3  3 0 0 0 1 2005 35059122
 4 4 .a .a 4  3 1 1 3 1  1  1 .a  1  1  1  1 0 0 1 1 2005 35905446
 5 5  5  4 5  4 3 1 3 3  1  1  1  1  1  2  1 0 0 1 2 2006 35901124
.a 5  5  5 5  2 1 1 3 1 .a .a .a .a .a .a .a 0 0 1 2 2005 35042648
 5 5  5  5 5  3 1 1 3 1  1  1  1  1  1  1  1 1 1 1 4 2008 35018824
 5 4  4  4 4  4 3 1 3 3  1  1  2  1  1  1  1 0 0 0 1 2005 35059237
 5 5  5  5 5  4 3 1 3 3  1  1  1  1  1  1  1 0 0 1 1 2005 35901124
 2 4  3  2 2  1 3 1 3 1  2  2  2 .a  2  3  3 0 0 0 3 2007 35083860
 5 5  5  5 5  4 1 1 3 1  1  1  1  1  1  1  1 0 0 1 2 2006 35924957
 5 4  3  3 3  3 1 1 3 1  1  1  1  1  2  1  1 0 0 1 1 2005 35905446
 5 5  5  5 5  2 3 1 1 1  1  1  1  1  1  3  3 0 0 0 3 2007 35283685
 5 5  1  1 3 .a 3 1 3 1  1  2  1  2  2  3  2 0 1 0 4 2008 35083811
 3 3  3  2 4  4 3 2 3 1  1  1  1  1  1  1  1 0 0 0 1 2005 35071122
 3 3  3  3 3  4 3 2 3 1  3  2  3  2  2  2  2 0 0 0 2 2006 35071122
 4 4  4  4 4  1 3 1 3 1  1  2  2  1  2  2  1 0 0 0 2 2006 35083860
end
label values q010 q001
label values q012 q001
label values q013 q001
label values q014 q001
label values q015 q001
label def q001 2 "Nível 2", modify
label def q001 3 "Nível 3", modify
label def q001 4 "Nível 4", modify
label def q001 5 "Nível 5 (concordo totalmente)", modify
label def q001 .a "[9]Respostas inválidas", modify
label def q001 1 "Nível 1 (discordo totalmente)", modify
label values e029 e029
label def e029 1 "Não tem", modify
label def e029 2 "Quase nunca é usada", modify
label def e029 3 "Usada eventualmente por alunos", modify
label def e029 4 "Há programação regular para uso", modify
label values e024 e023
label values e025 e023
label values e026 e023
label values e027 e023
label def e023 1 "Não Tem", modify
label def e023 2 "Tem, mas não é usado", modify
label def e023 3 "Tem e é usado", modify
label values q036 q036
label values q037 q036
label values q038 q036
label values q039 q036
label values q044 q036
label values q045 q036
label values q046 q036
label def q036 1 "Não impede", modify
label def q036 2 "Em alguma medida", modify
label def q036 3 "Impede muito", modify
label def q036 .a "[9]Respostas inválidas", modify

------------------ copy up to and including the previous line ------------------

Listed 25 out of 550 observations

I estimate a difference-in-differences (DiD) model to investigate whether the intervention had an impact on teaching practices.

Code:

* Globals used in the foreach loop
global Yvar q010 q012 q013 q014 q015 // teaching practices
global ControlVar e029 e025 e024 e027 e026 q036 q038 q037 q039 q044 q045 q046 // school and student characteristics

* Placeholder that takes each outcome in turn, so every column is labelled Y
gen Y = .

* Estimation: one ordered probit per outcome, standard errors clustered by school
eststo clear
foreach outcome in $Yvar {
    replace Y = `outcome'
    qui oprobit Y DiD time treated i.grade i.year $ControlVar, vce(cluster IDescola)
    eststo
}

* Coefficients with stars, p-values in parentheses
esttab, cells(b(star fmt(3)) p(par fmt(3))) numbers pr2(3) keep(DiD time treated) starl(* 0.1 ** 0.05 *** 0.01) replace


--------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)  
                        Y               Y               Y               Y               Y  
                      b/p             b/p             b/p             b/p             b/p  
--------------------------------------------------------------------------------------------
Y                                                                                          
DiD                -0.184           0.436*         -0.026           0.062          -0.158  
                  (0.554)         (0.072)         (0.926)         (0.813)         (0.643)  
time               -0.528           0.681           0.420           0.452           0.415  
                  (0.380)         (0.208)         (0.557)         (0.552)         (0.537)  
treated             0.250          -0.159           0.251           0.388           0.497*  
                  (0.320)         (0.473)         (0.346)         (0.152)         (0.070)  
--------------------------------------------------------------------------------------------
N                     430             433             433             436             438  
pseudo R-sq         0.047           0.033           0.027           0.039           0.053  
--------------------------------------------------------------------------------------------

.
end of do-file

Because I am running several regressions, I am concerned that multiple testing could be a problem. I would therefore like to adjust the p-values using a multiple-hypothesis testing procedure such as Bonferroni, FDR, or Romano-Wolf.
I found plenty of theoretical material on multiple testing online, but very little on how to implement it in Stata.
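For the simpler corrections, the arithmetic can be done by hand on the reported p-values. The following is a minimal sketch, not a definitive implementation. It assumes the family of hypotheses is the five DiD coefficients from the table above, and the variable names p_bonf, p_holm and q_bh are made up for illustration. It computes Bonferroni, Holm, and Benjamini-Hochberg (FDR) adjusted values. Romano-Wolf is not covered here because it requires resampling.

```stata
* Sketch: Bonferroni, Holm and Benjamini-Hochberg adjustments for m p-values.
* The five values are the DiD p-values copied from the esttab output above.
clear
input str4 outcome double p
"q010" 0.554
"q012" 0.072
"q013" 0.926
"q014" 0.813
"q015" 0.643
end

local m = _N

* Bonferroni: multiply each p-value by m and cap at 1
gen double p_bonf = min(1, p * `m')

* Holm (step-down): sort ascending, scale the k-th smallest p by (m - k + 1),
* then enforce monotonicity with a running maximum
sort p
gen double p_holm = min(1, p * (`m' - _n + 1))
replace p_holm = max(p_holm, p_holm[_n-1]) if _n > 1

* Benjamini-Hochberg (FDR): scale the k-th smallest p by m/k,
* then take a running minimum from the largest p downwards
gen double q_bh = min(1, p * `m' / _n)
gsort -p
replace q_bh = min(q_bh, q_bh[_n-1]) if _n > 1

sort outcome
list outcome p p_bonf p_holm q_bh, noobs
```

With these inputs the smallest raw p-value (0.072, for q012) becomes 5 × 0.072 = 0.36 under all three procedures.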

What I would like to do is simple:
After the esttab above, I would like to run the regressions (foreach) again with the multiple-hypothesis testing procedure and save the "corrected" p-values. I would then print esttab again with the coefficients and p-values from the first estimation and the "corrected" p-values in a new row. (Similar to this example).
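One way this workflow could look is sketched below, under several assumptions. The full dataset and the globals $Yvar and $ControlVar are assumed to be in memory. The family of hypotheses is taken to be the DiD coefficient across the five outcomes. Only the Bonferroni correction is shown. The scalar name p_bonf and the stored names m1 to m5 are hypothetical. The adjusted p-value is attached to each model with estadd (part of the estout package) and printed in the table footer.

```stata
* Sketch: estimate each model, compute the DiD p-value, attach a
* Bonferroni-adjusted version, and print it below the coefficients.
* Requires estout (ssc install estout).
eststo clear
local m : word count $Yvar
local i = 0
foreach outcome of global Yvar {
    local ++i
    quietly oprobit `outcome' DiD time treated i.grade i.year $ControlVar, vce(cluster IDescola)
    * two-sided p-value for DiD from the z statistic
    local p = 2 * normal(-abs(_b[DiD] / _se[DiD]))
    * add the adjusted p-value to e() before storing the model
    estadd scalar p_bonf = min(1, `p' * `m')
    eststo m`i'
}

esttab m*, cells(b(star fmt(3)) p(par fmt(3))) keep(DiD time treated) ///
    starlevels(* 0.1 ** 0.05 *** 0.01) ///
    stats(p_bonf N, fmt(3 0) labels("Bonferroni p (DiD)" "N"))
```

Placing the adjusted value in stats() keeps it as one number per model. A resampling-based correction such as Romano-Wolf would replace the line that computes p_bonf with the output of the corresponding command.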

The idea is simple, but I could not implement it in Stata. Does anyone have an idea?
Any advice would be highly appreciated!
Many thanks in advance.

Last edited by Tharcisio Leone; 29 Mar 2022, 12:57.


Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#2

30 Mar 2022, 10:53

The problem was solved.
For anyone interested in this issue, please see Clarke, Romano and Wolf (2020).