  • Multiple hypothesis testing commands

    Dear Statalisters,

    My database contains 457 teachers (IDprof) in 41 schools (IDescola) in a panel structure with T=5 (year). However, the panel is strongly unbalanced (550 records for 457 unique teachers).

    For each teacher, I have a set of ordered categorical dependent variables related to teaching practices (q010 q012 q013 q014 q015) and a list of explanatory variables related to school and student characteristics ($ControlVar).

    Below is part of the database, in case you want to try it yourself.


    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(q010 q012 q013 q014 q015 e029 e025 e024 e027 e026 q036 q038 q037 q039 q044 q045 q046) float(DiD time treated) byte grade int year long IDescola
     5 4  2  2 4  4 1 1 3 1  1  1  1  1  1  1  1 0 0 1 3 2007 35913005
     4 4  3  4 3  2 3 1 1 1  1  1  1 .a  2  1 .a 0 0 0 3 2006 35283685
     2 4  4  4 3  2 3 1 1 1  1  1  2  2  3  2  2 0 0 0 3 2007 35283685
     4 5  5  5 5  4 3 1 3 1  1  1  1  1  2  1  1 0 0 0 1 2005 35088614
     3 4  3  3 4  4 3 1 3 1  1  1  1  2  2 .a  2 0 0 0 1 2005 35088614
     5 5  4  5 5  4 3 2 3 1  1  1  1  1  2  1  1 0 0 0 2 2006 35071122
     5 5  5  5 5  4 3 2 3 1  1  1  1  1  1  2  1 0 0 0 1 2005 35071122
     3 4  3  3 3  1 3 1 3 1  3  1  3  2  2  2  2 0 0 0 1 2005 35083860
     5 5  4  5 5  2 3 1 1 1  2  2  3  2  2  3  3 0 1 0 4 2008 35283685
     5 4  5  4 5  3 3 1 3 1 .a .a .a .a .a .a .a 0 0 1 3 2007 35018399
     5 5  4  3 5 .a 3 1 3 1  2  3  3  3  2  3  3 0 0 0 1 2005 35059122
     4 4 .a .a 4  3 1 1 3 1  1  1 .a  1  1  1  1 0 0 1 1 2005 35905446
     5 5  5  4 5  4 3 1 3 3  1  1  1  1  1  2  1 0 0 1 2 2006 35901124
    .a 5  5  5 5  2 1 1 3 1 .a .a .a .a .a .a .a 0 0 1 2 2005 35042648
     5 5  5  5 5  3 1 1 3 1  1  1  1  1  1  1  1 1 1 1 4 2008 35018824
     5 4  4  4 4  4 3 1 3 3  1  1  2  1  1  1  1 0 0 0 1 2005 35059237
     5 5  5  5 5  4 3 1 3 3  1  1  1  1  1  1  1 0 0 1 1 2005 35901124
     2 4  3  2 2  1 3 1 3 1  2  2  2 .a  2  3  3 0 0 0 3 2007 35083860
     5 5  5  5 5  4 1 1 3 1  1  1  1  1  1  1  1 0 0 1 2 2006 35924957
     5 4  3  3 3  3 1 1 3 1  1  1  1  1  2  1  1 0 0 1 1 2005 35905446
     5 5  5  5 5  2 3 1 1 1  1  1  1  1  1  3  3 0 0 0 3 2007 35283685
     5 5  1  1 3 .a 3 1 3 1  1  2  1  2  2  3  2 0 1 0 4 2008 35083811
     3 3  3  2 4  4 3 2 3 1  1  1  1  1  1  1  1 0 0 0 1 2005 35071122
     3 3  3  3 3  4 3 2 3 1  3  2  3  2  2  2  2 0 0 0 2 2006 35071122
     4 4  4  4 4  1 3 1 3 1  1  2  2  1  2  2  1 0 0 0 2 2006 35083860
    end
    label values q010 q001
    label values q012 q001
    label values q013 q001
    label values q014 q001
    label values q015 q001
    label def q001 2 "Nível 2", modify
    label def q001 3 "Nível 3", modify
    label def q001 4 "Nível 4", modify
    label def q001 5 "Nível 5 (concordo totalmente)", modify
    label def q001 .a "[9]Respostas inválidas", modify
    label def q001 1 "Nível 1 (discordo totalmente)", modify
    label values e029 e029
    label def e029 1 "Não tem", modify
    label def e029 2 "Quase nunca é usada", modify
    label def e029 3 "Usada eventualmente por alunos", modify
    label def e029 4 "Há programação regular para uso", modify
    label values e024 e023
    label values e025 e023
    label values e026 e023
    label values e027 e023
    label def e023 1 "Não Tem", modify
    label def e023 2 "Tem, mas não é usado", modify
    label def e023 3 "Tem e é usado", modify
    label values q036 q036
    label values q037 q036
    label values q038 q036
    label values q039 q036
    label values q044 q036
    label values q045 q036
    label values q046 q036
    label def q036 1 "Não impede", modify
    label def q036 2 "Em alguma medida", modify
    label def q036 3 "Impede muito", modify
    label def q036 .a "[9]Respostas inválidas", modify
    ------------------ copy up to and including the previous line ------------------

    Listed 25 out of 550 observations


    I set up a difference-in-differences (DiD) model to investigate whether the intervention had an impact on teaching practices.

    Code:
    * Variable lists used in the foreach loop
    global Yvar q010 q012 q013 q014 q015 // Teaching practices
    global ControlVar e029 e025 e024 e027 e026 q036 q038 q037 q039 q044 q045 q046  // School and student features
    
    * Placeholder outcome, overwritten with each teaching-practice variable in turn
    gen Y = .
    
    * Estimation: one ordered probit per outcome, standard errors clustered by school
    eststo clear
    foreach outcome in $Yvar  {
        replace Y = `outcome'
        qui oprobit Y DiD time treated i.grade i.year $ControlVar, vce(cluster IDescola)
        eststo
    }
    
    esttab, cells(b(star fmt(3)) p(par fmt(3))) numbers pr2(3) keep(DiD time treated) starl(* 0.1 ** 0.05 *** 0.01) replace
    
    
    --------------------------------------------------------------------------------------------
                          (1)             (2)             (3)             (4)             (5)  
                            Y               Y               Y               Y               Y  
                          b/p             b/p             b/p             b/p             b/p  
    --------------------------------------------------------------------------------------------
    Y                                                                                          
    DiD                -0.184           0.436*         -0.026           0.062          -0.158  
                      (0.554)         (0.072)         (0.926)         (0.813)         (0.643)  
    time               -0.528           0.681           0.420           0.452           0.415  
                      (0.380)         (0.208)         (0.557)         (0.552)         (0.537)  
    treated             0.250          -0.159           0.251           0.388           0.497*  
                      (0.320)         (0.473)         (0.346)         (0.152)         (0.070)  
    --------------------------------------------------------------------------------------------
    N                     430             433             433             436             438  
    pseudo R-sq         0.047           0.033           0.027           0.039           0.053  
    --------------------------------------------------------------------------------------------
    
    .
    end of do-file


    Since I am running several regressions, I am afraid that multiple testing could be a problem. I would therefore like to adjust the p-values with a multiple-hypothesis testing procedure such as Bonferroni, FDR or Romano-Wolf.
    I found plenty of theoretical material on multiple testing on the internet, but very little on how to implement it in Stata.
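    To make the simpler corrections concrete, here is a minimal by-hand sketch. It assumes the family of tests is the five two-sided DiD p-values from the table above, typed in directly; the variable names p_bonf, p_holm and q_bh are made up for illustration. It covers Bonferroni, Holm and Benjamini-Hochberg (FDR) only, not Romano-Wolf, which needs resampling.

```stata
* Self-contained sketch: adjust the five DiD p-values reported in the table above.
* Note: -clear- drops the data in memory; run it in a separate session if needed.
clear
input str4 outcome double p
"q010" 0.554
"q012" 0.072
"q013" 0.926
"q014" 0.813
"q015" 0.643
end

* Number of hypotheses in the family
local m = _N

* Bonferroni: multiply each p-value by m and cap at 1
gen double p_bonf = min(1, `m'*p)

* Holm (step-down): sort ascending, multiply by (m - rank + 1),
* then take a running maximum so adjusted values never decrease
sort p
gen double p_holm = min(1, (`m' - _n + 1)*p)
replace p_holm = max(p_holm, p_holm[_n-1]) if _n > 1

* Benjamini-Hochberg (FDR): start from the largest p-value,
* scale by m/rank, then take a running minimum
gsort -p
gen double q_bh = min(1, `m'*p/(`m' - _n + 1))
replace q_bh = min(q_bh, q_bh[_n-1]) if _n > 1

sort p
list outcome p p_bonf p_holm q_bh, noobs
```

    With these inputs, the smallest p-value (0.072, for q012) becomes 0.360 under all three adjustments, so it would no longer pass the 10% threshold.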

    What I would like to do is very simple:
    After the esttab above, I would like to run the regressions (the foreach loop) again WITH the multiple-hypothesis testing procedure and save the "corrected" p-values. Then I would print the esttab table again, with the coefficients and p-values from the first estimation AND the "corrected" p-values in a new row. (Similar to this example).

    The idea is simple, but I could not implement it in Stata. Does anyone have an idea?
    Any advice would be highly appreciated!
    Many thanks in advance.

    Last edited by Tharcisio Leone; 29 Mar 2022, 12:57.

  • #2
    The problem was solved.
    For everyone interested in this issue, please see Clarke, Romano and Wolf (2020).
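
    For anyone landing here later, a rough sketch of what a Romano-Wolf call might look like with the variables from post #1. It assumes the community-contributed rwolf command that accompanies the cited paper. The option names reflect my reading of its help file, and the reps() and seed() values are arbitrary. Whether method(oprobit) and factor variables inside controls() are accepted is an assumption on my part, so please check help rwolf before relying on it.

```stata
* Sketch only: rwolf is a community-contributed command, installed from SSC
ssc install rwolf

* Joint Romano-Wolf adjustment of the DiD coefficient across the five outcomes.
* $ControlVar is the global defined in post #1.
* cluster() is meant to control the resampling and vce() the model's standard
* errors; reps() and seed() are illustrative values, not recommendations.
rwolf q010 q012 q013 q014 q015, indepvar(DiD) method(oprobit) ///
    controls(time treated i.grade i.year $ControlVar)         ///
    cluster(IDescola) vce(cluster IDescola) reps(1000) seed(12345)
```

    The command should report the original p-value next to the Romano-Wolf adjusted one for each outcome, and those adjusted values could then be added to the esttab table as an extra row.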
