Assign pseudo treatment year and repeat baseline regression 5000 times and get t-distribution

Lin Zhang

Join Date: Aug 2022
Posts: 5


25 Nov 2024, 19:37

Dear Statalist Community,

I am conducting research on the impact of an external shock (treatment) on firm characteristics (outcome). At this stage, I need to perform a placebo test to assess the baseline results. Specifically, the following steps need to be taken:

1. retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);

2. regress the baseline model using the pseudo treatment variable and retain its coefficient and standard error;

3. compute t-stats;

4. repeat this procedure 5000 times and yield a distribution of placebo t-stat estimates, then plot the histogram with a density curve.

Below is a sample of my data, followed by my code. Stata returned the error message "invalid syntax; an error occurred when simulate executed myplacebo; r(198)." Also, can the steps above be achieved with these commands? I would greatly appreciate it if you could review the code and offer any suggestions. Thank you in advance for your assistance.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(gvkey fyear) str8 incorp float(treat treatyear outcome)
1004 1991 "DE" 0 2000  1.9301835
1004 1992 "DE" 0 2000   2.403937
1004 1993 "DE" 0 2000  1.8439943
1004 1994 "DE" 0 2000  1.8381265
1004 1995 "DE" 0 2000  2.1075902
1004 1996 "DE" 0 2000  3.0318136
1004 1997 "DE" 0 2000   1.359761
1004 1998 "DE" 0 2000   2.330347
1004 1999 "DE" 0 2000   2.665054
1004 2000 "DE" 1 2000  1.3347505
1004 2001 "DE" 1 2000    .652634
1004 2002 "DE" 1 2000  .11607568
1004 2003 "DE" 1 2000          0
1004 2004 "DE" 1 2000          0
1004 2005 "DE" 1 2000          0
1004 2006 "DE" 1 2000          0
1004 2007 "DE" 1 2000   .6994809
1004 2008 "DE" 1 2000          0
1004 2009 "DE" 1 2000          0
1004 2010 "DE" 1 2000    .324113
1004 2011 "DE" 1 2000    .716871
1004 2012 "DE" 1 2000  1.2401142
1004 2013 "DE" 1 2000   .5819505
1004 2014 "DE" 1 2000   10.78548
1004 2015 "DE" 1 2000  2.0248249
1004 2016 "DE" 1 2000  1.9945482
1004 2017 "DE" 1 2000   1.534728
1004 2018 "DE" 1 2000  1.3709465
1004 2019 "DE" 1 2000   .7118807
1004 2020 "DE" 1 2000 .006494772
1009 1991 "DE" 0 2000    2.27228
1009 1992 "DE" 0 2000  1.9249095
1009 1993 "DE" 0 2000   1.578199
1009 1994 "DE" 0 2000  1.0766329
1010 1991 "NJ" 0 2011  1.3870368
1010 1992 "NJ" 0 2011          0
1010 1993 "NJ" 0 2011   7.370338
1010 1994 "NJ" 0 2011   .4462564
1010 1995 "NJ" 0 2011          0
1010 1996 "NJ" 0 2011          0
1010 1997 "NJ" 0 2011  -.3528432
1010 1998 "NJ" 0 2011          0
1010 1999 "NJ" 0 2011          0
1010 2000 "NJ" 0 2011          0
1010 2001 "NJ" 0 2011          .
1010 2002 "NJ" 0 2011          .
1010 2003 "NJ" 0 2011  1.4550402
1011 1991 "PA" 0    . -.02293578
1011 1992 "PA" 0    .          0
1011 1993 "PA" 0    . -2.7942355
end

Code:

mkmat treatyear if treat, matrix(T)
     local ntreat = rowsof(T)
cap program drop myplacebo
program myplacebo, rclass
     drop _all
     set seed 347544
     gen pseudo_year = T[runiformint(1,`ntreat'), 1] if !treat
     replace pseudo_year = treatyear if treat
     replace pseudo_year = . if !missing(treatyear)
     gen pseudo_treated = 0
     replace pseudo_treated = 1 if fyear >= pseudo_year
    
     reg outcome pseudo_treated i.fyear, cluster(gvkey)
     return scalar rb0 = _b[pseudo_treated]
     return scalar rse0 = _se[pseudo_treated]
end

simulate rb0 =r(rb0) rse0=r(rse0), reps(5000) : myplacebo
gen tstat = rb0/rse0
histogram tstat, normal width(0.4) xline(9.01) ytitle("Density") xtitle("t-statistics of the placebo estimates") title(Randomize only states)

The final product is expected to look like this figure:

Last edited by Lin Zhang; 25 Nov 2024, 19:42.


Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

25 Nov 2024, 23:21

The error message you are getting arises from the command

Code:

gen pseudo_year = T[runiformint(1,`ntreat'), 1] if !treat

in program myplacebo. The syntax error is due to the fact that local macro ntreat is undefined, so -runiformint(1, `ntreat')- expands as -runiformint(1,)-, which is, of course, illegal syntax.

Now, you may say: wait, what? Local macro ntreat was defined right after the very first command (-mkmat-) shown. Yes, it was. But local macros are, well, local. Their scope extends only to the program that defined them, which, in your case, was the top level of the do-file. Program myplacebo is a different program, and it does not recognize local macros that were defined outside of it. Similarly, any local macros defined inside program myplacebo will not be valid back at the top level of the do-file (nor in any program called from within program myplacebo, should you write one).
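To see the scoping rule in isolation, here is a minimal sketch; the program name show_scope and the value 3 are made up for illustration. The local defined at the top level is invisible inside the program, so the reference expands to an empty string, just as `ntreat' did inside myplacebo.

```stata
* A local defined at the top level of the do-file
local ntreat = 3

capture program drop show_scope
program define show_scope, rclass
    * the top-level local is not visible here, so it expands to nothing
    display "ntreat inside the program: [`ntreat']"
    return local ntreat "`ntreat'"
end

show_scope
```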

So you have to get the information about ntreat into program myplacebo. There are a few ways to do that. One is to make it an argument to program myplacebo and then pick it up with an -args- command at the beginning of myplacebo. But in this case, I think a better approach is not to use ntreat there at all, and to use
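A minimal sketch of the -args- route, with a hypothetical program show_args: the value is passed on the command line, and -args- stores the first argument in a local of that name inside the program. With -simulate-, the call would then look like -simulate ..., reps(5000): myplacebo `ntreat'-, because the macro is expanded at the top level before myplacebo is called.

```stata
local ntreat = 3

capture program drop show_args
program define show_args, rclass
    args ntreat                     // first argument becomes a local named ntreat
    display "ntreat inside the program: `ntreat'"
    return scalar ntreat = `ntreat'
end

* the top-level local is expanded here, so the program receives 3
show_args `ntreat'
```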

Code:

gen pseudo_year = T[runiformint(1,`:rowsof T'), 1] if !treat

instead.
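Why this works: Stata matrices, unlike local macros, are global, so a matrix T created at the top level is visible inside the program, and the extended macro function `: rowsof T' is evaluated when the program runs. A minimal sketch with a made-up 3 x 1 matrix of years and a hypothetical program draw_year:

```stata
* Hypothetical column vector of treatment years
matrix T = (2000 \ 2002 \ 2011)

capture program drop draw_year
program define draw_year, rclass
    * T is a global matrix name, and its row count is read at run time
    local i = runiformint(1, `: rowsof T')
    return scalar year = T[`i', 1]
end

set seed 347544
draw_year
display "drawn year: " r(year)
```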

However, once you fix that error, it unmasks another error you have made. You begin program myplacebo with a -drop _all- command. Then you try to use the variable treat (in the -if !treat- clause). But having -drop-ped all your variables, there is no longer a variable treat to refer to. I don't know what the fix for this is because I do not understand what you are trying to do anyway. But perhaps you can figure that out. Suffice it to say, once you drop all your variables, you can't then try to work with them.
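One common pattern, assuming the goal is to work with the data already in memory, is to leave out -drop _all- and create any helper variables as temporary variables, which Stata drops automatically when the program ends. A minimal sketch with made-up data and a hypothetical program no_drop:

```stata
clear
set obs 5
gen byte treat = _n <= 2            // hypothetical: 2 treated, 3 untreated

capture program drop no_drop
program define no_drop, rclass
    * no -drop _all- here: the data in memory, including treat, stay available
    tempvar untreated
    gen byte `untreated' = !treat
    quietly count if `untreated'
    return scalar n_untreated = r(N)
    * the temporary variable is dropped automatically when the program ends
end

no_drop
display "untreated observations: " r(n_untreated)
```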

Finally, there is another logic error in your code that will not give you an error message but will destroy the validity of your attempted simulation. You must not -set seed- inside program myplacebo. By doing that, you are restarting with the same random numbers on each iteration of the simulation. So you will not get 5000 replications of the simulation: you will get 5000 copies of a single replication. You should set the random number generator seed once and only once. You can do that before you call -simulate-, or you can do it using -simulate-'s -seed()- option. But don't do it inside the program that -simulate- calls.
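A small demonstration of the difference, using two hypothetical programs: bad_draw sets the seed inside the program, while good_draw leaves it to the seed() option of -simulate-. The first produces 20 identical draws (standard deviation 0); the second produces 20 different ones.

```stata
clear
capture program drop bad_draw
program define bad_draw, rclass
    set seed 347544                 // wrong place: resets the RNG every replication
    return scalar u = runiform()
end

capture program drop good_draw
program define good_draw, rclass
    return scalar u = runiform()
end

simulate u = r(u), reps(20) nodots: bad_draw
quietly summarize u
local sd_bad = r(sd)

* seed set once, through the seed() option of simulate
simulate u = r(u), reps(20) nodots seed(347544): good_draw
quietly summarize u
local sd_good = r(sd)

display "SD, seed inside program: `sd_bad'"
display "SD, seed set once:       `sd_good'"
```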

So those are the things that jump out at me. There may well be other errors that you will discover after you fix those things.

If you need additional assistance, please provide a clearer explanation of what you are trying to do. In particular, I do not understand what "retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);" means. I understand the concept of placebo testing, but I do not know how it relates to that sentence.
Lin Zhang

#3

26 Nov 2024, 01:41

Dear Clyde,

Thank you very much for carefully reviewing the code and pointing out the errors. I've replaced the original line with your code and will move -set seed- outside the program.

Originally posted by Clyde Schechter

If you need additional assistance, please provide a clearer explanation of what you are trying to do. In particular, I do not understand what "retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);" means. I understand the concept of placebo testing, but I do not know how it relates to that sentence.

The research is based on a DID approach. I've run the baseline model using the actual treatment timing and the actual states that received the treatment. "Retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years" means that we keep the actual treatment years but randomly assign never-treated states to those years. For example, DE was treated in 2000 and NV was treated in 2002; in the pseudo scenario, states OH and NJ, which were not treated, would be assigned the treatment years 2000 and 2002. Then we re-run the regression. If our results are driven by confounding events that occurred at about the same time as the treatment years, we should still observe significant results after the random assignment (we hope that doesn't happen); if the results are driven by the treatment itself, the placebo estimates should not be significant.

Regarding "without replacement", it is from the paper I referenced. To be honest, I don't fully understand the meaning behind it, but I assume it means that the same state-treatment year combination does not occur twice?
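On the meaning of "without replacement": in this kind of placebo it usually means that, within one replication, each never-treated state can be drawn at most once, so no state receives two pseudo treatment years (it does not rule out repeats across replications). Independent -runiformint()- draws, as in the code in #1, are with replacement, so the same value can come up more than once. One standard way to draw without replacement is to shuffle the pool and take the first K; a minimal sketch with 16 hypothetical states and K = 4:

```stata
clear
set seed 347544
set obs 16                          // 16 hypothetical never-treated states
gen int state = _n
* shuffle once, then take the first 4: each state can be picked at most once
gen double shuffle = runiform()
sort shuffle
gen byte picked = _n <= 4
list state if picked, noobs
```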

It is like treating a pair of treatment/control groups of patients with tablets and a placebo, then a second pair of treatment/control groups receiving the same tablets and placebo, then a third pair… We collect the performance of 5000 pairs and then look at the whole picture. I hope this explanation clarifies things. Perhaps there is a more direct way to achieve this?

Clyde Schechter


26 Nov 2024, 08:40

So, with an example data set that only contains 4 gvkeys, it is difficult to really develop and test this. The sample is too small for the use of clustered vce's, and there aren't very many distinct permutations of the treatment years. But I think the following code does what you need:

Code:

frame put gvkey treatyear, into(treatment_years)
frame treatment_years {
    by gvkey (treatyear), sort: assert treatyear[1] == treatyear[_N]
    by gvkey (treatyear): keep if _n == 1
    drop gvkey
    gen `c(obs_t)' link = _n
    local ntreated = _N
}

//    ASSIGN A LINK TO EACH GVKEY
sort gvkey fyear
gen `c(obs_t)' link = sum(gvkey != gvkey[_n-1])
frlink m:1 link, frame(treatment_years)

capture program drop one_rep
program define one_rep, rclass
    tempvar pseudo_treatyear
    
    //    RANDOMIZE THE ORDER OF THE TREATMENT YEAR FRAME
    frame treatment_years {
        tempvar shuffle
        gen double `shuffle' = runiform()
        sort `shuffle'
        replace link = _n
    }
    
    //    LINK TO TREATMENT YEAR FRAME AND BRING IN PSEUDO-ASSIGNMENTS
    frlink rebuild treatment_years
    frget `pseudo_treatyear' = treatyear, from(treatment_years)
    tempvar pseudo_treat
    gen byte `pseudo_treat' = fyear >= `pseudo_treatyear'
    
    //    RUN THE REGRESSION AND RETURN RESULTS
    regress outcome i.`pseudo_treat' i.fyear, cluster(gvkey)
    return scalar rb0 = _b[1.`pseudo_treat']
    return scalar rse0 = _se[1.`pseudo_treat']
    set trace off
    exit
end

tempfile results
simulate rb0 =r(rb0) rse0=r(rse0), reps(5000) seed(347544) saving(`results'): ///
    one_rep
use `results', clear
gen tstat = rb0/rse0
histogram tstat // etc.

Now, this code accords with my understanding of a placebo test. It does not match your explanation as I understand it. Specifically, as I understand a placebo test, it involves randomly reassigning treatment years (and corresponding treat status) to all of the gvkeys, not just the actually untreated ones. And that's what this code does.
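For comparison, here is a minimal sketch of the variant described in #3: keep the actual treatment years and, in each replication, hand them to randomly chosen never-treated states, drawn without replacement. It runs on made-up data (20 hypothetical states, 4 of them treated in made-up years), because the dataex excerpt in #1 has only one never-treated state. The frame names years and controls, the variables state, firm, slot and ctl, and the choice to drop actually treated firms from the placebo sample are all assumptions, not something taken from the thread or the referenced paper. Frames require Stata 16 or later.

```stata
* --- Hypothetical simulated panel, NOT the data from #1 ---
clear all
set seed 2024
set obs 20                                  // 20 states
gen int state = _n
gen int treatyear = .                       // missing = never treated
replace treatyear = 1995 in 1
replace treatyear = 1998 in 2
replace treatyear = 2000 in 3
replace treatyear = 2002 in 4
expand 10                                   // 10 firms per state
bysort state: gen int firm = state*100 + _n
expand 20                                   // 20 years per firm
bysort firm: gen int fyear = 1990 + _n      // 1991-2010
gen double outcome = rnormal()

* Actual treatment years: one row per treated state, numbered 1..K
frame put state treatyear if !missing(treatyear), into(years)
frame years {
    duplicates drop
    gen int slot = _n
}

* Pool of never-treated states, linked to the years frame by slot
frame put state if missing(treatyear), into(controls)
frame controls {
    duplicates drop
    gen int slot = _n
    frlink 1:1 slot, frame(years)           // creates link variable years
}

* Assumption: the placebo sample keeps only never-treated firms
keep if missing(treatyear)
frlink m:1 state, frame(controls) generate(ctl)

capture program drop state_placebo
program define state_placebo, rclass
    * 1. Shuffle the never-treated states; slots 1..K receive the K actual
    *    treatment years, so no state is drawn twice (without replacement)
    frame controls {
        capture drop shuffle
        capture drop pseudo_year
        gen double shuffle = runiform()
        sort shuffle
        replace slot = _n
        frlink rebuild years
        frget pseudo_year = treatyear, from(years)
    }
    * 2. Give every firm its state's pseudo treatment year (missing = untreated)
    frlink rebuild ctl
    capture drop pseudo_year
    capture drop pseudo_treat
    frget pseudo_year, from(ctl)
    gen byte pseudo_treat = fyear >= pseudo_year   // missing year gives 0
    * 3. Baseline regression with the pseudo treatment dummy
    regress outcome i.pseudo_treat i.fyear, vce(cluster firm)
    return scalar rb0  = _b[1.pseudo_treat]
    return scalar rse0 = _se[1.pseudo_treat]
end

simulate rb0 = r(rb0) rse0 = r(rse0), reps(500) seed(347544) nodots: ///
    state_placebo
gen double tstat = rb0/rse0
summarize tstat
```

For the real run, reps(5000) and the baseline specification would replace reps(500) and the simple -regress- used here; the histogram command from #1 can then be applied to tstat.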

By the way, FWIW, I don't think your regression is correct either. You really need a two-way fixed effects model here. So I would, before calling -simulate-, -xtset gvkey- and I would change the regression command to -xtreg outcome i.`pseudo_treat' i.fyear, fe cluster(gvkey)-. But I'll leave that to you to decide.
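For what it is worth, -xtreg, fe- with i.fyear is a two-way fixed effects estimator: its point estimate equals that of -regress- with a full set of firm dummies (the LSDV form). A minimal sketch on simulated data (all values made up; pseudo_treat is a hypothetical treatment dummy) that checks this. The clustered standard errors can differ slightly because of degrees-of-freedom adjustments, so only the coefficients are compared.

```stata
clear
set seed 12345
set obs 50
gen int gvkey = _n
gen double firm_fe = rnormal()              // firm fixed effect
expand 10
bysort gvkey: gen int fyear = 2000 + _n     // 2001-2010
gen byte pseudo_treat = gvkey <= 20 & fyear >= 2005
gen double outcome = firm_fe + 0.5*pseudo_treat + rnormal()

xtset gvkey
xtreg outcome i.pseudo_treat i.fyear, fe vce(cluster gvkey)
scalar b_xtreg = _b[1.pseudo_treat]

* same model with explicit firm dummies
regress outcome i.pseudo_treat i.fyear i.gvkey, vce(cluster gvkey)
scalar b_lsdv = _b[1.pseudo_treat]

display "xtreg, fe: " scalar(b_xtreg) "   LSDV: " scalar(b_lsdv)
```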

Last edited by Clyde Schechter; 26 Nov 2024, 08:50.


George Ford

Join Date: Aug 2014
Posts: 3152

26 Nov 2024, 10:15

You might try -ritest- (available from SSC).

The trick here is that the number of pseudo-assignments is limited by the count of untreated states.

Code:

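* reghdfe and ritest are user-written: ssc install ftools, ssc install reghdfe, ssc install ritest
clear all
set obs 30                          // 30 units
g id = _n
g fe = rchi2(1)                     // unit fixed effect
expand 15 
bys id: g year = _n+1999            // years 2000-2014
g yfex = rchi2(1)
egen yfe = mean(yfex), by(year)     // year effect, common within each year
g treat = id==1                     // only unit 1 is treated
g post = year>=2011
g did = treat*post
g y = fe + yfe + did*2 + rnormal()  // true treatment effect = 2

reghdfe y c.treat#c.post, absorb(id year) cluster(id)
* ritest re-assigns treat across clusters (id) and re-runs the command;
* the first call targets the coefficient, the second its t-statistic
* (see -help ritest- for the reps() and seed() options)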
ritest treat _b[c.treat#c.post], cluster(id): reghdfe y c.treat#c.post, absorb(id year) cluster(id)
ritest treat _b[c.treat#c.post]/_se[c.treat#c.post], cluster(id): reghdfe y c.treat#c.post, absorb(id year) cluster(id)
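To put a number on that limit: with M never-treated states and K treated states whose treatment years are all distinct, the number of distinct placebo assignments (ordered draws without replacement) is M!/(M-K)!. If that is well below 5000, many replications will repeat the same assignment; with a single treated unit among 30 clusters, as in the example above, there are only 30. In the dataex excerpt in #1 there is one never-treated state (PA) for two treated states, so no such assignment exists at all. A minimal sketch with hypothetical counts M = 16 and K = 4:

```stata
* Hypothetical counts: M never-treated states, K treated states with distinct years
local M = 16
local K = 4
* ordered draws without replacement: M x (M-1) x ... x (M-K+1)
scalar n_assign = 1
forvalues j = 0/`=`K'-1' {
    scalar n_assign = scalar(n_assign) * (`M' - `j')
}
display "distinct placebo assignments: " scalar(n_assign)
```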
