  • Assign pseudo treatment year and repeat baseline regression 5000 times and get t-distribution

    Dear Statalist Community,

    I am conducting research on the impact of an external shock (treatment) on firm characteristics (outcome). At this stage, I need to perform a placebo test to assess the baseline results. Specifically, the following steps need to be taken:

    1. retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);

    2. regress the baseline model using the pseudo treatment variable and retain its coefficient and standard error;

    3. compute t-stats;

    4. repeat this procedure 5000 times and yield a distribution of placebo t-stat estimates, then plot the histogram with a density curve.

    Below is a sample of my data and code, but Stata returned an error message: "invalid syntax; an error occurred when simulate executed myplacebo; r(198)." Besides, can the above-mentioned steps be achieved using these commands? I would greatly appreciate it if you could review the code and offer any suggestions. Thank you in advance for your assistance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(gvkey fyear) str8 incorp float(treat treatyear outcome)
    1004 1991 "DE" 0 2000  1.9301835
    1004 1992 "DE" 0 2000   2.403937
    1004 1993 "DE" 0 2000  1.8439943
    1004 1994 "DE" 0 2000  1.8381265
    1004 1995 "DE" 0 2000  2.1075902
    1004 1996 "DE" 0 2000  3.0318136
    1004 1997 "DE" 0 2000   1.359761
    1004 1998 "DE" 0 2000   2.330347
    1004 1999 "DE" 0 2000   2.665054
    1004 2000 "DE" 1 2000  1.3347505
    1004 2001 "DE" 1 2000    .652634
    1004 2002 "DE" 1 2000  .11607568
    1004 2003 "DE" 1 2000          0
    1004 2004 "DE" 1 2000          0
    1004 2005 "DE" 1 2000          0
    1004 2006 "DE" 1 2000          0
    1004 2007 "DE" 1 2000   .6994809
    1004 2008 "DE" 1 2000          0
    1004 2009 "DE" 1 2000          0
    1004 2010 "DE" 1 2000    .324113
    1004 2011 "DE" 1 2000    .716871
    1004 2012 "DE" 1 2000  1.2401142
    1004 2013 "DE" 1 2000   .5819505
    1004 2014 "DE" 1 2000   10.78548
    1004 2015 "DE" 1 2000  2.0248249
    1004 2016 "DE" 1 2000  1.9945482
    1004 2017 "DE" 1 2000   1.534728
    1004 2018 "DE" 1 2000  1.3709465
    1004 2019 "DE" 1 2000   .7118807
    1004 2020 "DE" 1 2000 .006494772
    1009 1991 "DE" 0 2000    2.27228
    1009 1992 "DE" 0 2000  1.9249095
    1009 1993 "DE" 0 2000   1.578199
    1009 1994 "DE" 0 2000  1.0766329
    1010 1991 "NJ" 0 2011  1.3870368
    1010 1992 "NJ" 0 2011          0
    1010 1993 "NJ" 0 2011   7.370338
    1010 1994 "NJ" 0 2011   .4462564
    1010 1995 "NJ" 0 2011          0
    1010 1996 "NJ" 0 2011          0
    1010 1997 "NJ" 0 2011  -.3528432
    1010 1998 "NJ" 0 2011          0
    1010 1999 "NJ" 0 2011          0
    1010 2000 "NJ" 0 2011          0
    1010 2001 "NJ" 0 2011          .
    1010 2002 "NJ" 0 2011          .
    1010 2003 "NJ" 0 2011  1.4550402
    1011 1991 "PA" 0    . -.02293578
    1011 1992 "PA" 0    .          0
    1011 1993 "PA" 0    . -2.7942355
    end
    Code:
    mkmat treatyear if treat, matrix(T)
         local ntreat = rowsof(T)
    cap program drop myplacebo
    program myplacebo, rclass
         drop _all
         set seed 347544
         gen pseudo_year = T[runiformint(1,`ntreat'), 1] if !treat
         replace pseudo_year = treatyear if treat
         replace pseudo_year = . if !missing(treatyear)
         gen pseudo_treated = 0
         replace pseudo_treated = 1 if fyear >= pseudo_year
        
         reg outcome pseudo_treated i.fyear, cluster(gvkey)
         return scalar rb0 = _b[pseudo_treated]
     return scalar rse0 = _se[pseudo_treated]
    end
    
    simulate rb0 =r(rb0) rse0=r(rse0), reps(5000) : myplacebo
    gen tstat = rb0/rse0
    histogram tstat, normal width(0.4) xline(9.01) ytitle("Density") xtitle("t-statistics of the placebo estimates") title(Randomize only states)
    The final product is expected to look like this figure:
    Last edited by Lin Zhang; 25 Nov 2024, 20:42.

  • #2
    The error message you are getting arises from the command
    Code:
         gen pseudo_year = T[runiformint(1,`ntreat'), 1] if !treat
    in program myplacebo. The syntax error is due to the fact that local macro ntreat is undefined, so -runiformint(1, `ntreat')- expands as -runiformint(1,)-, which is, of course, illegal syntax.

    Now, you may say: wait, what? Local macro ntreat was defined right after the very first command (-mkmat-) shown. Yes, it was. But local macros are, well, local. Their scope extends only to the program that defined them, which, in your case, was the top level of the do-file. Program myplacebo is a different program, and it does not recognize local macros that were defined outside of it. Similarly, any local macros defined inside program myplacebo will not be valid back at the top level of the do-file (nor in any program called from within program myplacebo, should you write one).
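    A small self-contained illustration of this scoping rule, using a hypothetical demo program scope_demo: the local is defined at the top level, and the program returns whatever `ntreat' expands to inside its own scope.

```stata
* scope_demo is a hypothetical program written only for this demonstration
capture program drop scope_demo
program define scope_demo, rclass
    * ntreat is looked up among this program's own locals, where it was never defined
    return local seen "`ntreat'"
end

local ntreat = 3    // defined at the top level of the do-file only
scope_demo
display `"inside scope_demo, ntreat expanded to: [`r(seen)']"'
```

    The brackets come back empty, which is exactly how -runiformint(1,`ntreat')- turned into -runiformint(1,)- inside myplacebo.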

    So you have to get the information about ntreat into program myplacebo. There are a few ways you can do that. One is to make it an argument to program myplacebo and then pick it up with an -args- command at the beginning of myplacebo. But in this case, I think a better approach is not to use ntreat there at all, and to use
    Code:
         gen pseudo_year = T[runiformint(1,`:rowsof T'), 1] if !treat
    instead. This works because T is a matrix, and Stata matrices, unlike local macros, are global: program myplacebo can see T even though it was created at the top level.
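    For comparison, the -args- route could look like the following minimal sketch. The program name show_ntreat and the 3-row matrix T are hypothetical stand-ins, not the poster's real data.

```stata
* show_ntreat is a hypothetical program; T holds made-up treatment years
capture program drop show_ntreat
program define show_ntreat, rclass
    args ntreat                      // first positional argument becomes local ntreat here
    return scalar ntreat = `ntreat'
end

matrix T = (2000 \ 2002 \ 2011)      // hypothetical treatment years
local ntreat = rowsof(T)
show_ntreat `ntreat'                 // pass the value in explicitly
display r(ntreat)

* Matrices are global, so the extended macro function works inside any program
display `:rowsof T'
```

    With -simulate-, the argument would simply follow the program name, e.g. -simulate ..., reps(5000): myplacebo `ntreat'-, with -args ntreat- as the first line of myplacebo.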

    However, when you do fix that error, it unmasks another error you have made. You begin program myplacebo with a -drop _all- command. Then you try to use the variable treat (in the -if !treat- clause). But having -drop-ped all your variables, there is no longer a variable treat to refer to. I don't know what the fix for this is because I do not understand what you are trying to do anyway. But perhaps you can figure that out. Suffice it to say, once you drop all your variables, you can't then try to work with them.
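    The same failure can be reproduced in a few lines with a hypothetical program drops_first: once -drop _all- has run, the next reference to treat stops with a "not found" error, return code 111.

```stata
* drops_first is a hypothetical program mimicking the start of myplacebo
clear
set obs 3
generate byte treat = _n == 1

capture program drop drops_first
program define drops_first
    drop _all                          // removes every variable, treat included
    generate byte flag = !treat        // so treat can no longer be found here
end

capture noisily drops_first
display "return code: " _rc            // 111: variable not found
```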

    Finally, there is another logic error in your code that will not give you an error message but will destroy the validity of your attempted simulation. You must not -set seed- inside program myplacebo. By doing that, you are restarting with the same random numbers on each iteration of the simulation. So you will not get 5000 replications of the simulation: you will get 5000 copies of a single replication. You should set the random number generator seed once and only once. You can do that before you call -simulate-, or you can do it using -simulate-'s -seed()- option. But don't do it inside the program that -simulate- calls.
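    A quick way to see the problem, sketched with two hypothetical programs, bad_draw and good_draw, that each return one uniform draw: resetting the seed inside the program makes every replication identical, while setting it once through -simulate-'s seed() option does not.

```stata
* bad_draw and good_draw are hypothetical programs for illustration only
capture program drop bad_draw
program define bad_draw, rclass
    set seed 347544                    // resets the RNG on every call
    return scalar u = runiform()
end

capture program drop good_draw
program define good_draw, rclass
    return scalar u = runiform()       // draws continue from one stream
end

clear
simulate u = r(u), reps(5): bad_draw
quietly summarize u
scalar bad_range = r(max) - r(min)     // 0: five copies of one draw

clear
simulate u = r(u), reps(5) seed(347544): good_draw
quietly summarize u
scalar good_range = r(max) - r(min)    // > 0: five different draws
display "bad_draw range = " bad_range ", good_draw range = " good_range
```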

    So those are the things that jump out at me. There may well be other errors that you will discover after you fix those things.

    If you need additional assistance, please provide a clearer explanation of what you are trying to do. In particular, I do not understand what "retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);" means. I understand the concept of placebo testing, but I do not know how it relates to that sentence.



    • #3
      Dear Clyde,

      Thank you very much for carefully reviewing the code and pointing out the errors. I've replaced the original line with your code and will move the -set seed- command outside the program.
      Originally posted by Clyde Schechter
      If you need additional assistance, please provide a clearer explanation of what you are trying to do. In particular, I do not understand what "retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years (without replacement);" means. I understand the concept of placebo testing, but I do not know how it relates to that sentence.
      The research is based on a DID approach. I've run the baseline model using the actual treatment timing and the actual states that received the treatment. "Retain the distribution of treatment years but randomly assign a never-treated state to each of the passage years" means that we keep the actual treatment years but randomly assign never-treated states to those years. For example, DE was treated in 2000 and NV in 2002; in the pseudo scenario, the untreated states OH and NJ would be assigned to the treatment years 2000 and 2002. Then we rerun the regression. If our results are driven by confounding events that occurred at about the same time as the treatment years, we should still observe significant results after the random assignment (we hope this does not happen); if they are not, the placebo estimates should be insignificant.

      Regarding "without replacement", it is from the paper I referenced. To be honest, I don't fully understand the meaning behind it, but I assume it means the same state-treatment year combination does not occur twice?
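      A minimal sketch of the assignment described in this post, assuming a hypothetical state-level table with one row per state and treatyear missing for never-treated states (the five state codes, the years and the seed are illustrative only). The actual treatment years are kept as they are, and the never-treated states are shuffled and paired with them one to one, so no state is drawn twice (sampling without replacement).

```stata
* Hypothetical state-level data: DE and NV treated, NJ, OH, PA never treated
clear
set obs 5
generate str2 state = word("DE NV NJ OH PA", _n)
generate int treatyear = cond(_n == 1, 2000, cond(_n == 2, 2002, .))
set seed 347544

* 1. Save the actual treatment years, one per treated state (distribution kept as-is)
preserve
keep if !missing(treatyear)
keep treatyear
generate int slot = _n
tempfile years
save `years'
restore

* 2. Shuffle the never-treated states and number them 1, 2, 3, ...
keep if missing(treatyear)
drop treatyear
generate double shuffle = runiform()
sort shuffle
generate int slot = _n

* 3. Pair slot j with the j-th actual treatment year; each state is used at most once
merge 1:1 slot using `years', keep(master match) nogenerate
rename treatyear pseudo_year
list state pseudo_year, noobs
```

      Inside a -simulate- program, this state-to-pseudo_year table would still need to be merged m:1 onto the firm-year data by state (incorp in the posted data) before building pseudo_treated = fyear >= pseudo_year, and the seed would move to -simulate-'s seed() option. If there are fewer never-treated states than treatment years, keep(master match) silently leaves some years unassigned.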

      It is like giving one pair of treatment/control groups of patients tablets and a placebo, then giving a second pair the same tablets and placebo, then a third pair…; we collect the results of 5000 pairs and then look at the whole picture. I hope this explanation clarifies things. Perhaps there is a more direct way to achieve this?



      • #4
        So, with an example data set that contains only 4 gvkeys, it is difficult to really develop and test this. The sample is too small for clustered VCEs, and there aren't very many distinct permutations of the treatment years. But I think the following code does what you need:

        Code:
        frame put gvkey treatyear, into(treatment_years)
        frame treatment_years {
            by gvkey (treatyear), sort: assert treatyear[1] == treatyear[_N]
            by gvkey (treatyear): keep if _n == 1
            drop gvkey
            gen `c(obs_t)' link = _n
            local ntreated = _N
        }
        
        //    ASSIGN A LINK TO EACH GVKEY
        sort gvkey fyear
        gen `c(obs_t)' link = sum(gvkey != gvkey[_n-1])
        frlink m:1 link, frame(treatment_years)
        
        capture program drop one_rep
        program define one_rep, rclass
            tempvar pseudo_treatyear
            
            //    RANDOMIZE THE ORDER OF THE TREATMENT YEAR FRAME
            frame treatment_years {
                tempvar shuffle
                gen double `shuffle' = runiform()
                sort `shuffle'
                replace link = _n
            }
            
            //    LINK TO TREATMENT YEAR FRAME AND BRING IN PSEUDO-ASSIGNMENTS
            frlink rebuild treatment_years
            frget `pseudo_treatyear' = treatyear, from(treatment_years)
            tempvar pseudo_treat
            gen byte `pseudo_treat' = fyear >= `pseudo_treatyear'
            
            //    RUN THE REGRESSION AND RETURN RESULTS
            regress outcome i.`pseudo_treat' i.fyear, cluster(gvkey)
            return scalar rb0 = _b[1.`pseudo_treat']
            return scalar rse0 = _se[1.`pseudo_treat']
        end
        
        tempfile results
        simulate rb0 =r(rb0) rse0=r(rse0), reps(5000) seed(347544) saving(`results'): ///
            one_rep
        use `results', clear
        gen tstat = rb0/rse0
        histogram tstat // etc.
        Now, this code accords with my understanding of a placebo test. It does not match your explanation as I understand it. Specifically, as I understand a placebo test, it involves randomly reassigning treatment years (and corresponding treat status) to all of the gvkeys, not just the actually untreated ones. And that's what this code does.

        By the way, FWIW, I don't think your regression is correct either. You really need a two-way fixed effects model here. So I would, before calling -simulate-, -xtset gvkey- and I would change the regression command to -xtreg outcome i.`pseudo_treat' i.fyear, fe cluster(gvkey)-. But I'll leave that to you to decide.
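        The two-way fixed-effects version suggested here can be tried on simulated data before wiring it into one_rep. In this sketch the panel (40 hypothetical firms over 2000-2014, the first 20 treated from 2008 with a true effect of 2) is invented for illustration, and clustering is written in the vce(cluster gvkey) form.

```stata
* Simulated, hypothetical panel: not the poster's data
clear
set seed 12345
set obs 40                                        // 40 hypothetical firms
generate int gvkey = _n
generate double firm_fe = rnormal()
expand 15                                         // 15 years per firm
bysort gvkey: generate int fyear = 1999 + _n      // 2000-2014
generate byte treated = gvkey <= 20 & fyear >= 2008
generate double outcome = firm_fe + 0.1*(fyear - 2000) + 2*treated + rnormal()

* Firm effects via -xtreg, fe-, year effects via i.fyear
xtset gvkey
xtreg outcome i.treated i.fyear, fe vce(cluster gvkey)
display "t-statistic: " _b[1.treated]/_se[1.treated]
```

        Inside one_rep, the same -xtreg- line would replace -regress-, with i.`pseudo_treat' in place of i.treated and -xtset gvkey- run once before -simulate-.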
        Last edited by Clyde Schechter; 26 Nov 2024, 09:50.



        • #5
          You might try -ritest- (community-contributed, like -reghdfe-, which the code below also uses).

          The trick here is that the number of pseudo-assignments is limited by the count of untreated states.
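          To make that limit concrete: assigning k distinct treatment years to n never-treated states without replacement allows n!/(n-k)! ordered assignments. A quick check with hypothetical counts n = 3 and k = 2:

```stata
local n_never = 3     // hypothetical number of never-treated states
local k_years = 2     // hypothetical number of distinct treatment years

* ordered assignments without replacement: n!/(n-k)! = comb(n, k) * k!
scalar n_assign = comb(`n_never', `k_years') * round(exp(lnfactorial(`k_years')))
display "possible placebo assignments: " n_assign
```

          So with 5000 replications there would be at most 6 distinct placebo estimates, and most replications would repeat earlier ones. If several states share a treatment year, some orderings coincide, so the formula is an upper bound.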

          Code:
          clear all
          set obs 30
          g id = _n
          g fe = rchi2(1)
          expand 15 
          bys id: g year = _n+1999
          g yfex = rchi2(1)
          egen yfe = mean(yfex), by(year)
          g treat = id==1
          g post = year>=2011
          g did = treat*post
          g y = fe + yfe + did*2 + rnormal()
          
          reghdfe y c.treat#c.post, absorb(id year) cluster(id)
          ritest treat _b[c.treat#c.post], cluster(id): reghdfe y c.treat#c.post, absorb(id year) cluster(id)
          ritest treat _b[c.treat#c.post]/_se[c.treat#c.post], cluster(id): reghdfe y c.treat#c.post, absorb(id year) cluster(id)
