  • Staggered DID with no never takers

    This is not exactly a Stata question but rather an econometric question on a Stata implementation. Can we use staggered DID in a database that features no never takers (that is, no individuals that remain untreated all along the time observation window)? I have always thought the answer to be yes but since I have run into problems in a real world implementation, I decided to try a simple numerical simulation in Stata and I am somewhat confused by the results. Below is code for replication but let me briefly state what I've done for clarity:
    • I simulate a database with I=100 individuals and T=10 periods, no missing data, all individuals are treated exactly once in a randomly chosen period
    • I try to estimate the effect of the treatment on a (purely random) outcome
    • First I use static TWFE (I am following the terminology in Callaway and Sant'Anna 2021; this means regressing the outcome on time effects, individual effects and a treatment dummy). No problem here
    • When I try dynamic TWFE (that is, I regress the outcome on time effects, individual effects, treatment lags and treatment leads, leaving the first lag out for identification), Stata has to drop one time effect due to collinearity. The problem persists whichever command I use (reg or reghdfe) and however the variables are defined or ordered in the command: one regressor must always be dropped, which suggests that the collinearity is a feature of the data
    • If I fabricate just one never taker, the problem goes away and dynamic TWFE works just fine
    Am I doing something wrong? If not, does this imply that dynamic TWFE does not work in the absence of never takers? Is there a lesson to be learned here or is it just a case of a method that is not prepared to deal with any kind of data?

    Code:
    ***Database generation
    
    clear all
    set seed 260586
    set obs 100
    gen i=_n
    expand 10
    bys i: gen t=_n
    
    gen r=round(uniform()*8) // random treatment assignment
    bys i: gen rr=r if _n==1
    replace rr=5 if rr==0
    bys i: egen t_treat=mean(rr)
    drop r rr
    gen treat=t==t_treat
    
    gen treat_dif=t-t_treat // event time: negative before treatment, positive after
    forvalues i=1/9 {
        gen treat_lag`i'=treat_dif==-`i' // i periods before treatment
        gen treat_lead`i'=treat_dif==`i' // i periods after treatment
    }
    
    gen y=rnormal(0,1) // random outcome
    drop treat_lag8 treat_lag9 // no cases with this many lags
    
    ***Estimation
    
    xtset i t
    xtreg y treat i.t, fe // traditional (static) TWFE
    xtreg y treat_lag7 treat_lag6 treat_lag5 treat_lag4 treat_lag3 treat_lag2 treat treat_lead* i.t, fe // dynamic TWFE
    
    ***Now let's throw in one never taker and try again
    
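    unab vars: treat_lag1-treat_lead9
    foreach var in `vars' {
        replace `var'=0 in 1/10 // first 10 rows are individual 1
    }
    replace treat=0 in 1/10 // treat is not in the range above, so zero it too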
    
    xtreg y treat_lag7 treat_lag6 treat_lag5 treat_lag4 treat_lag3 treat_lag2 treat treat_lead* i.t, fe // dynamic TWFE

  • #2
    The dynamic method is not suitable for these data because the lag and lead effects cannot all be identified along with a full set of time indicators when there are no never takers to "complete" the scenario by having all of the treat_* and treat variables equal to zero. If you know all of the values of treat, treat_lag*, and treat_lead* except one, you are guaranteed to know the holdout as well: it is 0 if you have already encountered a 1, and 1 if not. If there were a never taker in the data set, you could not make that prediction with certainty. In effect, the set of variables consisting of treat, treat_lag*, and treat_lead* is a complete set of indicator ("dummy") variables for the variable treat_dif in your setting. What you are doing is just a more elaborate version of the "dummy variable trap." One of those indicators has to go, or you need a never taker to break the collinearity.
    Last edited by Clyde Schechter; 02 Jul 2024, 18:58.
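
A minimal sketch of the point made in #2, on a hypothetical toy panel rather than the poster's data (the names id, g, rel, ev0-ev4, n_on, wsum and the two scalars are made up for illustration). It checks that the event-time indicators form a complete set, and it adds one observation that is not stated in the reply: the event-time-weighted sum of the indicators equals t minus the treatment period, which unit and time effects already span. That may be one way to see why a regressor is still dropped after treat_lag1 has been left out.

```stata
* Hypothetical toy panel (not the poster's data): 6 units, 4 periods,
* every unit treated in period 2 or 3, so there are no never takers.
clear
set seed 1
set obs 6
gen id = _n
gen g = 2 + mod(id, 2)            // assumed treatment period: 2 or 3
expand 4
bysort id: gen t = _n
gen rel = t - g                   // event time (treat_dif in the post), -2..2
gen y = rnormal()                 // pure-noise outcome

* One indicator per event-time value; ev1 (rel == -1) plays the role of treat_lag1
forvalues k = 0/4 {
    gen byte ev`k' = rel == `k' - 2
}

* Identity 1: exactly one indicator is on in every row
egen n_on = rowtotal(ev0-ev4)
assert n_on == 1

* Identity 2: the event-time-weighted sum equals t - g
gen wsum = -2*ev0 - ev1 + ev3 + 2*ev4
assert wsum == rel

* Leaving out ev1 deals with identity 1, but identity 2 still costs one regressor
regress y ev0 ev2 ev3 ev4 i.t i.id
scalar rank_all_treated = e(rank)

* Turn unit 1 into a never taker: both identities now fail for that unit
foreach v of varlist ev0-ev4 {
    replace `v' = 0 if id == 1
}
regress y ev0 ev2 ev3 ev4 i.t i.id
scalar rank_one_never = e(rank)

display "rank with no never taker:  " rank_all_treated
display "rank with one never taker: " rank_one_never
```

Both regressions have 13 columns including the constant. Under this setup the first should come out with rank 12 (one regressor omitted) and the second with rank 13, which mirrors what the original post reports.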


    • #3
      I discuss this in my 2021 working paper on TWFE. I also discuss this in my nonlinear DiD paper in the Econometrics Journal. As Clyde points out, mechanically one of the dummies has to be dropped. You should probably drop the indicator for the final treated cohort, as that forces all comparisons to be made with that cohort as the comparison in the final period. Initially, think of the estimates on the other lags as being the effect of being treated earlier as opposed to in the last period. An effect cannot be estimated for the final treated cohort because there are no remaining control units.

      Because a no anticipation assumption is imposed in estimation, the coefficients up until the final period can be interpreted as before: the estimates are compared with the never treated state.

      The Stata do files for generated data in my shared Dropbox might help: Stata do files.

      BTW, with a staggered intervention, it's now fairly common to estimate different effects by cohort/calendar time and then weight them to get an event study plot.
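
A rough sketch in the spirit of the suggestions in #3, not a reproduction of the do files mentioned there. It assumes an absorbing treatment (a unit stays treated from period g onwards), which differs from the one-period treat dummy in the original post, and all names (id, g, w) and the simulated data are hypothetical. The treatment indicator is switched off for the last-treated cohort so that this cohort serves as the comparison group, and a separate coefficient is estimated for each cohort-by-period cell.

```stata
* Hypothetical simulated panel (not the poster's data): 100 units, 10 periods,
* absorbing treatment starting in period g, no never takers.
clear
set seed 12345
set obs 100
gen id = _n
gen g = 1 + ceil(8*runiform())     // assumed cohort = first treated period, 2..9
expand 10
bysort id: gen t = _n
gen y = rnormal()                  // pure-noise outcome, so true effects are zero

* Treatment indicator, set to zero throughout for the last-treated cohort
summarize g, meanonly
local glast = r(max)
gen byte w = (t >= g) & (g < `glast')

* One coefficient per cohort-by-period cell, with unit and time fixed effects;
* Stata omits the cells in which w is always zero
xtset id t
xtreg y i.g#i.t#c.w i.t, fe vce(cluster id)
```

From period `glast' onwards every unit is treated, so the coefficients for those periods are best read as the effect of having been treated earlier rather than in the last period, as described in #3. The cohort-by-period coefficients could then be averaged by time since treatment to build an event study plot; the weighting step is left out of this sketch.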



      • #4
        Thank you very much Clyde and Jeff, your answers are super helpful. I will make sure to read your work and try that suggestion, Jeff.
