  • csdid to replicate Callaway and Sant'Anna (2021)

    Dear all
    I am trying to replicate the well-known paper by Callaway and Sant'Anna (2021), Difference-in-Differences with multiple time periods (https://doi.org/10.1016/j.jeconom.2020.12.001). Rather surprisingly, I haven't been able to find any do-file that does this in Stata, not even on the authors' personal pages. I had a go at it using Fernando Ríos-Avila's very useful materials, specifically Playing with Stata (friosavila.github.io). The code that attempts to replicate Table 3 in the paper (arguably the main table) is copied below. Note that the original data can be found at
    https://github.com/pedrohcgs/CS_RR
    where it is stored as an "rds" file. I am attaching a CSV conversion of that file (with some variables dropped so that it can be uploaded to the Statalist forum), which is the file used in the code below.

    Code:
    import delimited "min_wage_CS_reduced.csv", clear case(lower)
    
    /*
    treat is treatment qualifier: 1 if treat at any point, 0 o/w
    countyreal is a decode of county_name in the original data
    */
    rename firsttreat first_treat
    gen post_treatm      =inlist(year, 2004, 2005, 2006, 2007)
    gen w                =post_treatm*treat
    egen region_year=group(region year)
    
    sort countyreal year
    xtset countyreal year, yearly
    
    *Table 3
    //Panel A
    ///Row 1: TWFE
    xtreg lemp w i.region_year, fe vce(cluster countyreal)
    
    preserve
    csdid lemp, ivar(countyreal) time(year) gvar(first_treat)  ///
    agg(event) saverif(results_unconditional) replace
    estat pretrend
    
    use results_unconditional, clear
    ///Row 2
    csdid_stats simple
    
    ///Row 3: Group-specific effects
    csdid_stats group
    
    ///Row 4: Event Study
    csdid_stats event
    
    ///Row 5: Calendar time effects
    csdid_stats calendar
    
    ///Row 6: Event study e=0 e=1 w/ Balanced groups
    *?
    restore
    
    //Panel B
    ///Row 1: TWFE
    local controls i.region c.white c.hs c.pov c.pop##c.pop c.medinc##c.medinc
    xtreg lemp w i.region_year (`controls')##i.year, fe vce(cluster countyreal)
    
    preserve
    csdid lemp i.region white hs pov c.pop##c.pop c.medinc##c.medinc, ivar(countyreal) time(year)  gvar(first_treat) method(drimp) ///
    agg(event) saverif(results_conditional) replace 
    estat pretrend
    
    use results_conditional, clear
    ///Row 2
    csdid_stats simple
    
    ///Row 3: Group-specific effects
    csdid_stats group
    
    ///Row 4: Event Study
    csdid_stats event
    
    ///Row 5: Calendar time effects
    csdid_stats calendar
    
    ///Row 6: Event study e=0 e=1 w/ Balanced groups
    *?
    restore
    Panel A in Table 3 is mostly replicated: csdid without any controls allows me to replicate rows 2, 3, 4, and 5 in the paper, where the treatment effects are aggregated in different ways. Fine. Still, two questions remain:
    i) Where does the coefficient in row 1 of the paper, TWFE, come from? The paper says, "... we first estimate the coefficient on a post-treatment dummy variable in a model with unit fixed effects and region-year fixed effects...". The command above (under "Row 1") gives 0.0177, but the paper reports −0.037. Any idea what the correct specification is?
    ii) Does csdid allow us to obtain the last row (Row 6: Event study w/ Balanced groups) automatically? Of course, this can be done manually, but I am wondering whether it has been automated.
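
    On question i), one possible reading of "post-treatment dummy" is a dummy that switches on in each county's own adoption year, rather than in 2004 for every ever-treated county as `w` does in the code above. The sketch below illustrates the difference on a hypothetical toy panel (simulated data, not the min-wage file; the name `d` and the group sizes are assumptions made for illustration). It has not been checked against the paper's −0.037.

    ```stata
    * Toy panel (hypothetical data): 60 counties observed 2001-2007
    clear
    set seed 12345
    set obs 60
    gen countyreal  = _n
    gen region      = mod(_n, 4) + 1
    * first_treat = 0 marks never-treated counties
    gen first_treat = cond(_n <= 10, 2004, cond(_n <= 20, 2006, cond(_n <= 30, 2007, 0)))
    expand 7
    bysort countyreal: gen year = 2000 + _n
    gen lemp = rnormal()

    gen byte treat = first_treat > 0
    * Dummy as in the post: on in 2004-2007 for every ever-treated county
    gen byte w = inlist(year, 2004, 2005, 2006, 2007) * treat
    * Alternative: on only from each county's own adoption year onward
    gen byte d = first_treat > 0 & year >= first_treat

    * The two dummies disagree for the 2006 and 2007 groups before they adopt
    tabulate w d

    * TWFE with county and region-by-year fixed effects, using d instead of w
    egen region_year = group(region year)
    xtset countyreal year
    xtreg lemp d i.region_year, fe vce(cluster countyreal)
    ```

    With the real data, the same idea would amount to replacing `w` by `d` in the Row 1 commands, assuming `first_treat` is coded 0 for never-treated counties.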

    Panel B is only partly replicated: rows 2 and 3 match, but the rest do not. Of course, this depends on the model that I have inferred from the paper, using the variables from Table 2. Importantly, the paper says "... We use the doubly robust estimation procedure discussed above. [...] For each generalized propensity score, we estimate a logit model that includes each county characteristic along with quadratic terms for population and median income. For the outcome regressions, we use the same specification for the covariates".
    i) My understanding is that, typically, doubly robust methods allow one to specify the outcome model and the treatment model separately (see e.g. teffects aipw). But csdid does not allow such decoupling: the covariate list is the same for both. This, in turn, seems to prevent me from following what is declared in the original paper, where two different models are defined. Why is this decoupling not allowed in this case? Is this what is driving the divergent results? I checked drdid, and it does not allow such decoupling either. Hence, how can the specification implicitly declared in the paper be achieved?
    ii) What is the specification to obtain the row 1 TWFE estimate in this case, with controls? I get 0.0165, but the paper reports −0.008.
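
    On question i), it may help to see where the two models enter the doubly robust estimand. The expression below is written from memory of the paper (never-treated comparison group, no anticipation), so it is worth checking against the published equation. Here G_g indicates counties first treated in period g, C indicates never-treated counties, p_g(X) is the generalized propensity score, and m_{g,t}(X) is the outcome regression.

    ```latex
    ATT_{dr}^{nev}(g,t) = \mathbb{E}\left[
      \left( \frac{G_g}{\mathbb{E}[G_g]}
        - \frac{\dfrac{p_g(X)\,C}{1-p_g(X)}}
               {\mathbb{E}\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\right]} \right)
      \left( Y_t - Y_{g-1} - m_{g,t}^{nev}(X) \right) \right],
    \qquad
    m_{g,t}^{nev}(X) = \mathbb{E}\left[ Y_t - Y_{g-1} \mid X,\ C = 1 \right]
    ```

    The two nuisance functions are estimated by different methods (a logit for p_g(X), a regression for m_{g,t}(X)), but the passage quoted above says the outcome regressions "use the same specification for the covariates". Read that way, the paper may not require two different covariate lists, which would be compatible with csdid taking a single list.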

    Any insight into this will be greatly appreciated, and hopefully, it will also help those who are trying to replicate the paper!

    Many thanks in advance
    JM


    I am using Stata 17.0

    ps: if the attachment does not work, you can open R and run this bit of code after you download the data in https://github.com/pedrohcgs/CS_RR

    ls()
    rm(list = ls())
    getwd()
    setwd('PERSONALFOLDER')
    min_wage <- readRDS('min_wage_CS.rds')
    write.csv(as.matrix(min_wage),file="min_wage_CS.csv")

    the file uploaded here, min_wage_CS_reduced, drops unnecessary variables from the original dataset
    Last edited by JM del Pozo; 07 Jul 2024, 17:26.

  • #2
    Hi there
    Just one point to clarify: when CS refer to TWFE, they assume you will run a model like this
    reghdfe y d, abs(i yr)
    CSDID is only meant to estimate the method proposed by Callaway and Sant'Anna
    F
