Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Synthetic difference-in-differences event study style plots with permutation-based confidence intervals

    Hi,
    Section 4.4 of Clarke et al (2023) shows how to generate event study style plots with the synthetic difference-in-differences procedure developed by Arkhangelsky et al (2021). Specifically, Clarke et al provide the Stata code for generating this type of plot with confidence intervals generated following a block bootstrapping procedure. However, they note that this type of plot could also be generated following a placebo permutation procedure. I would like to know how to generate such a plot.

    I've tried to write my own code to do this based on the author's existing code for the event study plot with bootstraping + the .ado code for generating placebo/permutation-based standard errors. Could someone review my code and check that it is properly generating placebo-based confidence intervals?

    Code:
    /*=======================================================================
    Prepare data
    ========================================================================*/
     
    webuse set www.damianclarke.net/stata/
    webuse quota_example.dta, clear
    egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
    egen mm=mean(m), by(country)
    keep if mm==2002 | mm==. //keep only one time of adoption
    drop if lngdp==. //keep if covariates observed
    
    
    
    /*=======================================================================
    Create vector of values from Equation (8)
    ========================================================================*/
    
    qui sdid womparl country year quota, vce(noinference) graph g2_opt(ylab(-5(5)20) ytitle("Women in Parliament") scheme(sj)) graph_export(groups, .pdf) covariates(lngdp, projected) 
    
    matrix lambda = e(lambda)[1..12,1] //save lambda weight
    matrix yco = e(series)[1..12,2] //control baseline
    matrix ytr = e(series)[1..12,3] //treated baseline
    matrix aux = lambda'*(ytr - yco) //calculate the pre-treatment mean
    scalar meanpre_o = aux[1,1]
    matrix difference = e(difference)[1..26,1..2] // Store Ytr-Yco
    svmat difference
    ren (difference1 difference2) (time d)
    replace d = d - meanpre_o // Calculate vector in (8)
    
    
    /*=======================================================================
    Do permutation procedure to generate confidence intervals
    ========================================================================*/
    
    * Generate locals related to treatment status
    tempvar treated ty tyear n
    qui gen `ty' = year if quota == 1 
    qui bys country: egen `treated' = max(quota)
    qui by  country: egen `tyear'   = min(`ty')
        
    * Generate local for number of control units
    local co = 106
        
    * Set replication counter and number of reps    
    local b = 1
    local B = 100
    
    * Loop over all permutation replications
    while `b'<=`B' {
        
        * Preserve data
        preserve
        
        * Drop treated units
        qui drop if `tyear' !=.
        
        * Generate numeric order by random uniform variable
        tempvar r rand id
        sort year country
        qui gen `r' = runiform() in 1/`co'
        bysort country: egen `rand' = sum(`r')
        egen `id' = group(`rand')
    
        * Assign placebo treatment 
        local i=1
            forvalues y=1/`co' {
                qui replace `tyear' = tryears[`i',1] if `id' == `y'
                
                local ++i
            }
         qui replace quota = 1 if year >= `tyear'
        
        
        * Run SDiD
        qui: sdid womparl country year quota, vce(noinference) graph
            
        * Collect bootstrapped versions of the quantities you stored above to calculate the second term in Equation 8
        matrix lambda_b = e(lambda)[1..12,1]
        matrix yco_b = e(series)[1..12,2]
        matrix ytr_b = e(series)[1..12,3]
        matrix aux_b = lambda_b'*(ytr_b - yco_b)
        matrix meanpre_b = J(26,1,aux_b[1,1])
            
        * Collect this replication's Equation 8 values (and store in a matrix)
        matrix d`b'=e(difference)[1..26,2] - meanpre_b
            
        * Update loop counter
        local ++b
        
        * Restore dataset
        restore
    }
        
    
    *** Calculate standard deviation of each estimate from (8) based on the bootstrap resamples
        * Keep original estimate of (8)
        preserve
        keep time d
        keep if time!=.
        
        * Import vector of resampled estimates
        forval b=1/`B' {
            svmat d`b'
        }
        
        * Calculate standard deviation of estimates across each time period
        egen rsd = rowsd(d11 - d`B'1)      
    
        
    * Generate confidence intervals
    gen LCI = d + invnormal(0.025)*rsd 
    gen UCI = d + invnormal(0.975)*rsd 
    
    
    /*=======================================================================
    Generate plot
    ========================================================================*/
    
    tw rarea UCI LCI time, color(gray%40) || scatter d time, color(blue) m(d) xtitle("") ytitle("Women in Parliament") xlab(1990(1)2015, angle(45)) legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)
    
    restore

  • #2
    I've made some edits to the code to try to get closer to the procedure described in Clarke et al (2023), but I'm still not sure if I'm doing this correctly:

    Code:
    /*=======================================================================
    Store Equation (8) data series
    ========================================================================*/
    
    * Prepare data
    webuse set www.damianclarke.net/stata/
    webuse quota_example.dta, clear
    egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
    egen mm=mean(m), by(country)
    keep if mm==2002 | mm==. //keep only one time of adoption
    drop if lngdp==. //keep if covariates observed
    
    * Run SDiD
    sdid womparl country year quota, vce(noinference) graph
    
    * Calculate second term in Equation 8
    matrix lambda = e(lambda)[1..12,1] // time weights
    matrix yco = e(series)[1..12,2] // pre-treatment values for treated unit
    matrix ytr = e(series)[1..12,3] // average pre-treatment values for control units
    matrix aux = lambda'*(ytr - yco) // calculate the pre-treatment baseline difference
    global meanpre_o = aux[1,1]        // store this value
    di $meanpre_o
    
    * Store pre-made output on first term of Equation 8
    matrix difference = e(difference)[1..26,1..2]  // store Ytr - Yco
    svmat difference
    keep difference1 difference2
    ren (difference1 difference2) (year trt_ctrl_diff)
    
    * Calculate the vector in equation 8
    gen eqn8 = trt_ctrl_diff - meanpre_o   
    drop trt_ctrl_diff
    drop if year == .    
    
    * Save 
    save "eqn8_values", replace
    
    
    
    
    /*=======================================================================
    Do permutation procedure to generate confidence intervals
    ========================================================================*/
    
    * Prepare data
    webuse set www.damianclarke.net/stata/
    webuse quota_example.dta, clear
    egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
    egen mm=mean(m), by(country)
    keep if mm==2002 | mm==. //keep only one time of adoption
    drop if lngdp==. //keep if covariates observed
    
    
    * Drop treated units
    drop if country == "Djibouti" | country == "Morocco"
        
    * Reset state codes so they're all in order
    encode country, gen(country_code)
        
    
    * Set permutation counter and number of reps    
    local p = 1
    local P = 100
    
    
    * Loop over all permutation replications
    while `p'<=`P' {
        
        * Randomly pick placebo state
        local placebo_unit = runiformint(1, 106)
    
        * Assign placebo treatments 
        drop quota
        gen quota = (year >= 2002 & country_code == `placebo_unit')
        
        * Run SDiD
        qui: sdid womparl country year quota, vce(noinference) graph
            
        * Collect the quantities you stored above to calculate the second term in Equation 8
         matrix lambda_p = e(lambda)[1..12,1]
         matrix yco_p = e(series)[1..12,2]
         matrix ytr_p = e(series)[1..12,3]
         matrix aux_p = lambda_p'*(ytr_p - yco_p)
         matrix meanpre_p = J(26,1,aux_p[1,1])
            
        * Collect Equation 8 values (and store in a matrix)
         matrix d`p'=e(difference)[1..26,2] - meanpre_p
            
        * Update loop counter
            local ++p
        
    
    }
        
    
    
        
    * Save all placebo estimates of Equation 8 values
    sort country year
    forval p=1/`P' {
        svmat d`p'
    }
        
    * Calculate standard deviation of estimates across each time period
    egen rsd = rowsd(d11 - d`P'1)      
    keep year rsd
    drop if rsd == .
    
    * Merge in true Equation 8 values
    merge 1:1 year using "eqn8_values", nogen
        
    * Generate confidence intervals
    gen LCI = eqn8 + invnormal(0.025)*rsd 
    gen UCI = eqn8 + invnormal(0.975)*rsd 
    
    
    /*=======================================================================
    Generate plot
    ========================================================================*/
    
    
    *generate plot
    tw rarea UCI LCI year, color(gray%40) || scatter eqn8 year, color(blue) m(d) xtitle("") ytitle("") legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)

    Comment


    • #3
      Hi Noah,
      This is a nice idea, and great to see you've applied the code to permutation inference! I think there are just two small things that I would change. I've made some edits to the code below and think that this should do what you're after. The changes are (a) in your code towards the top, meanpre_o is defined as a global, but it needs to be a scalar as it is used below in the code as a scalar (or the code below should be adapted to refer to the global), and (b) when you do the permutations, we want to permute the same treatment structure, which in this case is to have two treated units. In your code you've set up permutations assuming a single treated unit, so below I've changed this for the two countries. Please see below for the version with these edits.
      Code:
      /*=======================================================================
         Store Equation (8) data series
      ========================================================================*/
      
      * Prepare data
      webuse set www.damianclarke.net/stata/
      webuse quota_example.dta, clear
      egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
      egen mm=mean(m), by(country)
      keep if mm==2002 | mm==. //keep only one time of adoption
      drop if lngdp==. //keep if covariates observed
      
      * Run SDiD
      sdid womparl country year quota, vce(noinference) graph
      
      * Calculate second term in Equation 8
      matrix lambda = e(lambda)[1..12,1] // time weights
      matrix yco = e(series)[1..12,2] // pre-treatment values for treated unit
      matrix ytr = e(series)[1..12,3] // average pre-treatment values for control units
      matrix aux = lambda'*(ytr - yco) // calculate the pre-treatment baseline difference
      scalar meanpre_o = aux[1,1]        // store this value
      di meanpre_o
      
      * Store pre-made output on first term of Equation 8
      matrix difference = e(difference)[1..26,1..2]  // store Ytr - Yco
      svmat difference
      keep difference1 difference2
      ren (difference1 difference2) (year trt_ctrl_diff)
      
      * Calculate the vector in equation 8
      gen eqn8 = trt_ctrl_diff - meanpre_o
      drop trt_ctrl_diff
      drop if year == .
      
      * Save
      save "eqn8_values", replace
      
      
      
      
      /*=======================================================================
          Do permutation procedure to generate confidence intervals
      ========================================================================*/
      
      * Prepare data
      webuse set www.damianclarke.net/stata/
      webuse quota_example.dta, clear
      egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
      egen mm=mean(m), by(country)
      keep if mm==2002 | mm==. //keep only one time of adoption
      drop if lngdp==. //keep if covariates observed
      
      
      * Drop treated units
      drop if country == "Djibouti" | country == "Morocco"
      
      * Reset state codes so they're all in order
      encode country, gen(country_code)
      
      
      * Set permutation counter and number of reps
      local p = 1
      local P = 100
      
      
      * Loop over all permutation replications
      while `p'<=`P' {
          
          drop quota
          // Need to choose 2 placebo states.  Let's just take max and min of rand number
          // This could probably be done in less lines, but the below should work fine
          gen c = rnormal()
          bys country: egen chooser = min(c)
          qui sum chooser
          gen treated = chooser==r(min) | chooser==r(max)
          
          gen quota = (year >= 2002 & treated==1)
      
          * Run SDiD
          qui: sdid womparl country year quota, vce(noinference) graph
      
          * Collect the quantities you stored above to calculate the second term in Equation 8
          matrix lambda_p = e(lambda)[1..12,1]
          matrix yco_p = e(series)[1..12,2]
          matrix ytr_p = e(series)[1..12,3]
          matrix aux_p = lambda_p'*(ytr_p - yco_p)
          matrix meanpre_p = J(26,1,aux_p[1,1])
      
          * Collect Equation 8 values (and store in a matrix)
          matrix d`p'=e(difference)[1..26,2] - meanpre_p
      
          * Update loop counter
          local ++p
          drop c chooser treated
      }
      
      * Save all placebo estimates of Equation 8 values
      sort country year
      forval p=1/`P' {
          svmat d`p'
      }
      
      * Calculate standard deviation of estimates across each time period
      egen rsd = rowsd(d11 - d`P'1)
      keep year rsd
      drop if rsd == .
      
      * Merge in true Equation 8 values
      merge 1:1 year using "eqn8_values", nogen
      
      * Generate confidence intervals
      gen LCI = eqn8 + invnormal(0.025)*rsd
      gen UCI = eqn8 + invnormal(0.975)*rsd
      
      
      /*=======================================================================
          Generate plot
      ========================================================================*/
      
      
      *generate plot
      tw rarea UCI LCI year, color(gray%40) || scatter eqn8 year, color(blue) m(d) xtitle("") ytitle("") legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)
      graph export sdidplacebo.pdf, replace

      Output is as follows:
      Click image for larger version

Name:	Screenshot from 2023-07-20 17-22-55.png
Views:	1
Size:	67.8 KB
ID:	1721276


      Best wishes,
      Damian

      Comment


      • #4
        Thank you very much, Damian!

        Comment


        • #5
          Hi,
          I tried to reproduce both this event study (placebo) as well as the one in the session 4.4 in Clark et al., 2023. However I am encountering some issues in looping over all permutation replications. The error I get is: "<= invalid name". I can't understand how to solve it. Can someone help me please?

          Comment


          • #6
            Pierangela Peruzzo, are you copying-and-pasting Damian's code exactly? It still works for me.

            If you are copy-and-pasting exactly, can you go line-by-line to see where the error message is coming up?

            Comment

            Working...
            X