Synthetic DID in Stata as in Arkhangelsky-Athey-Hirshberg-Imbens-Wager (2021)

Luis Pecht

Join Date: May 2017
Posts: 130

#16

15 May 2022, 17:27

Thanks Jared Greathouse. Actually, my objective here is to replicate an sdid result (with no covariates) in synth.

However, based on both help files:

Code:

 synth depvar predictorvars , trunit(#) trperiod(#) [ counit(numlist) xperiod(numlist) mspeperiod() resultsperiod() nested allopt unitnames(varname) figure keep(file) customV(numlist) optsettings ]

and

Code:

sdid depvar groupvar timevar treatment [if] [in], vce(vcetype) [covariates(varlist, [type]) ... ]

In synth's case, predictorvars are not optional (not in brackets).

In your example #15,

Code:

synth fert /// classic SCM- Justin Wiltshire's command
    fert(1960) ///
    fert(1955) ///
    fert(1965) ///
    fert(1955/1965), ///

Aren't these covariates (predictorvars)?

Thanks

Comment

Damian Clarke

Join Date: Nov 2021

Posts: 11
#17

15 May 2022, 19:52

Hi Luis,

The idea behind the sdid command (and, of course, the theory from the Arkhangelsky et al. paper on which it is based) is to match exclusively on (full) pre-treatment trends. That is, in sdid, the equivalent of synth's predictorvars is just the dependent variable in each pre-treatment time period. For this reason, it is not necessary to request this information explicitly as an argument in sdid; we take it as given as the objective in generating the synthetic control. For reference, this is Arkhangelsky et al.'s Algorithm 1. In synth, one can match on arbitrary predictorvars, including covariates, or on only particular pre-treatment periods. While sdid absolutely allows for the inclusion of covariates, these are not used in the generation of the synthetic control per se, but rather are concentrated out of the treated and control unit dependent variables prior to finding the optimal synthetic control.
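As a rough sketch of what "matching on every pre-treatment outcome" would look like on the synth side, the predictor list can be built in a loop. This assumes synth is installed (ssc install synth) and borrows the data URL, the outcome name cigsale, state==3 as the treated unit and 1988 as the first treated year from post #20 in this thread. It is not expected to reproduce the sdid estimate, since sdid also uses time weights and lets the synthetic control differ in level from the treated unit.

```stata
* Sketch only: feed synth every pre-treatment outcome as a predictor.
* Data URL, state==3 and the 1988 treatment year are taken from post #20.
use "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear
tsset state year

* Build the list cigsale(1970) cigsale(1971) ... cigsale(1987)
local preds
forvalues y = 1970/1987 {
    local preds `preds' cigsale(`y')
}

* Classic SCM with the 18 pre-treatment outcomes as the only predictors
synth cigsale `preds', trunit(3) trperiod(1988)
```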

Best wishes,
Damian
1 like
Comment
Damian Clarke

Join Date: Nov 2021

Posts: 11
#18

15 May 2022, 20:08

Originally posted by Muhammad Ibrahim Shah

Hi Professor, Daniel PV, I am trying to practise the sdid in stata using the original data. But this gives me an error:

. sdid packspercapita state year treated, vce(placebo) seed(1213) graph g1_opt(xtitle("")) g2_opt(ylabel(0(50)150, axis(2)))
type mismatch: exp.exp: transmorphic found where struct expected
r(3000);

Hi Muhammad, just to let you know, this bug is now fixed. It turns out that there was an issue in sdid when using Mata in Stata <= 14.0. We have now corrected this issue, and all should work exactly as documented in the help file of the command. If you'd like to install the most recent version of the command with this bug fix incorporated, you can do so with:

Code:

net install sdid, from("https://raw.githubusercontent.com/daniel-pailanir/sdid/master") replace

This will be updated on the SSC in the near future.
Best wishes,
Damian
1 like
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2113
#19

15 May 2022, 22:56

Luis Pecht, in my mind (and I suspect others may disagree with me), lags of the outcome aren't "really" covariates. They're outcomes we're interested in, sure, but they're not other variables outside of the outcome. I suppose some of it does boil down to semantics.

SDID, as I understand it, doesn't just generate unit weights as normal SCM does; it also generates time weights for the pre-treatment periods. So, if you use SDID, it isn't necessary that the trends "match" directly on each other, so long as they're parallel. So, if you COULD make synth replicate the results of SDID by re-engineering it, then my next advice to you would be to write a paper on it and send it to the Stata Journal.
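For reference, the estimator can be written as a weighted two-way fixed-effects regression. The sketch below follows the notation of equation (1.1) in Arkhangelsky et al. (2021): W_it is the treatment indicator, omega_i are the unit weights and lambda_t are the time weights. The notation is the paper's, not something the Stata command prints.

```latex
\left(\hat{\tau}^{\mathrm{sdid}}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right)
  = \arg\min_{\tau, \mu, \alpha, \beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \right)^2
    \hat{\omega}_i^{\mathrm{sdid}} \, \hat{\lambda}_t^{\mathrm{sdid}}
```

Plain DID is the same regression without the weights, and SC drops the unit fixed effects alpha_i and the time weights. The alpha_i term is what allows the weighted controls to sit at a different level from the treated unit, as long as they move in parallel.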

Speaking of which, it may make sense at some point for me to write a review paper on the SCM capabilities in Stata. SCM is developing so fast that having a go-to paper that surveys and discusses this method in one place would be nice.
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2113

#20

18 May 2022, 10:06

I have a bit of a question on the graphical side. Please consider the following code, Damian Clarke and Daniel PV:

Code:

u "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear

cls

qui g treated = cond(state==3 & year >=1988,1,0)


sdid ci st ye tr, vce(placebo) graph

Here we estimate the causal effect of Prop 99 using Cunningham's data.

One graph we get is the treated vs control averages, which under the hood produces the following dataset

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year double __00004EControl float(__00004ETreated lambda)
1970 141.99410930275917   123         0
1971 145.15848191082478   121         0
1972  149.7487874776125 123.5         0
1973  149.1462676078081 124.4         0
1974  150.3044495433569 126.7         0
1975 150.56657886505127 127.1         0
1976  154.6930844038725   128         0
1977 152.55365639925003 126.4         0
1978 150.81208044290543 126.1         0
1979  146.9357661753893 121.9         0
1980  145.4907839745283 120.2         0
1981 144.74189530313015 118.6         0
1982 141.83772917091846 115.4         0
1983  137.1082665771246 110.8         0
1984  129.2928065508604 104.8         0
1985 127.52268949151039 102.8         0
1986 124.41831301152706  99.7 .58608836
1987 123.21651093661785  97.5  .4139117
1988 117.52167113125324  90.1         .
1989 113.48608428239822  82.4         .
1990 108.28887414932251  77.8         .
1991 103.45071151852608  68.7         .
1992 101.98116411268711  67.5         .
1993 101.88585385680199  63.4         .
1994 100.52123700082302  58.6         .
1995 101.26544179022312  56.4         .
1996  99.80574382841587  54.5         .
1997 100.54352866113186  53.8         .
1998 100.98628042638302  52.3         .
1999  99.10571823269129  47.2         .
2000  92.17977494001389  41.6         .
end

The estimated ATT is -15.38, but when I do

Code:

g diff = __00004ETreated - __00004EControl

mean diff if year > 1988

we get -41.608. Now I'm no econometrician or statistician, but -41.608 is pretty far from -15.38.

My question here is: how do I generate the predicted counterfactual such that the difference between the counterfactual and the treated unit is exactly the causal effect estimated by SDID?

Perhaps reshaping or multiplying by the lambda variable are involved? I want to compare the counterfactual by SDID to my estimator and normal SCM, so I wanted to show in a line graph the counterfactual generated by SDID.

Comment

Daniel PV

Join Date: Jun 2019
Posts: 32

#21

18 May 2022, 11:25

Hi Jared Greathouse, first: when we calculate tau (-15.38560) in the sdid command, we use a matrix operation:

Code:

u "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear
qui g treated = cond(state==3 & year >=1988,1,0)
sdid ci st ye tr, vce(placebo) graph
keep state year cigsale    
reshape wide cigsale, i(state) j(year)
replace state=40 if state==3   //put treated unit at the end of matrix
sort state
mkmat cigsale*, matrix(Y)      //outcome matrix
matrix O=e(omega)
matrix O=O[1..38,1]            //weight omega
matrix L=e(lambda)
matrix L=L[1..18,1]            //weight lambda
matlist (-O', J(1,1,1/1))*Y*(-L',J(1,13,1/13))' //1 treatment unit and 13 post period
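
Written out, the matrix product in the last line is a weighted double difference. As a sketch, assuming Y has the 38 control states in its first rows and the treated state in the last row, with columns running from 1970 to 2000 (18 pre-treatment and 13 post-treatment years):

```latex
\hat{\tau}
  = \begin{pmatrix} -\hat{\omega}^{\top} & 1 \end{pmatrix}
    Y
    \begin{pmatrix} -\hat{\lambda} \\ \tfrac{1}{13}\,\mathbf{1}_{13} \end{pmatrix}
  = \left( \frac{1}{13} \sum_{t=1988}^{2000} Y_{\mathrm{tr},t}
           - \sum_{t=1970}^{1987} \hat{\lambda}_t \, Y_{\mathrm{tr},t} \right)
    - \sum_{i=1}^{38} \hat{\omega}_i
      \left( \frac{1}{13} \sum_{t=1988}^{2000} Y_{i,t}
             - \sum_{t=1970}^{1987} \hat{\lambda}_t \, Y_{i,t} \right)
```

The first bracket is the treated unit's post-treatment average minus its lambda-weighted pre-treatment average; the second is the same contrast for the omega-weighted controls.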

Second: to calculate tau using treatment and control data, you need to add a few things:

Code:

clear
input float year double Control float(Treated lambda)
1970 141.99410930275917   123         0
1971 145.15848191082478   121         0
1972  149.7487874776125 123.5         0
1973  149.1462676078081 124.4         0
1974  150.3044495433569 126.7         0
1975 150.56657886505127 127.1         0
1976  154.6930844038725   128         0
1977 152.55365639925003 126.4         0
1978 150.81208044290543 126.1         0
1979  146.9357661753893 121.9         0
1980  145.4907839745283 120.2         0
1981 144.74189530313015 118.6         0
1982 141.83772917091846 115.4         0
1983  137.1082665771246 110.8         0
1984  129.2928065508604 104.8         0
1985 127.52268949151039 102.8         0
1986 124.41831301152706  99.7 .58608836
1987 123.21651093661785  97.5  .4139117
1988 117.52167113125324  90.1         .
1989 113.48608428239822  82.4         .
1990 108.28887414932251  77.8         .
1991 103.45071151852608  68.7         .
1992 101.98116411268711  67.5         .
1993 101.88585385680199  63.4         .
1994 100.52123700082302  58.6         .
1995 101.26544179022312  56.4         .
1996  99.80574382841587  54.5         .
1997 100.54352866113186  53.8         .
1998 100.98628042638302  52.3         .
1999  99.10571823269129  47.2         .
2000  92.17977494001389  41.6         .
end

replace lambda=-lambda
replace lambda=1/13 if lambda==.
gen d=Treated-Control
gen dw=d*lambda
egen tau=sum(dw)

Basically, flip the sign of the lambda weights in the pre-treatment periods and add a constant weight (1/13) for each post-treatment period.
Let me know if this clears up your doubts!
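
On the question of drawing a counterfactual line whose average post-treatment gap equals tau: one way to read the calculation above is that the control series is shifted by the lambda-weighted pre-treatment gap. The sketch below continues from the dataset in the previous block after the replace and gen lines have been run; the variable names pre_gap, shift, counterfactual and effect are made up for illustration and are not produced by sdid.

```stata
* Continues from the block above: d = Treated - Control, and lambda now
* holds -lambda_t before 1988 and 1/13 from 1988 onward.
gen pre_gap = -lambda*d if year < 1988   // lambda_t * d_t, pre-treatment years only
egen shift  = total(pre_gap)             // lambda-weighted pre-treatment gap (a constant)
gen counterfactual = Control + shift     // control series moved to the treated unit's level
gen effect = Treated - counterfactual

* The post-treatment mean of effect should be close to the reported ATT of -15.38
summarize effect if year >= 1988
twoway line Treated counterfactual year, xline(1988)
```

With the numbers posted above, the shift is roughly -25.1, so the counterfactual is simply the plotted control series lowered by that amount.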

Last edited by Daniel PV; 18 May 2022, 11:27.

Comment

Joe Zonda

Join Date: Jan 2021

Posts: 64
#22

30 Jul 2022, 05:05

Prof. Daniel PV, or indeed anyone on this forum, would you kindly help with how to plot the treatment effect graph from sdid? I have come across a paper using SDID recently published in EER, and they present their graph in an appealing way (see figure attached). How can I get such a graph? Please help.
1 like
Comment

Daniel PV

Join Date: Jun 2019
Posts: 32

#23

31 Jul 2022, 19:47

Hi Joe Zonda, I think I have an answer to that. You need to run the sdid command once for each post-treatment period. For example, in the Proposition 99 case you would run sdid 12 times (1989 to 2000), each time using all pre-treatment periods (1970 to 1988) and only one post-treatment period t. I recommend using a loop.

Code:

webuse set www.damianclarke.net/stata/
webuse prop99_example.dta, clear
matrix tau_prop99=J(12,3,.) //create a matrix to hold the results

local j=1
forval t=1989(1)2000 {
    sdid packspercapita state year treated if year<=1988 | year==`t', vce(placebo) seed(1213) reps(100)
    
    *save tau and lower and upper bound
    local tau=e(tau)[1,1]
    local se=e(se)
    local lci=`tau'+invnormal(0.025)*`se'
    local uci=`tau'+invnormal(0.975)*`se'
    matrix tau_prop99[`j',1]=`tau'
    matrix tau_prop99[`j',2]=`lci'
    matrix tau_prop99[`j',3]=`uci'
    local ++j
}

matlist tau_prop99
matrix coln tau_prop99=tau lower upper
clear
svmat tau_prop99, n(col)
gen year=_n+1988 //define the time variable for the graph

#delimit ;
tw line tau year, || rcap lower upper year || scatter tau year, mc(black) ||
   , yline(0,lc(black%50) lp(dash)) legend(off) ytitle("Treatment effect by year") 
     xtitle("") xlabel(1989(1)2000) ylabel(-50(10)20) scheme(gg_tableau);
graph export prop99.eps, replace;
#delimit cr

The result is something like this:

[Figure: prop99.png]

Attached file: prop99.eps

Comment

Joe Zonda

Join Date: Jan 2021

Posts: 64
#24

31 Jul 2022, 23:13

Daniel PV thank you so much. This is amazing. No way I could figure this one out on my own. Thank you a million times.
Comment
Maxence Morlet

Join Date: Mar 2021

Posts: 590
#25

01 Aug 2022, 02:38

Hi all,

Quick question concerning the graph that the community-contributed command sdid produces, namely the one with the name of all donor pool units with a bubble representing their weight.

Is there any way to increase the horizontal spacing between the names of the units (and the ensuing bars and bubbles on the actual graph) in the code, and the font size of the names of these units? The reason I ask is that it is possible to have a large number of donor pool units, making the graph illegible.

Sure, after the estimation, the font size can be modified, however not very flexibly and only in categories (e.g. tiny, minuscule, etc.).
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2113
#26

01 Aug 2022, 08:49

You'll likely need to get into the guts of the ado code to do this. To be fair, that isn't hard in this case if you know how to program ado code, but in my experience, at least, it isn't readily attainable otherwise. I could be quite wrong, though, since I didn't write this.
Comment
Daniel PV

Join Date: Jun 2019

Posts: 32
#27

01 Aug 2022, 09:36

Hi Maxence Morlet, my recommendation on bubble charts is to use them only if you have a manageable number of control units to plot. You can change the font size using xlabel(,labsize()) in g1_opt(). As for the x-axis spacing, I'm not really familiar with that issue, but my impression is that Stata sets it automatically, so the only thing I'd try is xscale. By the way, if you want to create your own graph, you can do so using the omega and lambda matrices.

Code:

webuse set www.damianclarke.net/stata/
webuse prop99_example.dta, clear
#delimit ;
sdid packspercapita state year treated, vce(placebo) reps(5) seed(1213) graph
     g1_opt(xtitle("") xscale(range(-12 50)) xlabel(,labsize(10pt)))
     g2_opt(ylabel(0(50)150)) graph_export(sdid_, .eps);
#delimit cr
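
For the "create your own graph" route, a minimal sketch of plotting the unit weights as a bar chart is given below. It assumes sdid has just been run as above and that, as in the subsetting used in post #21, the first 38 rows of column 1 of e(omega) hold the control-unit weights (an assumption for this dataset). The variable names w1 and unit are made up for illustration.

```stata
* Sketch only: plot the unit weights stored by sdid as a horizontal bar chart.
* Assumes the first 38 rows of column 1 of e(omega) are the control weights,
* following the subsetting in post #21.
matrix O = e(omega)
matrix O = O[1..38, 1]

preserve
clear
svmat O, names(w)                      // creates variable w1 with 38 observations
gen unit = _n                          // row position in e(omega), not a state code
graph hbar (asis) w1, over(unit, sort(w1) descending label(labsize(vsmall))) ///
    ytitle("Unit weight (omega)")
restore
```

Because this is an ordinary graph command, label size, bar spacing and sorting can be changed freely with the usual over() suboptions.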
2 likes
Comment
Maxence Morlet

Join Date: Mar 2021

Posts: 590
#28

02 Aug 2022, 01:45

Thanks Daniel! By the way I really appreciate the very useful command.
1 like
Comment
Hideto Koizumi

Join Date: Dec 2021

Posts: 7
#29

16 Jan 2023, 02:50

Thank you very much Daniel for making this package available!

I have an issue with matsize while using sdid. My understanding is that we can circumvent the maximum matsize (11000 in Stata MP) by using Mata. It appears to me that everything in sdid.ado is done in Mata, so I am puzzled as to why I get a "matsize too small" error. Do you have any idea which portion of sdid.ado might cause this issue?

I would very much appreciate your response.
Kindest,
Hideto
Comment
Daniel PV

Join Date: Jun 2019

Posts: 32
#30

24 Jan 2023, 06:18

Hi Hideto Koizumi, sorry I got here so late. Does the problem arise in the estimation or with the graph option? I think it may be a problem with some matrix in the graphics section because, as you said, the sdid command and all of the estimation run in Mata code, but a couple of Stata's own matrices are used in part of the graphical output. By the way, I'm assuming you're using the latest version of sdid.
Comment
