Propensity Score Matching (PSM) + Difference-in-Difference (DID) regression with control variables

Sandra Bloem

Join Date: Jun 2020

Posts: 106
#1

13 Jun 2021, 18:47

Hi there,

I have a balanced two-period panel (200 individuals observed in both periods), with which I have to estimate the effect of a binary treatment "treated" on the outcome variable y. I also have two continuous independent variables: x1 and x2.

I need to implement PSM with 3-nearest-neighbor matching (I do this with -psmatch2-), and then run a DID regression that includes the conditioning variables used to estimate the propensity score as control variables.

I see many people just using the weights constructed by -psmatch2- in the regression. However, this does not take into account which treated id is matched with which three control ids.

My questions are:

1) Am I correct in thinking that just using the weights in the DID regression is not enough?

2) How do I solve this problem? As stated before, x1 and x2 need to be added as controls in the DID regression as well.

3) Does the method used to solve this change when kernel matching is used?

Example Dataset

Code:

/* install -xfill- by typing net from https://www.sealedenvelope.com/ and clicking on the name */
use https://www.stata-press.com/data/r17/parallelt, clear
keep if inlist(t1,5,6)
rename id1 id
rename t1 t
rename y1 y
replace t = t-5
generate treated = treated1 if t==1
xfill treated, i(id)
order id treated t y x1 x2
keep id-x2

Example of what I mean by just using the weights in the regression

Code:

psmatch2 treated x1 x2 if t==0, n(3) caliper(0.1) common
xfill _weight, i(id)
regress y i.treated##i.t x1 x2 [aw=_weight], robust cluster(id)
CAO DUC SON

Join Date: Nov 2020

Posts: 13
#2

09 Dec 2021, 20:13

I have the same concern. I wish someone could help!
Fei Wang

Join Date: Oct 2021

Posts: 726
#3

09 Dec 2021, 20:36

Using the weights is enough for estimating the average treatment effect (ATE) or the ATE on the treated. Suppose subjects in set A are matched with those in set B: A1 is matched with B1, A2 with B2, and so on. You may compute the treatment effect for each pair (Y_A1 - Y_B1, Y_A2 - Y_B2, ...) and then average them for the ATE, or equivalently, compute the average of Y_A and the average of Y_B and take the difference. The latter does not require knowing who is matched with whom. In other words, the last three lines of code in #1 are appropriate.
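A sketch of the algebra behind this for k-nearest-neighbour matching may help. The notation is introduced here for illustration and is not from the thread: T is the set of matched treated units, N_T its size, C the set of controls, and M(i) the set of k controls matched to treated unit i. It assumes every treated unit receives exactly k equally weighted matches, which a caliper can violate.

```latex
\hat{\tau}_{ATT}
  = \frac{1}{N_T}\sum_{i \in T}\Bigl(Y_i - \frac{1}{k}\sum_{j \in M(i)} Y_j\Bigr)
  = \frac{1}{N_T}\sum_{i \in T} Y_i \;-\; \frac{1}{N_T}\sum_{j \in C} w_j\,Y_j,
\qquad
w_j = \frac{1}{k}\sum_{i \in T} \mathbf{1}\{\, j \in M(i) \,\}.
```

Under that assumption the weights w_j sum to N_T, so the second term is a weighted mean of the control outcomes. The identity of each match enters only through w_j, which is why a per-observation weight can stand in for the list of matched pairs.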
2 likes
CAO DUC SON

Join Date: Nov 2020

Posts: 13
#4

13 Dec 2021, 07:24

Fei Wang, can I ask you a favor?
I have panel data, and I want to do PSM and then run a DID regression. But I am confused about the following:
- Which year can I use for matching (pre- or post-treatment)?
- After matching, can I drop the unmatched observations?
- For the matching method, which test can I use to check it?
Thank you.
Fei Wang

Join Date: Oct 2021

Posts: 726
#5

13 Dec 2021, 07:35

Originally posted by CAO DUC SON

Fei Wang, can I ask you a favor?
I have panel data, and I want to do PSM and then run a DID regression. But I am confused about the following:
- Which year can I use for matching (pre- or post-treatment)?
- After matching, can I drop the unmatched observations?
- For the matching method, which test can I use to check it?
Thank you.

1. Only pre-treatment years can be used for matching.

2. You don't need to manually drop unmatched observations. If you match with -psmatch2- (from SSC), it automatically assigns zero weight to unmatched obs, and what you need to do is simply a DiD regression with weights.

3. You need to check if pre-treatment characteristics are sufficiently similar between treatment and control groups (balancing test).
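One possible way to run such a balancing check, sketched with the variable names from #1 (treated, x1, x2, t) and assuming that dataset is already in memory. -pstest- is installed together with -psmatch2- from SSC, and this is only one of several ways to assess balance.

```stata
* match on pre-treatment covariates (same call as in #1)
psmatch2 treated x1 x2 if t==0, n(3) caliper(0.1) common

* compare covariate means between treated and controls,
* both in the raw sample and in the matched sample
pstest x1 x2, both
```

Large remaining differences in the matched sample would suggest revisiting the propensity score specification or the caliper.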
1 like
CAO DUC SON

Join Date: Nov 2020

Posts: 13
#6

13 Dec 2021, 08:15

Thank you, Fei Wang, it is clearer to me now.
Can I keep the panel data and run psmatch2... Or should I match in the pre-treatment data and then append it to post-treatment?
Fei Wang

Join Date: Oct 2021

Posts: 726
#7

13 Dec 2021, 09:53

Originally posted by CAO DUC SON

Thank you, Fei Wang, it is clearer to me now.
Can I keep the panel data and run psmatch2... Or should I match in the pre-treatment data and then append it to post-treatment?

You may refer to the last three lines of code in #1. The OP matches treatment and control groups on first-period characteristics (the first period is also the pre-treatment period), and then passes the weights obtained from -psmatch2- on to the other period. Finally, the OP runs the DiD with weights. So yes, you may keep the panel data and run -psmatch2-, but run it only on the pre-treatment periods.

Things may be more complicated if you have multiple pre-treatment periods. One way is to run -psmatch2- on the cross-sectional version of the panel data (after -reshape wide-). Here is a simple example. Suppose you have a balanced panel with 10 periods (t = 1, 2, ..., 10), where d = 1 indicates the treatment group and d = 0 the control group, and treatment occurs at t = 5 (so the pre-treatment periods are t = 1, 2, 3, and 4). You then reshape the panel wide into a cross section, where x and z are time-varying variables whose pre-treatment values are used for matching.

Code:

reshape wide x z, i(id) j(t)

Then run -psmatch2- on the cross-sectional data, where x1-x4 are the values of x at t = 1, ..., 4, and z1-z4 are the values of z at t = 1, ..., 4. Variable w is a time-invariant variable used for matching.

Code:

psmatch2 d x1 x2 x3 x4 z1 z2 z3 z4 w, ties

After that you will obtain a variable _weight. Then reshape long the data back to panel.

Code:

reshape long x z, i(id) j(t)

You'll find each panel id has the same weights through all periods. Some have zero or missing weights and they are going to be dropped automatically in DiD regression. Finally, run the DiD as below (d is the treatment indicator and p is the pre-post indicator).

Code:

xtset id
xtreg y c.d#c.p covariates i.t [aw=_weight], fe vce(cluster id)
1 like
CAO DUC SON

Join Date: Nov 2020

Posts: 13
#8

13 Dec 2021, 10:17

Dear Fei wang,
Thank you for your enthusiasm. I am clear now.
Doreen Nico

Join Date: May 2022

Posts: 1
#9

23 May 2022, 08:41

Fei Wang
Kindly assist.
I have a panel data set with two periods (t=2). I want to run PSM (kernel and nearest neighbour) and then DID with a binary outcome.
1. Do I match using the pre-treatment period? Here is the command I use:
psmatch2 Treatm age marstatus educ houssize child income2a if post==0, out(HWT_effctv) kernel kerneltype(normal) ate common
2. After matching, I run the DID with a logistic regression, as my outcome variable is binary:
logit HWT_effctv post##_treated [aw=_weight], noomitted vce(cluster Id)
Is this correct?
3. It seems analytic weights are not allowed. How should I proceed?
Ishwor Adhikari

Join Date: Feb 2016

Posts: 22
#10

31 Jul 2022, 14:46

It is an interesting read. It seems simple when we have two time periods. But what happens when there are three time periods, in the sense that some counties get treated only in the third period (there are of course other counties that get treated in the second period)? How should this be handled?
Md Shoeb

Join Date: May 2022

Posts: 19
#11

29 Apr 2024, 11:47

Fei Wang, can you explain a bit more about the w variable in your code?