Regression after propensity score matching

Mia Pham

Join Date: Mar 2015

Posts: 44
#1

06 Oct 2016, 21:10

Hi everyone. I need your help with the following.
I'm running a difference-in-differences analysis, and the first stage is to match each observation in the treated group with one observation in the control group by nearest-neighbour propensity score.
For simplicity, I use the following sample data:

use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

(a treatment indicator t, covariates x1 and x2, and an outcome y)

Then, I use psmatch2 for propensity score matching:

psmatch2 t x1 x2, out(y) logit

Now I have a new id (generated by Stata as _id) for each treated observation, together with the id of its matched control observation. After dropping observations in the control group that are not matched with any observation in the treated group, I have a new sample.
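A minimal sketch of that dropping step, assuming the user-written -psmatch2- (from SSC) leaves _weight missing for observations it did not use in the matching; that assumption is worth checking against the psmatch2 help file before relying on it:

```stata
* Sketch only: assumes psmatch2 is installed (ssc install psmatch2)
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit

* Assumption: _weight is missing for observations not used in the matching
count if missing(_weight)
keep if !missing(_weight)

* Treated and matched control observations that remain
tab t
```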

Next, I want to run a regression to test the effect of the treatment, and I want the variable t (1 for treated and 0 for control) to capture the difference between treated and control within each pair (as matched in the propensity score matching). I got confused at this stage, because if I simply run:

reg y t x1 x2

then what t captures is the average difference between the whole treated group and the whole control group, instead of the difference for each pair.

Can you please suggest how I can solve this?

Thank you so much
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

06 Oct 2016, 22:58

So, you need the variable which identifies each pair. I haven't used -psmatch2- in a long time and I don't remember how you get that variable. But let's assume you have it and it's called pair_id. You also presumably have a subject_id for each person (or firm, or whatever they are). Then you have to account for the pairing as follows:

Code:

mixed y t x1 x2 || pair_id: || subject_id:

BUT, there is another problem. You said you want to do a difference in differences analysis. That regression equation doesn't do that. So you also need another variable that indicates pre- and post- onset of treatment status, call it pre_post. Then what you want is:

Code:

mixed y i.t##i.pre_post x1 x2 || pair_id: || subject_id:

The DID estimator of the treatment effect will be the coefficient of 1.t#1.pre_post. The best way to understand the results, though, is to look at predicted outcomes in each group both before and after the onset of treatment. -margins t#pre_post- will give you that.
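A minimal simulated sketch of this model, assuming two subjects per pair and two periods per subject. The data-generating values and the helper variables u_pair and u_subj are made up purely for illustration:

```stata
clear
set seed 12345
set obs 100                                  // 100 hypothetical matched pairs
gen pair_id = _n
gen u_pair = rnormal()                       // pair-level random intercept
expand 2                                     // two subjects per pair
bysort pair_id: gen t = _n - 1               // 0 = control, 1 = treated
gen subject_id = _n
gen u_subj = rnormal(0, 0.5)                 // subject-level random intercept
gen x1 = rnormal()
gen x2 = rnormal()
expand 2                                     // two periods per subject
bysort subject_id: gen pre_post = _n - 1     // 0 = pre, 1 = post
gen y = 1 + 0.5*x1 - 0.3*x2 + 0.2*t + 0.4*pre_post + 1.0*t*pre_post ///
    + u_pair + u_subj + rnormal()

* Pair and subject random intercepts, DID via the interaction
mixed y i.t##i.pre_post x1 x2 || pair_id: || subject_id:

* Predicted outcome in each group, before and after
margins t#pre_post

* The DID estimate on its own
lincom 1.t#1.pre_post
```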
Comment
Mia Pham

Join Date: Mar 2015

Posts: 44
#3

07 Oct 2016, 00:29

Thank you so much Clyde
I forgot to include the variable for pre and post treatment. Sorry about that.
I just want to clarify that I do not have the pair id.
-psmatch2- provides _id, which is the id number for all observations (both treated and control), and next to the column _id is the column _n1, which contains the id number of the observation matched with this observation. Since we match each observation in the treated group with one observation in the control group, the value of _n1 for observations in the control group is missing.
For example:
firm A (treated) has _id=668 and is paired with firm B (control), which has _id=48.
Therefore, firm A has _n1=48 while firm B has _n1=.

In this case what should I do?
I'm thinking that I should create a variable that specifies the pair id, so that firms A and B, being paired with each other, have the same pair id. However, I'm still struggling with that. Can you give me some hints?
Thank you
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29796
#4

07 Oct 2016, 00:52

So this should do it:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(_id _n1)
32 31
28 27
17 18
 2  1
18 17
25 26
 9 10
12 11
15 16
 5  6
21 22
29 30
19 20
23 24
37 38
36 35
13 14
35 36
33 34
39 40
27 28
34 33
26 25
20 19
38 37
10  9
16 15
31 32
30 29
40 39
 7  8
24 23
 8  7
 4  3
 1  2
11 12
 6  5
 3  4
14 13
22 21
end

gen temp1 = min(_id, _n1)
gen temp2 = max(_id, _n1)
by temp1 temp2, sort: assert _N == 2
by temp1 temp2: gen pair_id = (_n == 1)
replace pair_id = sum(pair_id)
drop temp1 temp2

Note: In the toy data in that example, the pairings are 1 with 2 and 2 with 1, 3 with 4 and 4 with 3, etc. But the code in no way relies on that and will work generally.

Comment

Mia Pham

Join Date: Mar 2015

Posts: 44
#5

07 Oct 2016, 01:15

Thank you so much

I tried your code and it works smoothly when each control is matched with only one treated observation.
However, in my matching, firm B (control) can be matched with both firm A and firm C (treated).
For example, in the toy data, suppose 11 is matched with 12 and 22 is also matched with 12.
In this case there is an error:

by temp1 temp2, sort: assert _N == 2
522 contradictions in 522 by-groups
assertion is false
r(9);

How should I solve this?
thank you
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#6

07 Oct 2016, 11:13

This gets a bit more complicated and I don't want to try to write code based on imaginary data. Please use -dataex- to post a small representative sample of your data. I only need the _id and _n1 variables.
Comment
Mia Pham

Join Date: Mar 2015

Posts: 44
#7

07 Oct 2016, 17:40

Sorry for the inconvenience. Below is the data sample.

Thank you.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(_id _n1)
25 78
46 78
95 86
34 89
12 92
26 41
78  .
86  .
89  .
92  .
41  .
32 51
51  .
end

Last edited by Mia Pham; 07 Oct 2016, 18:01.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29796
#8

07 Oct 2016, 18:22

So it appears in your data that _n1 is sometimes missing, but that values of _n1 can be linked to more than one value of _id. By contrast, _id is never missing, and no value of _id is ever duplicated. Relying on this assumption being true throughout your data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(_id _n1)
25 78
46 78
95 86
34 89
12 92
26 41
78  .
86  .
89  .
92  .
41  .
32 51
51  .
end

isid _id //    VERIFY ASSUMPTION

drop if missing(_n1)
by _n1 (_id), sort: gen _j = _n
reshape wide _id, i(_n1) j(_j)
isid _n1
gen long tuple_id = _n
rename _n1 _id0
reshape long _id, i(tuple_id) j(_j)
drop if missing(_id)
drop _j
sort _id
order _id, first

should do it.
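One possible way to attach tuple_id to a fuller dataset afterwards, sketched on the sample data from post #9. It assumes _id uniquely identifies observations; the tempfile name full is hypothetical, and the block must be run in one go (for example from a do-file) so the local macro survives:

```stata
clear
input byte(_id _n1 treat)
12 92 1
25 78 1
26 41 1
32 51 1
34 89 1
41  . 0
46 78 1
51  . 0
78  . 0
86  . 0
89  . 0
92  . 0
95 86 1
end

* Keep a copy of the full data (hypothetical tempfile name)
tempfile full
save `full'

* Same steps as the code above, applied to _id and _n1 only
keep _id _n1
drop if missing(_n1)
by _n1 (_id), sort: gen _j = _n
reshape wide _id, i(_n1) j(_j)
gen long tuple_id = _n
rename _n1 _id0
reshape long _id, i(tuple_id) j(_j)
drop if missing(_id)
keep _id tuple_id

* Bring back the remaining variables; never-matched _ids would get a missing tuple_id
merge 1:1 _id using `full', nogenerate
sort tuple_id _id
list, sepby(tuple_id)
```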

Comment

Mia Pham

Join Date: Mar 2015

Posts: 44
#9

07 Oct 2016, 19:22

I'm sorry for confusing you. Let me try to clarify.
_id is unique and never missing.
For each observation with treat=1, we need to find an _id with treat=0 to match it.
Observations with treat=0, however, form the control group, and we don't need to find their pairs.
Therefore _n1 is missing for observations with treat=0.
For example, in the first line of the data below, _id=12 (treat=1) is matched with _id=92, so _n1 is 92.
If you look at the 12th line, you can see that _id=92 has treat=0 and _n1=.

In your code, observations with missing _n1 are dropped, but I need to keep them in my sample so that if, for example, _id 12 has pair id 123, then _id 92 also has pair id 123.

Thank you so much

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(_id _n1 treat)
12 92 1
25 78 1
26 41 1
32 51 1
34 89 1
41  . 0
46 78 1
51  . 0
78  . 0
86  . 0
89  . 0
92  . 0
95 86 1
end
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#10

07 Oct 2016, 23:10

In your code, _n1 missing is dropped out, but I need to keep them in my sample so that, for example, _id 12 has pairid 123, then _92 also has pairid 123

But all the _id's are kept in the sample. Look carefully at the output it produces. Each value that appears in _n1, except the missing values, also appears as a value of _id (but associated with a missing value of _n1). That is preserved in my code. At the end of the code, you have a list of all _id numbers and an associated tuple-id (not pair, since sometimes there are multiple matches). To follow on your example, if you look at the output my code generates, _id 12 and _id 92 both appear there, and both have tuple_id 6 associated with them.

This is the layout you will need for your analysis. The _n1's serve no purpose when you get to the analysis. The data must have a variable (tuple_id) which distinguishes the various matched pairs (and triples and higher order tuples) for grouping purposes. This code creates it. And no _id is left out. (Well, an _id would be left out if it is never matched to any other, but then you don't have a matched pair or tuple for it to participate in.)
Comment
Mia Pham

Join Date: Mar 2015

Posts: 44
#11

08 Oct 2016, 01:27

Thank you for your helpful explanation. I got it now. Thank you so much.
Have a nice weekend.
Comment
Sarah Graf

Join Date: Jun 2019

Posts: 7
#12

23 Jun 2019, 09:14

First of all, many thanks to Clyde for the responses. This has moved my analysis forward quite a bit!

my question
Do these mixed effects models make use of the paired structure of the data, which was created through matching? I.e. are the results similar to those using matched-pair differences (Y_dj = Y_1j - Y_2j and X_dj = X_1j - X_2j, with j identifying the pair)?

background
I am looking at the effects of a rice farming method called SRI in my data set. To do so I am using a data set containing observational data of smallholder farms with each observation representing one farm. I have already identified factors influencing the adoption of SRI using a logit model and used psmatch2 to match households according to those variables.
I used the mahalanobis option to match households that are actually similar with regard to the matching variables, to generate a fully blocked randomized pseudo-experiment. psmatch2 reports average treatment effects. However, I also want to report coefficients for covariates to show how this method affects different kinds of households differently (e.g. those using mechanization, those hiring external labour), following a suggestion by Rubin (1979) on combining matching with regression adjustment.

something (potentially) useful from my side:
I have already calculated a pair_id (called block) and I am including my code below. As it is a bit simpler and does not require reshaping, it might be useful for people less experienced with Stata and those working with large data sets that include too many variables to be reshaped.

*generate blocks from output
gen block=.
replace block =_id if _treated==0
replace block= _n1 if _treated==1
replace block=. if _weight==.

*check blocks
sort block
browse SRI _treated _id _n1 block _weight
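A minimal sketch of the same no-reshape idea on the sample data from post #9, where the treatment indicator is called treat and there is no _weight variable. The names tuple, n_members and tuple_id are hypothetical, and dropping groups of size one stands in for the _weight check:

```stata
clear
input byte(_id _n1 treat)
12 92 1
25 78 1
26 41 1
32 51 1
34 89 1
41  . 0
46 78 1
51  . 0
78  . 0
86  . 0
89  . 0
92  . 0
95 86 1
end

* Controls carry their own _id; treated carry the _id of their matched control
gen tuple = cond(treat == 1, _n1, _id)

* A control that nobody was matched to would sit alone in its group
bysort tuple: gen n_members = _N
replace tuple = . if n_members == 1

* Consecutive group numbers, in ascending order of the control's _id
egen tuple_id = group(tuple)
sort tuple_id _id
list _id _n1 treat tuple_id, sepby(tuple_id)
```

On this sample, _id 12 and _id 92 land in the same group, and a control matched to two treated observations (78 here) forms a group of three.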
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#13

23 Jun 2019, 12:00

my question
Do these mixed effects models make use of the paired structure of the data, which was created through matching? I.e. are the results similar to those using matched-pair differences (Y_dj = Y_1j - Y_2j and X_dj = X_1j - X_2j, with j identifying the pair)?

By including a random intercept at the pair-id-level, the models are an appropriate matched-pair analysis for multi-level data. They are conceptually similar to doing paired t-tests when there are no covariates and no nesting involved, although they do not produce exactly the same results that a paired t-test (which is the same as a 1-sample t-test of the paired differences) would.
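A small simulated sketch for comparing the two approaches in the simplest case (no covariates, no nesting below the pair). The data and the effect size of 0.5 are made up for illustration:

```stata
clear
set seed 2019
set obs 50                        // 50 hypothetical matched pairs
gen pair_id = _n
gen u = rnormal()                 // shared pair-level component
gen y0 = u + rnormal()            // control member of the pair
gen y1 = 0.5 + u + rnormal()      // treated member of the pair

* Paired t-test on the wide layout
ttest y1 == y0

* Same data in long layout, with a random intercept for the pair
reshape long y, i(pair_id) j(t)
mixed y i.t || pair_id:
```

Putting the estimated difference and its standard error from the two outputs side by side shows how close they are for a given data set.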
Comment
Mariam Fatehi

Join Date: Jul 2019

Posts: 1
#14

25 Jul 2019, 19:38

Hi, I need your help with analyzing my data after propensity score matching. In my study, the outcome (y) is continuous, the treatment (t) is binary, and the covariates (x) include continuous, binary and categorical variables.
What I have done up to now is:

teffects psmatch (y) (t x1 x2 x3 x4 x5 ... x10)

The result shows number of obs = 7,288, with matches min = 1 and max = 5.

Then, I examined overlap and balances:

teffects overlap
tebalance summarize
tebalance density
tebalance box

There is no issue with them.
Now I want to run a regression to test the effect of the treatment, but I do not know how to run it. In my data browser, there are no new variables indicating the matched cases or a new id (as Mia described above). I use Stata 15.1.
Could you please suggest how I can figure out this problem?
Thank you so much in advance,
Comment
Sarah Graf

Join Date: Jun 2019

Posts: 7
#15

30 Aug 2019, 14:14

Hi Mariam,
In case you are still pondering: You could use the psmatch2 command instead of teffects psmatch. Then you should be able to use the procedure I described above.
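Another possibility, sketched here on the SSCC example data from post #1 rather than on Mariam's data: -teffects psmatch- has a generate() option that stores, for every observation, the observation numbers of its nearest neighbours in new variables. The stub name match and the variable obs_id are hypothetical:

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* The stored values are observation numbers in the current sort order,
* so record that order before estimating
gen long obs_id = _n

* generate(match) creates match1, match2, ... (one per nearest neighbour)
teffects psmatch (y) (t x1 x2), generate(match)

describe match*
list obs_id t match1 in 1/10
```

Because the match variables hold observation numbers rather than ids, they only stay meaningful while the data remain in the same order (or while obs_id is kept alongside them).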
Comment
