psmatch2 (avoiding matching within the same team) : how do I do propensity matching with conditions?

olivia kim


17 Dec 2024, 07:16

Hi,

I have a question about avoiding "within" matching in psmatch2. There seem to be many posts about matching within groups, but none about avoiding it. (Please let me know if there are! It would be so helpful.)

My data basically looks like the table below.

As you can see, each team has multiple observations; let's call them dyads. Ideally, I would like to estimate a propensity score for each dyad and match across teams, not within them. So even if rows 1-3 are close in propensity score, they should not be matched with each other because they are all dyads of team A. How do I avoid matching within a team?
Team  Collaborators  Treatment  Team's collaborator pick  var3....
A     Tom            1          1
A     John           0          1
A     Sam            0          0
B     Tom            0          0
B     John           1          1
B     Sam            1          1
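
To make the layout reproducible, the six example rows can be entered as a small Stata dataset. This is only a sketch: the variable names (team, collaborator, treatment, pick, team_id) are assumed for illustration, and the remaining covariates (var3 and so on) are left out.

```stata
* Toy version of the example data; variable names are assumed for illustration
clear
input str1 team str4 collaborator byte treatment byte pick
"A" "Tom"  1 1
"A" "John" 0 1
"A" "Sam"  0 0
"B" "Tom"  0 0
"B" "John" 1 1
"B" "Sam"  1 1
end

* One row per dyad, so the team-collaborator pair should identify a row
isid team collaborator

* Numeric team identifier, handy for group-wise commands
egen team_id = group(team)

list, sepby(team) noobs
```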

Thank you for your help!

Felix Bittmann


17 Dec 2024, 10:48

This is an interesting question. I am not sure if there is a quick or elegant way. I have attempted a somewhat crude solution using kmatch.

Code:

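ssc install kmatch, replace
* kmatch depends on the moremata and kdens packages
ssc install moremata, replace
ssc install kdens, replace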

For each observation, I find up to n nearest neighbors and then select the first one that is not in the same group. Here is a toy example in which industry is the grouping variable, so matches within the same industry should be avoided.

Code:

*** Prepare toy dataset ***
clear all
sysuse nlsw88
keep if inlist(industry, 1, 2, 3, 5)
keep union wage hours industry
drop if missing(union, wage, hours)


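*** Create Unique ID for each observation ***
* Sorting by industry first makes the IDs contiguous within each industry,
* which the ID bounds further down rely on
sort industry, stable
gen ID = _n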


*** Find up to 5 matches in the dataset using kmatch ***
global treatment union
global controls wage hours
kmatch ps $treatment $controls, nn(5) idgenerate


*** For each group, create ID bounds ***
bysort industry (ID): egen p_low = min(ID)
bysort industry (ID): egen p_high = max(ID)


*** Find matches that fall outside the own groups using these bounds ***
gen finalmatch_ID = .
forvalues i = 1/5 {
    replace finalmatch_ID = _ID_`i' if missing(finalmatch_ID) & !inrange(_ID_`i', p_low, p_high)
}
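
Before using finalmatch_ID, it may help to check how many observations found no out-of-group neighbor among the five candidates, and to confirm that each selected match really lies in another industry. The following is a rough sketch, assuming the code above has just been run so that ID, industry and finalmatch_ID are in memory; match_industry is a helper variable introduced here for illustration.

```stata
* Assumes ID, industry and finalmatch_ID exist from the code above
confirm variable ID industry finalmatch_ID

* Subscripting by ID only works if ID equals the observation number
sort ID
assert ID == _n

* Observations for which no candidate neighbor was outside their own industry
* (or that had no candidates at all)
count if missing(finalmatch_ID)

* Look up the industry of the selected match (missing if there is no match)
gen match_industry = industry[finalmatch_ID]

* The selected match should never share the observation's own industry
assert match_industry != industry if !missing(finalmatch_ID)
```

If the count of unmatched observations is large, increasing the number of neighbors in nn() (and the upper limit of the loop) is one option to consider.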

Best wishes


olivia kim


18 Dec 2024, 12:03

@Felix Bittmann Thank you for your input!

I will try what you shared. I have a quick clarifying question; please let me know if I have misunderstood.
Your forvalues loop finds a match outside the observation's own group among the 5 nearest neighbors, BUT it does not necessarily pick the closest of those 5 (in terms of propensity score). Is this correct?

Felix Bittmann


19 Dec 2024, 00:33

This is actually a very good question. You can use the options generate and dxgenerate to also inspect the individual PS values. I am not 100% sure about this, but I think the matches are ordered by PS difference, so match 1 has the smallest value and so on. However, since we care about distance rather than direction, I think the absolute values of the differences are what matter for our goal. I have adapted the code to reorder the matches by the absolute PS distance between an observation and each of its matches. The loop should now select the closest out-of-group match, based on PS.

Code:

*** Prepare toy dataset ***
clear all
sysuse nlsw88
keep if inlist(industry, 1, 2, 3, 5)
keep union wage hours industry
drop if missing(union, wage, hours)


*** Create Unique ID for each observation ***
sort industry, stable
gen ID = _n


*** Find up to 5 matches in the dataset using kmatch ***
global treatment union
global controls wage hours
kmatch ps $treatment $controls, nn(5) generate idgenerate dxgenerate


*** For each group, create ID bounds ***
bysort industry (ID): egen p_low = min(ID)
bysort industry (ID): egen p_high = max(ID)


*** Sort matches by absolute PS difference ***
preserve
keep ID _ID* _DX*
reshape long _ID_ _DX_, i(ID) j(match)
replace _DX_ = abs(_DX_)
sort ID _DX_, stable
bysort ID: gen newmatch = _n
drop match
reshape wide _ID_ _DX_, i(ID) j(newmatch)
order ID _ID* _DX*
tempfile sortedmatches
save `sortedmatches', replace
restore
drop _ID_* _DX_*
merge 1:1 ID using `sortedmatches', nogen


*** Find matches that fall outside the own groups using these bounds ***
gen finalmatch_ID = .
forvalues i = 1/5 {
    replace finalmatch_ID = _ID_`i' if missing(finalmatch_ID) & !inrange(_ID_`i', p_low, p_high)
}
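
To see how close the selected matches actually are, the absolute PS distance of the chosen neighbor can be stored next to its ID. This is a minimal sketch, assuming the code above has been run so that finalmatch_ID, _ID_1 to _ID_5 and the (now absolute) _DX_1 to _DX_5 are in memory; finalmatch_DX is a name introduced here for illustration, and the 0.05 threshold is an arbitrary example value.

```stata
* Assumes finalmatch_ID, _ID_1.._ID_5 and _DX_1.._DX_5 exist from the code above
confirm variable finalmatch_ID _ID_1 _DX_1

* Copy the distance that belongs to the selected match
gen finalmatch_DX = .
forvalues i = 1/5 {
    replace finalmatch_DX = _DX_`i' if finalmatch_ID == _ID_`i' & !missing(finalmatch_ID)
}

* Distribution of the absolute PS distances of the selected matches
summarize finalmatch_DX, detail

* How many selected matches are farther away than an example tolerance
count if finalmatch_DX > 0.05 & !missing(finalmatch_DX)
```

Matches beyond a tolerance you consider acceptable could then be discarded by setting finalmatch_ID to missing for those observations.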

Best wishes
