Matching based on characteristics

Jakob Conradi

Join Date: May 2020
Posts: 9


22 Oct 2022, 11:37

Hello,

I have a dataset of firms and employees that consists of two subsets. Both subsets contain the same firms, but under different firm IDs.
I have appended the two subsets into a single dataset.
It looks like this:

firmID1	employeeID1	firmID2	employeeID2	birthyear	sex	startdate	enddate
1000	1	.	.	1977	male	01jan1990	31dec1995
.	.	10001	101	1977	male	01jan1990	31dec1995
1000	2	.	.	1965	female	01mar1991	31jul1994
.	.	10001	102	1945	male	01sep1992	31nov1997

Now I want to match the firms across the two subsets. Because the subsets do not contain exactly the same employees and only partly overlap, I cannot match them directly. That is why I want to use some kind of probabilistic matching.
Is there a way to do this? I thought about propensity score matching, but I am not sure how to implement it in this situation.

The final result should look like this:

firmIDfinal	employeeIDfinal	firmID1	employeeID1	firmID2	employeeID2	birthyear	sex	startdate	enddate
2000	10	1000	1	10001	101	1977	male	01jan1990	31dec1995
2000	11	1000	2	10001	.	1965	female	01mar1991	31jul1994
2000	12	1000	.	10001	102	1945	male	01sep1992	31nov1997

These results should be based on some measure of similarity between the two firms in the dataset.

Best,

Jakob


Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

22 Oct 2022, 12:48

There are several community-contributed packages for the kind of "record linkage" you want. I've only used them a little, but take a look at -ssc describe reclink- and -ssc describe reclink2-.
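
Before turning to those packages, it may help to see how far exact matching gets you. The following is only a minimal sketch using built-in commands, not -reclink- itself. It assumes the appended data from #1 are in memory, that the variable names are exactly those shown there, and that two employees count as the same person only when birthyear, sex, startdate and enddate all agree. The names sub1 and n_shared are made up for illustration.

```stata
* Run as one do-file, because the tempfile name is a local macro.
* Assumption: subset-1 rows have missing firmID2/employeeID2 and
* subset-2 rows have missing firmID1/employeeID1, as in post #1.

* 1. Save subset 1 (rows identified by firmID1) to a temporary file
preserve
keep if !missing(firmID1)
keep firmID1 employeeID1 birthyear sex startdate enddate
tempfile sub1
save `sub1'
restore

* 2. Keep subset 2 and pair each of its employees with every
*    subset-1 employee who has identical characteristics
keep if !missing(firmID2)
keep firmID2 employeeID2 birthyear sex startdate enddate
joinby birthyear sex startdate enddate using `sub1'

* 3. Count the shared employees for every candidate firm pair
contract firmID1 firmID2, freq(n_shared)

* 4. For each firmID1, keep the firmID2 with the most shared employees
*    (ties are broken arbitrarily)
bysort firmID1 (n_shared): keep if _n == _N
list firmID1 firmID2 n_shared
```

With the four example rows in #1, the only surviving pair is firmID1 1000 with firmID2 10001, sharing one employee. Note that this replaces the data in memory with the firm crosswalk, and that different people can share the same characteristics by chance, so pairs with a small n_shared deserve a second look. Once the crosswalk is trusted, it can be merged back onto the employee data to build the combined IDs.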

Last edited by Mike Lacy; 22 Oct 2022, 12:54.