Hello,
I have a dataset with firms and employees. This dataset consists of 2 different subsets. Those subsets each contain the same firms but with different firmIDs.
I have now appended the dataset to contain both subsets.
It looks like this
Now I want to match those 2 firms together. But since they do not contain the same employees and just have some overlappings, I can't match them directly. That is why I want to use some kind of probability matching.
Is there any way to do this? I thought about propensity score matching, but I am not sure how to implement it in this situation.
The final results should look like this
This results should be based on some correlation between the 2 firms in the dataset.
Best,
Jakob
I have a dataset with firms and employees. This dataset consists of 2 different subsets. Those subsets each contain the same firms but with different firmIDs.
I have now appended the dataset to contain both subsets.
It looks like this
firmID1 | employeeID1 | firmID2 | employeeID2 | birthyear | sex | startdate | enddate |
1000 | 1 | . | . | 1977 | male | 01jan1990 | 31dec1995 |
. | . | 10001 | 101 | 1977 | male | 01jan1990 | 31dec1995 |
1000 | 2 | . | . | 1965 | female | 01mar1991 | 31jul1994 |
. | . | 10001 | 102 | 1945 | male | 01sep1992 | 31nov1997 |
Is there any way to do this? I thought about propensity score matching, but I am not sure how to implement it in this situation.
The final results should look like this
firmIDfinal | employeeIDfinal | firmID1 | employeeID1 | firmID2 | employeeID2 | birthyear | sex | startdate | enddate |
2000 | 10 | 1000 | 1 | 10001 | 101 | 1977 | male | 01jan1990 | 31dec1995 |
2000 | 11 | 1000 | 2 | 10001 | . | 1965 | femaile | 01mar1991 | 31jul1994 |
2000 | 12 | 1000 | . | 10001 | 102 | 1945 | male | 01sep1992 | 31nov1997 |
Best,
Jakob
Comment