  • Matching based on characteristics

    Hello,

    I have a dataset with firms and employees. It consists of two subsets, each of which contains the same firms but under different firm IDs.
    I have appended the two subsets into one dataset.
    It looks like this:
    firmID1  employeeID1  firmID2  employeeID2  birthyear  sex     startdate  enddate
    1000     1            .        .            1977       male    01jan1990  31dec1995
    .        .            10001    101          1977       male    01jan1990  31dec1995
    1000     2            .        .            1965       female  01mar1991  31jul1994
    .        .            10001    102          1945       male    01sep1992  31nov1997
    Now I want to match these two firm IDs to each other, since they refer to the same firm. But because the two subsets do not contain exactly the same employees and overlap only partly, I can't match them directly. That is why I want to use some kind of probabilistic matching.
    Is there any way to do this? I thought about propensity score matching, but I am not sure how to implement it in this situation.

    The final result should look like this:
    firmIDfinal  employeeIDfinal  firmID1  employeeID1  firmID2  employeeID2  birthyear  sex     startdate  enddate
    2000         10               1000     1            10001    101          1977       male    01jan1990  31dec1995
    2000         11               1000     2            10001    .            1965       female  01mar1991  31jul1994
    2000         12               1000     .            10001    102          1945       male    01sep1992  31nov1997
    These results should be based on some measure of how closely the two firms' employee records correspond.
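
One simple way to make "how closely the two firms correspond" concrete is to score every candidate pair of firm IDs by how many employee records they share. The Python sketch below is only an illustration under stated assumptions: two records count as the same person only when birth year, sex, start date and end date agree exactly, the overlap is measured relative to the smaller firm, and the 0.5 threshold is an arbitrary choice. The helper names (`profile`, `firm_overlap_scores`, `best_matches`, `combine`) are hypothetical and not part of any package.

```python
from collections import Counter, defaultdict

# Records: (firm_id, employee_id, birthyear, sex, startdate, enddate), from the example data.
subset1 = [
    (1000, 1, 1977, "male", "01jan1990", "31dec1995"),
    (1000, 2, 1965, "female", "01mar1991", "31jul1994"),
]
subset2 = [
    (10001, 101, 1977, "male", "01jan1990", "31dec1995"),
    (10001, 102, 1945, "male", "01sep1992", "31nov1997"),
]


def profile(rec):
    """Characteristics used to recognise the same employee in both subsets (exact match assumed)."""
    return rec[2:]


def firm_overlap_scores(s1, s2):
    """Score each (firmID1, firmID2) pair by the number of shared employee profiles,
    divided by the size of the smaller firm, so scores lie between 0 and 1."""
    size1 = Counter(r[0] for r in s1)
    size2 = Counter(r[0] for r in s2)
    prof1 = Counter((r[0], profile(r)) for r in s1)
    # For each profile: which subset-2 firms contain it, and how often.
    firms_with_profile = defaultdict(list)
    for (f2, p), n2 in Counter((r[0], profile(r)) for r in s2).items():
        firms_with_profile[p].append((f2, n2))
    shared = Counter()
    for (f1, p), n1 in prof1.items():
        for f2, n2 in firms_with_profile[p]:
            shared[(f1, f2)] += min(n1, n2)
    return {(f1, f2): n / min(size1[f1], size2[f2]) for (f1, f2), n in shared.items()}


def best_matches(scores, threshold=0.5):
    """For each firmID1, keep the firmID2 with the highest score at or above the threshold."""
    best = {}
    for (f1, f2), s in scores.items():
        if s >= threshold and s > best.get(f1, (None, -1.0))[1]:
            best[f1] = (f2, s)
    return best


def combine(s1, s2, f1, f2):
    """Outer-join the employees of a matched firm pair on their profile.
    Employees without a twin in the other subset get None for the missing ID."""
    left = [r for r in s1 if r[0] == f1]
    right = [r for r in s2 if r[0] == f2]
    used, rows = set(), []
    for r in left:
        twin = next((i for i, q in enumerate(right)
                     if i not in used and profile(q) == profile(r)), None)
        if twin is None:
            rows.append((f1, r[1], f2, None) + profile(r))
        else:
            used.add(twin)
            rows.append((f1, r[1], f2, right[twin][1]) + profile(r))
    # Subset-2 employees that were never matched get their own row.
    for i, q in enumerate(right):
        if i not in used:
            rows.append((f1, None, f2, q[1]) + profile(q))
    return rows


if __name__ == "__main__":
    scores = firm_overlap_scores(subset1, subset2)
    print(scores)                # {(1000, 10001): 0.5}
    print(best_matches(scores))  # {1000: (10001, 0.5)}
    for row in combine(subset1, subset2, 1000, 10001):
        print(row)
```

With the example rows, firms 1000 and 10001 share one of their two employees, giving a score of 0.5, and `combine` reproduces the layout of the desired result apart from firmIDfinal and employeeIDfinal, which could then be numbered sequentially. Exact equality on all four characteristics is strict; with real data some tolerance, or a dedicated record-linkage tool, is probably more robust.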

    Best,

    Jakob


  • #2
    There are several community-contributed packages for the kind of "record linkage" you want. I've only used them a little, but take a look at -ssc describe reclink- and -ssc describe reclink2-.
    Last edited by Mike Lacy; 22 Oct 2022, 12:54.
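
The reclink packages implement their own comparison and scoring rules, and the Python sketch below does not reproduce them. It only illustrates the general record-linkage idea as it might apply to the employee data above: instead of demanding exact equality, each pair of employee records gets a weighted agreement score, with small tolerances on birth year and dates. The weights, tolerances, the 0.8 threshold and the function names are all hypothetical choices for illustration.

```python
from datetime import datetime

# Hypothetical weights: how much agreement on each field contributes to the score.
WEIGHTS = {"birthyear": 2.0, "sex": 1.0, "startdate": 3.0, "enddate": 3.0}


def parse_date(text):
    """Parse a Stata-style date string such as '01jan1990' (English month
    abbreviations assumed); return None if the date is invalid."""
    try:
        return datetime.strptime(text, "%d%b%Y")
    except ValueError:
        return None  # e.g. '31nov1997', since November has 30 days


def agreement_score(a, b, year_tolerance=1, day_tolerance=31):
    """Weighted share of fields on which two employee records agree, allowing small
    differences in birth year and dates instead of requiring exact equality."""
    score = 0.0
    if abs(a["birthyear"] - b["birthyear"]) <= year_tolerance:
        score += WEIGHTS["birthyear"]
    if a["sex"] == b["sex"]:
        score += WEIGHTS["sex"]
    for field in ("startdate", "enddate"):
        da, db = parse_date(a[field]), parse_date(b[field])
        # An unparseable date on either side counts as disagreement.
        if da is not None and db is not None and abs((da - db).days) <= day_tolerance:
            score += WEIGHTS[field]
    return score / sum(WEIGHTS.values())


def link(records1, records2, threshold=0.8):
    """Return (index1, index2, score) for every record pair scoring at or above the threshold."""
    links = []
    for i, a in enumerate(records1):
        for j, b in enumerate(records2):
            s = agreement_score(a, b)
            if s >= threshold:
                links.append((i, j, s))
    return links


if __name__ == "__main__":
    firm_1000 = [
        {"birthyear": 1977, "sex": "male", "startdate": "01jan1990", "enddate": "31dec1995"},
        {"birthyear": 1965, "sex": "female", "startdate": "01mar1991", "enddate": "31jul1994"},
    ]
    firm_10001 = [
        {"birthyear": 1977, "sex": "male", "startdate": "01jan1990", "enddate": "31dec1995"},
        {"birthyear": 1945, "sex": "male", "startdate": "01sep1992", "enddate": "31nov1997"},
    ]
    print(link(firm_1000, firm_10001))  # [(0, 0, 1.0)]
```

The linked employee pairs could then be counted per pair of firm IDs, so that two IDs are treated as the same firm when enough of their employees link.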
