Dear community,
I have the following challenge when trying to merge two datasets. Dataset 1: observations are identified by cikcode (company id), year, and director name (Name). Dataset 2: observations are identified by cikcode, year, and Name. I need to merge these two datasets based on cikcode, year, and Name. The challenge is that Name of the person from the Dataset 1 is not necessary written exactly the same as Name of the same person in the Dataset 2. So, essentially, I need to do two types of matching at once: exact matching by cikcode and year plus fuzzy matching by Name.
Below is a data sample from Dataset 1 and also from Dataset 2. For example, cikcode 1750, year 2011, Name "ronwoodard" from Dataset 1 should be matched to cikcode 1750, year 2011, Name "ronald b woodard" from Dataset 2.
Original datasets have over 100,000 observations, so it's not feasible to perform all the matches manually...
Sincerely appreciate any advice and help!
I have the following challenge when trying to merge two datasets. Dataset 1: observations are identified by cikcode (company id), year, and director name (Name). Dataset 2: observations are identified by cikcode, year, and Name. I need to merge these two datasets based on cikcode, year, and Name. The challenge is that Name of the person from the Dataset 1 is not necessary written exactly the same as Name of the same person in the Dataset 2. So, essentially, I need to do two types of matching at once: exact matching by cikcode and year plus fuzzy matching by Name.
Below is a data sample from Dataset 1 and also from Dataset 2. For example, cikcode 1750, year 2011, Name "ronwoodard" from Dataset 1 should be matched to cikcode 1750, year 2011, Name "ronald b woodard" from Dataset 2.
Original datasets have over 100,000 observations, so it's not feasible to perform all the matches manually...
Sincerely appreciate any advice and help!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long cikcode double year str30 Name str256 DirectorName 1750 2010 "davidstorch" "David Storch" 1750 2010 "mikeboyce" "Mike Boyce" 1750 2010 "generalronfogleman" "General Ron Fogleman" 1750 2010 "patkelly" "Pat Kelly" 1750 2010 "jamesbrocksmithjr" "James Brocksmith Jr" 1750 2010 "marcwalfish" "Marc Walfish" 1750 2010 "ronwoodard" "Ron Woodard" 1750 2010 "geraldfitzgeraldjr" "Gerald Fitzgerald Jr" 1750 2010 "jimgoodwin" "Jim Goodwin" 1750 2010 "normanbobins" "Norman Bobins" 1750 2011 "generalpeterpace" "General Peter Pace" 1750 2011 "jimgoodwin" "Jim Goodwin" 1750 2011 "mikeboyce" "Mike Boyce" 1750 2011 "generalronfogleman" "General Ron Fogleman" 1750 2011 "normanbobins" "Norman Bobins" 1750 2011 "marcwalfish" "Marc Walfish" 1750 2011 "patkelly" "Pat Kelly" 1750 2011 "ronwoodard" "Ron Woodard" 1750 2011 "jamesbrocksmithjr" "James Brocksmith Jr" 1750 2011 "davidstorch" "David Storch" 1750 2012 "generalpeterpace" "General Peter Pace" 1750 2012 "patkelly" "Pat Kelly" 1750 2012 "normanbobins" "Norman Bobins" 1750 2012 "jimgoodwin" "Jim Goodwin" 1750 2012 "generalronfogleman" "General Ron Fogleman" 1750 2012 "davidstorch" "David Storch" 1750 2012 "ronwoodard" "Ron Woodard" 1750 2012 "jamesbrocksmithjr" "James Brocksmith Jr" 1750 2012 "marcwalfish" "Marc Walfish" 1750 2012 "mikeboyce" "Mike Boyce" 1750 2013 "davidstorch" "David Storch" 1750 2013 "generalpeterpace" "General Peter Pace" 1750 2013 "mikeboyce" "Mike Boyce" 1750 2013 "generalronfogleman" "General Ron Fogleman" 1750 2013 "normanbobins" "Norman Bobins" 1750 2013 "marcwalfish" "Marc Walfish" 1750 2013 "jimgoodwin" "Jim Goodwin" 1750 2013 "ronwoodard" "Ron Woodard" 1750 2013 "tonyanderson" "Tony Anderson" 1750 2013 "patkelly" "Pat Kelly" 1800 2010 "royroberts" "Roy Roberts" 1800 2010 "billdaley" "Bill Daley" 1800 2010 "edliddy" "Ed Liddy" 1800 2010 "samscottiii" "Sam Scott III" 1800 2010 "jimfarrell" "Jim Farrell" 1800 2010 "roxanneaustin" "Roxanne Austin" 1800 2010 "billsmithburg" "Bill Smithburg" 1800 2010 "harryfuller" "Harry Fuller" 1800 2010 "glenntilton" "Glenn Tilton" 1800 2010 "billosborn" "Bill Osborn" 1800 2010 "phebenovakovic" "Phebe Novakovic" 1800 2010 "mileswhite" "Miles White" 1800 2010 "professordoctorrobert" "Professor Doctor Robert Alpern" 1800 2011 "billosborn" "Bill Osborn" 1800 2011 "jimfarrell" "Jim Farrell" 1800 2011 "edliddy" "Ed Liddy" 1800 2011 "harryfuller" "Harry Fuller" 1800 2011 "samscottiii" "Sam Scott III" 1800 2011 "professordoctorrobert" "Professor Doctor Robert Alpern" 1800 2011 "roxanneaustin" "Roxanne Austin" 1800 2011 "phebenovakovic" "Phebe Novakovic" 1800 2011 "glenntilton" "Glenn Tilton" 1800 2011 "mileswhite" "Miles White" 1800 2012 "sallyblount" "Sally Blount" 1800 2012 "edliddy" "Ed Liddy" 1800 2012 "jimfarrell" "Jim Farrell" 1800 2012 "phebenovakovic" "Phebe Novakovic" 1800 2012 "mileswhite" "Miles White" 1800 2012 "samscottiii" "Sam Scott III" 1800 2012 "billosborn" "Bill Osborn" 1800 2012 "roxanneaustin" "Roxanne Austin" 1800 2012 "glenntilton" "Glenn Tilton" 1800 2012 "professordoctorrobert" "Professor Doctor Robert Alpern" 1800 2012 "nancymckinstry" "Nancy McKinstry" 1800 2013 "samscottiii" "Sam Scott III" 1800 2013 "billosborn" "Bill Osborn" 1800 2013 "jimfarrell" "Jim Farrell" 1800 2013 "edliddy" "Ed Liddy" 1800 2013 "glenntilton" "Glenn Tilton" 1800 2013 "roxanneaustin" "Roxanne Austin" 1800 2013 "phebenovakovic" "Phebe Novakovic" 1800 2013 "mileswhite" "Miles White" 1800 2013 "sallyblount" "Sally Blount" 1800 2013 "professordoctorrobert" "Professor Doctor Robert Alpern" 1800 2013 "nancymckinstry" "Nancy McKinstry" 2488 2010 "johncaldwell" "John Caldwell" 2488 2010 "nickdonofrio" "Nick Donofrio" 2488 2010 "craigconway" "Craig Conway" 2488 2010 "heberhart" "H Eberhart" 2488 2010 "doctormikebarnes" "Doctor Mike Barnes" 2488 2010 "bruceclaflin" "Bruce Claflin" 2488 2010 "doctorbobpalmer" "Doctor Bob Palmer" 2488 2011 "craigconway" "Craig Conway" 2488 2011 "doctormikebarnes" "Doctor Mike Barnes" 2488 2011 "johncaldwell" "John Caldwell" 2488 2011 "doctorbobpalmer" "Doctor Bob Palmer" 2488 2011 "bruceclaflin" "Bruce Claflin" 2488 2011 "heberhart" "H Eberhart" 2488 2011 "nickdonofrio" "Nick Donofrio" 2488 2012 "jackharding" "Jack Harding" end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long cikcode float year strL Name 1750 2011 "ronald b woodard" 1750 2011 "peter pace" 1750 2011 "ronald r fogleman" 1750 2011 "patrick j kelly" 1750 2012 "michael r boyce" 1750 2012 "anthony k anderson" 1750 2012 "david p storch" 1750 2013 "norman r bobins" 1750 2013 "marc j walfish" 1750 2013 "ronald r fogleman" 1750 2013 "james e goodwin" 1961 2010 "bernard stolar" 1961 2010 "jay coleman" 1961 2010 "thomas kidrin" 1961 2010 "robert fireman" 1961 2013 "bernard stolar" 1961 2013 "thomas kidrin" 1961 2013 "robert fireman" 2034 2010 "michael feinman" 2034 2010 "hans c noetzli 458" 2034 2010 "frank debenedittis" 2034 2010 "vincent g miata" 2034 2010 "vincent g miata 1" 2034 2010 "richard p randall 467" 2034 2010 "brian shapiro" 2034 2010 "robert a wiesen 38" 2034 2010 "jan van eis" 2034 2010 "william n britton 26" 2034 2010 "douglas roth" 2034 2010 "ulf bender" 2034 2010 "raymond bartone" 2034 2010 "albert l eilender" 2034 2010 "albert l eilender 1" 2034 2010 "roger weaving jr" 2034 2011 "guillaume saint-clair" 2034 2011 "raymond bartone" 2034 2011 "william n britton 15" 2034 2011 "ulf bender" 2034 2011 "steven rogers" 2034 2011 "robert a wiesen 27" 2034 2011 "salvatore guccione*" 2034 2011 "nicholas shackley" 2034 2011 "douglas roth" 2034 2011 "frank debenedittis" 2034 2011 "ronald gold" 2034 2011 "jan van eis" 2034 2011 "roger g weaving jr" 2034 2011 "hans c noetzli 3457" 2034 2011 "richard p randall 36" 2034 2011 "albert l eilender" 2034 2011 "natasha giordano" 2034 2011 "david b rosen" 2034 2011 "albert l eilender" 2034 2011 "michael feinman" 2034 2012 "albert l eilender*" 2034 2012 "douglas roth" 2034 2012 "guillaume saint-clair" 2034 2012 "ronald gold" 2034 2012 "salvatore guccione*" 2034 2012 "albert l eilender*" 2034 2012 "jan van eis" 2034 2012 "david b rosen" 2034 2012 "william n britton 36" 2034 2012 "charles j alaimo" 2034 2012 "richard p randall 17" 2034 2012 "roger g weaving jr" 2034 2012 "robert a wiesen 27" 2034 2012 "steven rogers" 2034 2012 "natasha giordano5" 2034 2012 "salvatore guccione*" 2034 2012 "hans c noetzli 345" 2034 2012 "nicholas shackley" 2034 2012 "frank debenedittis" 2034 2012 "lukas von hippel" 2034 2012 "raymond bartone" 2034 2013 "satish srinivasan" 2034 2013 "richard p randall 1" 2034 2013 "frank debenedittis" 2034 2013 "douglas roth" 2034 2013 "ronald gold" 2034 2013 "salvatore guccione" 2034 2013 "salvatore guccione" 2034 2013 "raymond bartone" 2034 2013 "nicholas shackley" 2034 2013 "roger g weaving jr" 2034 2013 "natasha giordano57" 2034 2013 "alan g levin" 2034 2013 "david b rosen" 2034 2013 "albert l eilender" 2034 2013 "hans c noetzli 234" 2034 2013 "charles j alaimo" 2034 2013 "albert l eilender" 2034 2013 "steven rogers" 2034 2013 "william n britton 36" 2110 2010 "p zachary egan41" 2110 2010 "david c kleinman74 trustee" 2110 2010 "louis j mendes45" 2110 2010 "robert a mohn48" 2110 2010 "steven n kaplan50trustee and vice chairman" 2110 2010 "charles p mcquaid56trustee and president" end
Comment