Hi Statalisters!
I haven't managed to find a solution to this problem online, but I presume it's a fairly straightforward one...
I've merged two datasets on a unique identifier. My goal is to go through the successfully merged individuals and check for any false negatives based on their names. The trouble is that names were frequently entered differently in the two datasets: different spellings, titles, only a first or last name, etc.
I would like to create a variable that identifies whether the two name variables share a string of, say, 5 characters in common, then 4 characters, then 3 characters, and so on. From there I can look over the results manually to identify any irregularities.
I'm also open to any other suggestions you think might be better.
An example of the sort of dataset I'm using is below. Here I would like to identify whether the variables name1 and name2 share a common string of 3 characters.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int uid str13 name1 str15 name2
1 "Mr John Smith" "Jon smith"
14 "Shanon Russel" "Shannon Russell"
22 "Tim Clyde" "Tim Clyde"
56 "Jeremy " "Jerimy Blaine"
76 "Fiona Jones" "David Blake"
39 "Sian" "Sean"
104 "Nancy Tugwell" "Nat Togwel"
8 "Marry Ann" "Mrs Ann"
145 "W Blaire" "Darren Blaire"
120 "Md Smith" "Md Duncan Smith"
end
Hope this all makes sense! Thanks all in advance
Chris
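For reference, here is a rough sketch of one way the check described above might be done, assuming the example data have been loaded. It records the length of the longest run of characters that name1 and name2 have in common, ignoring case and leading/trailing blanks, and then flags pairs that share at least 3 characters. The names n1, n2, len1, lcs and share3 are made up for illustration, and a shared run may include a space, so treat this as a starting point rather than a tested solution.

```stata
* Sketch only: n1, n2, len1, lcs and share3 are hypothetical helper names
generate n1 = lower(strtrim(name1))
generate n2 = lower(strtrim(name2))
generate int len1 = strlen(n1)
generate byte lcs = 0

* Longest name1 in the data bounds the substring lengths worth trying
summarize len1, meanonly
local maxlen = r(max)

* For each length k, slide a window of k characters along n1
* and look for that window anywhere in n2
forvalues k = 1/`maxlen' {
    local last = `maxlen' - `k' + 1
    forvalues i = 1/`last' {
        quietly replace lcs = `k' if len1 >= `i' + `k' - 1 ///
            & strpos(n2, substr(n1, `i', `k')) > 0
    }
}

* Flag pairs sharing a run of at least 3 characters
generate byte share3 = lcs >= 3
list uid name1 name2 lcs share3, noobs
```

Sorting on lcs afterwards (for example with sort lcs) would put the weakest matches first for manual review, and the cutoff of 3 can be swapped for 4 or 5 by comparing lcs against a different value.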