Hello!
I have to merge two data sets. They have the variable 'name' in common, but I need an identifier that will make the matching easier.
I have two issues with this:
1. The names are not consistent across data sets. In one data set, the name variable might include a middle name, while in the other it may or may not include it. I hope the table below helps illustrate the inconsistency between the data sets.
*I am thinking that I could split each name into its parts and store them in separate variables (three or more per name), and then create an identifier that links two records when at least two of their name parts agree (in both data sets).
| Data set 1: Name | Data set 2: Name |
|---|---|
| AA BB CC | AA CC |
| DD EE | DD EE FF |
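The idea from point 1 (split each name into parts and treat two records as the same person when at least two parts agree) could be sketched roughly like this. The post does not say which software is used, so this is plain Python, and it assumes the name parts are separated by spaces and that case and extra spaces are the only other differences. `name_tokens`, `match_names` and the threshold `min_shared=2` are hypothetical, illustrative names, not an existing library API.

```python
from __future__ import annotations


def name_tokens(name: str) -> set[str]:
    """Split a full name into a set of upper-cased parts (assumes space-separated parts)."""
    return set(name.upper().split())


def match_names(
    names_1: list[str], names_2: list[str], min_shared: int = 2
) -> list[tuple[str, str]]:
    """Return every (name_1, name_2) pair that shares at least `min_shared` name parts."""
    tokens_2 = [(name_2, name_tokens(name_2)) for name_2 in names_2]
    matches = []
    for name_1 in names_1:
        parts_1 = name_tokens(name_1)
        for name_2, parts_2 in tokens_2:
            # Set intersection keeps only the parts both names have in common.
            if len(parts_1 & parts_2) >= min_shared:
                matches.append((name_1, name_2))
    return matches


data_set_1 = ["AA BB CC", "DD EE"]
data_set_2 = ["AA CC", "DD EE FF"]
print(match_names(data_set_1, data_set_2))
# [('AA BB CC', 'AA CC'), ('DD EE', 'DD EE FF')]
```

Comparing every pair of names grows quickly with the number of rows, so for large data sets one would typically compare only records that already agree on one part (for example, the last name). Names spelled differently (typos, accents) would also need normalization or fuzzy matching, which this sketch does not do.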
2. Even if I find a solution to the first problem, how can I ensure that the identifiers I create will be consistent across data sets?
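One way to keep identifiers consistent is not to generate them separately in each data set, but once, from the list of matched pairs (a crosswalk), and then attach the same ID to both sides. Below is a minimal Python sketch under that assumption. `build_crosswalk` is a hypothetical helper, and the matched pairs are hard-coded as they might come out of a matching step like the one described in point 1. Names that match more than once are set aside for manual review rather than given an ID.

```python
from __future__ import annotations

from collections import Counter


def build_crosswalk(
    pairs: list[tuple[str, str]],
) -> tuple[dict[str, int], dict[str, int], list[tuple[str, str]]]:
    """Give each one-to-one matched pair a shared integer ID.

    Returns an ID lookup for data set 1, one for data set 2, and the
    pairs left out because one of their names matched more than once.
    """
    count_1 = Counter(name_1 for name_1, _ in pairs)
    count_2 = Counter(name_2 for _, name_2 in pairs)
    ids_1: dict[str, int] = {}
    ids_2: dict[str, int] = {}
    ambiguous: list[tuple[str, str]] = []
    next_id = 1
    # Sorting makes the IDs reproducible regardless of the input order.
    for name_1, name_2 in sorted(pairs):
        if count_1[name_1] > 1 or count_2[name_2] > 1:
            ambiguous.append((name_1, name_2))  # review these by hand
            continue
        ids_1[name_1] = next_id
        ids_2[name_2] = next_id
        next_id += 1
    return ids_1, ids_2, ambiguous


# Matched pairs, hard-coded here for illustration.
pairs = [("DD EE", "DD EE FF"), ("AA BB CC", "AA CC")]
ids_1, ids_2, ambiguous = build_crosswalk(pairs)

# Attach the shared ID to each record; names without a match get None.
data_set_1 = [{"name": "AA BB CC"}, {"name": "DD EE"}]
data_set_2 = [{"name": "AA CC"}, {"name": "DD EE FF"}]
for row in data_set_1:
    row["id"] = ids_1.get(row["name"])
for row in data_set_2:
    row["id"] = ids_2.get(row["name"])
```

Once both data sets carry the same `id`, the merge can be done on `id` instead of `name`, and keeping the crosswalk as a separate file records how each match was made.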
Thank you in advance for any tips on how to solve either of these issues.