I have been told, and have seen a whole lot of posts saying, that merge m:m is just a horrible idea. But nobody seems able to explain clearly why.
My master data set has one observation per patent, and the using data set is a census data set. I am trying to find as many matches as possible using the sound of the inventor's name, initials, city, etc. I can't really collapse my first data set into individuals, since I don't know whether two people with the same name are actually the same person, so I don't want to wrangle my data into a shape that allows a merge 1:m, although I guess I theoretically could. I have been told that joinby is the best solution, which is great. I ran both (joinby and merge m:m), and merge returned more matches than joinby.
So ultimately, my question is: what is the difference between the two, and why is merge m:m a problem?
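To make the difference concrete, here is a small plain-Python simulation (not Stata code) of the two pairing rules, written as a sketch from my reading of the Stata manuals. It assumes that merge m:m pairs observations by position within each key value (first with first, second with second, reusing the last observation of the shorter group) and keeps unmatched observations flagged by _merge, while joinby forms every pairwise combination within a key value and, by default, drops unmatched observations. The function names, the name_key variable, and the toy data are all hypothetical.

```python
from itertools import product


def group_by_key(rows, key):
    """Collect rows into lists keyed by row[key], preserving input order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


def merge_mm(master, using, key):
    """Sketch of the documented `merge m:m` pairing rule (assumption).

    Within each key value, the i-th master row is paired with the i-th
    using row; when one side runs out, its last row is reused. Unmatched
    rows are kept and flagged like merge's default _merge variable
    (1 = master only, 2 = using only, 3 = matched).
    """
    m_groups = group_by_key(master, key)
    u_groups = group_by_key(using, key)
    keys = list(m_groups) + [k for k in u_groups if k not in m_groups]
    out = []
    for k in keys:
        m, u = m_groups.get(k, []), u_groups.get(k, [])
        if not u:
            out.extend({**row, "_merge": 1} for row in m)
        elif not m:
            out.extend({**row, "_merge": 2} for row in u)
        else:
            for i in range(max(len(m), len(u))):
                # pair by position; the shorter group's last row is reused
                mrow = m[min(i, len(m) - 1)]
                urow = u[min(i, len(u) - 1)]
                # master values win on shared variable names
                out.append({**urow, **mrow, "_merge": 3})
    return out


def joinby(master, using, key):
    """Sketch of joinby's default: all pairwise combinations within a key;
    unmatched rows are dropped (assumed unmatched(none) behavior)."""
    u_groups = group_by_key(using, key)
    out = []
    for k, m in group_by_key(master, key).items():
        for mrow, urow in product(m, u_groups.get(k, [])):
            out.append({**urow, **mrow})
    return out


def pairs(rows):
    """Sorted (patent, person) pairs among rows that contain both."""
    return sorted((r["patent"], r["person"]) for r in rows
                  if "patent" in r and "person" in r)


# Hypothetical toy data: patents (master) and census persons (using)
master = [
    {"name_key": "SMITH J", "patent": 1},
    {"name_key": "SMITH J", "patent": 2},
    {"name_key": "JONES A", "patent": 3},
]
using = [
    {"name_key": "SMITH J", "person": "a"},
    {"name_key": "SMITH J", "person": "b"},
    {"name_key": "SMITH J", "person": "c"},
    {"name_key": "BROWN K", "person": "d"},
]

if __name__ == "__main__":
    mm = merge_mm(master, using, "name_key")
    jb = joinby(master, using, "name_key")
    print(len(mm), pairs(mm))
    print(len(jb), pairs(jb))
    # same data, census rows merely re-sorted
    print(pairs(merge_mm(master, using[::-1], "name_key")))
```

On this toy data, merge m:m produces 5 rows but only 3 patent-person pairs, and which pairs it produces changes when the census rows are simply re-sorted; joinby produces all 6 pairs regardless of order. That dependence on an arbitrary sort order is the usual objection to merge m:m. If a real merge m:m returned more rows than joinby, one plausible explanation (an assumption, not something checked here) is that the count included the unmatched _merge==1 and _merge==2 observations that joinby drops by default.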