Merging two discrete datasets

Chris Rooney

Join Date: Apr 2014

Posts: 167
#1


25 Apr 2014, 10:04

Hello all,

Both my master dataset and the dataset I want to merge into it are in discrete-time format, with multiple records per RandomID (RandomID is my unique identifier). The error message I get is:

"variable RandomID does not uniquely identify observations in the master data"

I've Googled a little, and there doesn't seem to be much on merging discrete-time datasets. I thought -merge 1:1- would work, but apparently not. The other merge types (one-to-many and many-to-one) are not suitable for what I want to do.
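A minimal sketch of how to see why that error appears, using made-up data (the variables wave and employed are hypothetical). -isid- fails when RandomID alone is not a unique key, which is exactly the condition -merge 1:1- complains about; a combination such as RandomID plus a time variable may be unique.

```stata
* Hypothetical long-format data: several records per RandomID
clear
input RandomID wave employed
1 1 0
1 2 1
2 1 1
2 2 1
2 3 0
end

* Show how many records share each RandomID value
duplicates report RandomID

* isid exits with an error if RandomID alone does not identify observations
capture isid RandomID
display "isid RandomID return code: " _rc

* RandomID and wave together do identify each record, so this succeeds
isid RandomID wave
```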
Brendan Halpin

Join Date: Mar 2014

Posts: 152
#2

25 Apr 2014, 10:10

Are you sure you want to merge, rather than append?
Chris Rooney

Join Date: Apr 2014

Posts: 167
#3

25 Apr 2014, 10:15

Well, as I understand it, -merge- adds new variables to existing observations (which is what I want to do), while -append- only adds more observations.
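A small sketch contrasting the two commands on made-up data (the variables wave, income and health, and the tempfile names, are hypothetical): -append- stacks rows with the same variables, while -merge- attaches a new variable to records matched on key variables.

```stata
* Hypothetical file with more observations on the same variables
clear
input RandomID wave income
2 1 90
2 2 95
end
tempfile more_rows
save `more_rows'

* Hypothetical file with a new variable for existing RandomID-wave pairs
clear
input RandomID wave health
1 1 3
1 2 4
end
tempfile new_var
save `new_var'

* Master data
clear
input RandomID wave income
1 1 100
1 2 110
end

* -append- adds observations: 4 rows afterwards, still the same variables
append using `more_rows'

* -merge- adds the variable health to observations matched on RandomID and wave
merge 1:1 RandomID wave using `new_var'
```

Records for RandomID 2 get a missing value for health and _merge == 1, since the using file has no rows for them.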
Maarten Buis

Join Date: Mar 2014

Posts: 3405
#4

25 Apr 2014, 10:24

It depends on what exactly you want to do and what your data look like, but given the minimal information you have given us, I suspect you want a many-to-one merge.
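For illustration only, a sketch of a many-to-one merge under the assumption that the using file holds one record per RandomID (for example a time-invariant attribute; the variable female and the tempfile name persons are hypothetical) while the master file is in long format:

```stata
* Hypothetical using file: one record per RandomID
clear
input RandomID female
1 0
2 1
end
tempfile persons
save `persons'

* Master file in long format: several records per RandomID
clear
input RandomID wave employed
1 1 0
1 2 1
2 1 1
2 2 1
2 3 0
end

* m:1 copies each person's value of female onto all of that person's records
merge m:1 RandomID using `persons'
sort RandomID wave
list, sepby(RandomID)
```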

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Chris Rooney

Join Date: Apr 2014

Posts: 167
#5

25 Apr 2014, 11:06

Right, I figured out that I need two variables that together uniquely identify the records, and luckily I have them! Now an additional question: is there an easy way to get rid of mismatched variables?
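A minimal sketch of that situation, assuming both files are in long format and a time variable (here the hypothetical wave) combined with RandomID uniquely identifies records in each file; then a one-to-one merge on both keys is valid:

```stata
* Hypothetical using file in long format, keyed by RandomID and wave
clear
input RandomID wave hours
1 1 40
1 2 35
2 1 20
end
tempfile hours_data
save `hours_data'

* Hypothetical master file in the same long format
clear
input RandomID wave employed
1 1 0
1 2 1
2 1 1
2 2 1
end

* RandomID and wave jointly identify records in both files, so 1:1 is valid
merge 1:1 RandomID wave using `hours_data'
```

Here RandomID 2, wave 2 exists only in the master file, so it ends up with _merge == 1 and a missing value for hours.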
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#6

25 Apr 2014, 11:46

Assuming "mismatched variables" is intended to refer to records that occur in one data set but not the other: after -merge-, Stata creates a variable called _merge, which takes the value 1 for records that are in the master data only, 2 for records that are in the using data only, and 3 for records found in both data sets (matches). If you want to keep only the matched records, you can do that with:

keep if _merge == 3
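A small sketch with made-up data (the variables wave, employed and hours, and the tempfile name using_data, are hypothetical) that produces all three _merge values and tabulates them before anything is dropped:

```stata
* Hypothetical using file
clear
input RandomID wave hours
1 1 40
3 1 25
end
tempfile using_data
save `using_data'

* Hypothetical master file
clear
input RandomID wave employed
1 1 0
2 1 1
end

* Yields one match (3), one master-only record (1) and one using-only record (2)
merge 1:1 RandomID wave using `using_data'

* See how many records fell into each category before dropping any
tabulate _merge

* Keep matches only, then drop the indicator since it is now constant
keep if _merge == 3
drop _merge
```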

But merging data sets can be tricky and can produce surprising results. It is better practice to use your understanding of the two data sets to determine in advance what kinds of records you expect to find. For example, suppose the master data set is the subsample you are studying (with multiple records per participant) and the using data set is a compilation of the entire sample, which should include everybody in your subsample (but only once). Then you would expect only two kinds of result: _merge == 2 or _merge == 3. If there really were a record with _merge == 1, it would imply that your sample contains somebody who is not properly registered. You would want to know that this problem arose, and then fix it before blundering on to analyze an incorrectly merged data set.

Stata makes this kind of checking easy for you by providing the assert() and keep() options to the merge command. For the example I just gave, your command would be:

merge m:1 id using registry_data_set, assert(match using) keep(match)

and you will then have both the merged data that you want and assurance that it really is what you want.
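To see the safeguard at work, here is a sketch with made-up data in which one participant (id 4) is missing from the registry. The tempfile name registry_data_set only mimics the file name in the command above, and the variables enrolled and visit are hypothetical. Because id 4 would produce _merge == 1, the assert() condition is violated and -merge- stops with an error instead of silently continuing.

```stata
* Hypothetical registry: one record per id
clear
input id enrolled
1 1
2 1
3 1
end
tempfile registry_data_set
save `registry_data_set'

* Hypothetical study sample: several records per id, but id 4 is not registered
clear
input id visit
1 1
1 2
2 1
4 1
end

* id 4 would be master-only (_merge == 1), so assert(match using) fails
capture noisily merge m:1 id using `registry_data_set', assert(match using) keep(match)
display "merge return code: " _rc
```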

I strongly suggest you read the full -merge- documentation. It's a very powerful and important command, but also very easy to misuse.