merging data

ishak celik

Join Date: Nov 2021

Posts: 11
#1

07 Mar 2022, 04:03

Hello, I have two data sets. They share two common variables, but not the case (ID) number. How can I merge these two data sets based on the two common variables? (The original data set was split into several packages, and I would like to merge two of the packages again.)
Daniel Feenberg

Join Date: Oct 2014

Posts: 321
#2

07 Mar 2022, 06:26

If x and y combined uniquely identify each case in both datasets, with dataset1 in memory and dataset2 saved on disk, then

Code:

use dataset1
sort x y
merge 1:1 x y using dataset2

But if x and y are, say, sex and race, they obviously won't uniquely identify each case and the 1:1 merge will fail. See
https://www.stata.com/manuals/u23.pdf for the -merge- command.
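
Whether x and y jointly identify each case can be checked before attempting the merge. The following is a minimal sketch, assuming the placeholder names dataset1, x, and y used in this thread; it relies only on the built-in -duplicates report- and -isid- commands.

```stata
* Sketch: test whether x and y form a unique key in dataset1.
* dataset1, x, and y are placeholders from the thread, not real names.
use dataset1, clear

* Show how many observations share each x-y combination
duplicates report x y

* isid exits with an error if x y is not a unique key;
* capture traps that error so the do-file keeps running
capture isid x y
if _rc {
    display "x and y do not uniquely identify observations in dataset1"
}
```

Running the same check on dataset2 shows which side, if either, can serve as the "1" side of the merge.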
ishak celik

Join Date: Nov 2021

Posts: 11
#3

07 Mar 2022, 07:25

I tried this but got the message "variables x y do not uniquely identify observations in the master data". In this case I used the gender and class level of the students as the two variables, and it did not work. What can be the reason for this? Thanks.
Andrew Musau

Join Date: Oct 2014

Posts: 9948
#4

07 Mar 2022, 07:54

Use

Code:

merge m:1 x y using dataset2

or

Code:

merge 1:m x y using dataset2

The combination of the two variables must be free of duplicates in at least one of the datasets: in the using dataset for m:1, or in the master dataset for 1:m. If both datasets contain duplicates, you need to think more about how you want to combine them.
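
To illustrate what an m:1 merge does, here is a small self-contained sketch with made-up data. The variable names (id, gender, class, teacher) and all values are hypothetical and only stand in for the two datasets discussed above.

```stata
* Toy illustration with invented data; all names and values are hypothetical.

* Using dataset: one row per gender-class combination (the "1" side)
clear
input byte gender byte class str10 teacher
0 1 "Adams"
0 2 "Baker"
1 1 "Clark"
1 2 "Davis"
end
tempfile lookup
save `lookup'

* Master dataset: several students can share a gender-class combination
clear
input int id byte gender byte class
1 0 1
2 0 1
3 1 2
4 1 2
5 0 2
end

* gender and class repeat in the master but are unique in the using data
merge m:1 gender class using `lookup'

* _merge records whether each row came from the master, the using data, or both
tabulate _merge
list, sepby(gender class)
```

Every student with the same gender and class receives the same teacher value, and the lookup row with no matching student is kept with _merge equal to 2. This is appropriate when the using dataset holds group-level information, but it cannot link individual cases back to their own records.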
ishak celik

Join Date: Nov 2021

Posts: 11
#5

09 Mar 2022, 07:44

OK. The two datasets share two variables. They were originally one dataset, which has now been divided into two. The main dataset is missing some variables, and the other dataset has the variables that I need. The two datasets have two variables in common (var_x and var_y). In this case, how can I merge the two datasets into one, so that I can use the variables I need from the second dataset? Does this make sense? Thank you.
Andrew Musau

Join Date: Oct 2014

Posts: 9948
#6

09 Mar 2022, 07:56

As I said in #4, as long as there are no duplicates in the using dataset:

Code:

use dataset1
sort x y
merge m:1 x y using dataset2

Otherwise, see why you have duplicates:

Code:

use dataset2, clear
duplicates tag x y, gen(dup)
sort x y
list if dup, sepby(x y)