Help with 1:1 merging of two longitudinal data sets

Emilie Gronsund

Join Date: Apr 2014

Posts: 26
#1

Help with 1:1 merging of two longitudinal data sets

09 Apr 2014, 07:19

Hi,
I have 2 datasets that needs to be matched/merged. Basically, the two variables I need matched are company codes (coded as GVKEY) and fiscal year.
I.e data set nr 1 is smaller, and I only want the observations that is in data set nr 1. Data set nr 2 I want additional variables, but only for the observations in data set 1 (called Links):
GVKEY FYEAR CUST_GVKEY ..variable.. ..variable... ...variable...
1004 1991 #
1004 1992 #
1004 1993
1009 1991
1009 1992
2023 1993
2023 1994

Data set nr 2(Called Fundamentals):
GVKEY FYEAR EBIT EBITDA ...variable... ...variable...
1002 1991 # #
1002 1992 # #
1002 1993
1004 1991
1004 1992
1004 1993
1009 1991
1009 1992
2023 1993
2023 1994
3432 1991
3432 1992

So I basically want the information contained in data set 2 in data set 1, but only for the specific companies listed in data set 1. I.e say that data set 1 contains only 40 000 firms (individual GVKEYS) and data set 2 has 200 000 different GVKEYS, I only want the information on the 40 000 firms.

So what i did is the following:
use Links.dta

*Once I have the master file open I type:
merge 1:1 GVKEY FYEAR using Fundamentals

I then get an error saying: variables gvkey fyear do not uniquely identify observations in the master data
r(459);

Is it because I have the the same GVKEY numbers stated on several lines?
Can someone help? I am going crazy!
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35699
#2

09 Apr 2014, 07:28

Your sample looks fine, so at a guess the problem lies elsewhere. Try duplicates, e.g.

Code:

duplicates list gvkey fyear

(You missed the request to use full real names, so please consider contacting the administrators to change your registration. See "Contact us" on the home page.)
Comment
Emilie Gronsund

Join Date: Apr 2014

Posts: 26
#3

09 Apr 2014, 07:45

Thank you! Duplicates was the problem. However this complicates the matter as I need the duplicates at a later period. This due to an additional variable named CUST_GVKEY. Basically this variable states the major customer. If one company(as referred to by GVKEY) has two major customers, then the variables gvkey and fear will show up twice after each other. The only difference between the two observations is the variable CUST_GVKEY which will have two different codes. Is it possible to perform the merge and just accept that it will be duplicated?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35699
#4

09 Apr 2014, 07:51

That is the wrong way to think, trying to pretend that a problem doesn't exist. In any case, 1:1 merges need uniqueness; that's what 1:1 means!

The best way forward for you may be a merge on gvkey cust_gvkey fyear.
Comment
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#5

09 Apr 2014, 08:38

The problem I see then is if on the other dataset cust_gvkey doesn't exist, and the records are uniquely specified by gvkey fyear. You may need to do a merge m:1 in this case.

Alfonso Sanchez-Penalver
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35699
#6

09 Apr 2014, 08:42

Alfonso: Good point.
Comment
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#7

09 Apr 2014, 11:35

Thanks Nick! Too many years programming database applications and merging tables using SQL must have served for something

Alfonso Sanchez-Penalver
Comment

Announcement

Help with 1:1 merging of two longitudinal data sets

Comment

Comment

Comment

Comment

Comment

Comment