Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with 1:1 merging of two longitudinal data sets

    Hi,
    I have 2 datasets that needs to be matched/merged. Basically, the two variables I need matched are company codes (coded as GVKEY) and fiscal year.
    I.e data set nr 1 is smaller, and I only want the observations that is in data set nr 1. Data set nr 2 I want additional variables, but only for the observations in data set 1 (called Links):
    GVKEY FYEAR CUST_GVKEY ..variable.. ..variable... ...variable...
    1004 1991 #
    1004 1992 #
    1004 1993

    1009 1991
    1009 1992

    2023 1993
    2023 1994


    Data set nr 2(Called Fundamentals):
    GVKEY FYEAR EBIT EBITDA ...variable... ...variable...
    1002 1991 # #
    1002 1992 # #
    1002 1993

    1004 1991
    1004 1992
    1004 1993

    1009 1991
    1009 1992

    2023 1993
    2023 1994

    3432 1991
    3432 1992

    So I basically want the information contained in data set 2 in data set 1, but only for the specific companies listed in data set 1. I.e say that data set 1 contains only 40 000 firms (individual GVKEYS) and data set 2 has 200 000 different GVKEYS, I only want the information on the 40 000 firms.

    So what i did is the following:
    use Links.dta

    *Once I have the master file open I type:
    merge 1:1 GVKEY FYEAR using Fundamentals

    I then get an error saying: variables gvkey fyear do not uniquely identify observations in the master data
    r(459);


    Is it because I have the the same GVKEY numbers stated on several lines?
    Can someone help? I am going crazy!

  • #2
    Your sample looks fine, so at a guess the problem lies elsewhere. Try duplicates, e.g.

    Code:
    duplicates list gvkey fyear
    (You missed the request to use full real names, so please consider contacting the administrators to change your registration. See "Contact us" on the home page.)

    Comment


    • #3
      Thank you! Duplicates was the problem. However this complicates the matter as I need the duplicates at a later period. This due to an additional variable named CUST_GVKEY. Basically this variable states the major customer. If one company(as referred to by GVKEY) has two major customers, then the variables gvkey and fear will show up twice after each other. The only difference between the two observations is the variable CUST_GVKEY which will have two different codes. Is it possible to perform the merge and just accept that it will be duplicated?

      Comment


      • #4
        That is the wrong way to think, trying to pretend that a problem doesn't exist. In any case, 1:1 merges need uniqueness; that's what 1:1 means!

        The best way forward for you may be a merge on gvkey cust_gvkey fyear.

        Comment


        • #5
          The problem I see then is if on the other dataset cust_gvkey doesn't exist, and the records are uniquely specified by gvkey fyear. You may need to do a merge m:1 in this case.
          Alfonso Sanchez-Penalver

          Comment


          • #6
            Alfonso: Good point.

            Comment


            • #7
              Thanks Nick! Too many years programming database applications and merging tables using SQL must have served for something
              Alfonso Sanchez-Penalver

              Comment

              Working...
              X