Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge with only a single identifier?

    Hi all,

    So I have two datasets.
    One is Compustat data from 2006-2012 from WRDS database.
    And the other is Dealscan data also from WRDS.

    And I want to link these together.

    However, my problem is that I am trying to use Roberts Dealscan-Compustat data link table (citation at bottom) but it only provides the GVKEY identifier for the Compustat side.

    This is an issue because I have multi-year (2006-2012) data for Compustat so the GVKEY would be repeated in each year.

    Additionally, the Roberts link table has duplicate GVKEYS because it's linking loans (facilityid identifier), so sometimes companies would have multiple loans.

    You can see some of the Roberts link table in the below dataex:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(facilityid borrowercompanyid) str75 company int facilitystartdate str6 gvkey
      5 3769 "Sage-Allen Co"                        10078 "009362"
     11 3782 "Hanson Trust"                         10105 "012826"
     13    4 "American Community Development Group" 10074 "006300"
     14    4 "American Community Development Group" 10074 "006300"
     16 3788 "Computer Image"                       10116 "003321"
     17 3780 "Integrated Resources"                 10105 "006005"
     18 3780 "Integrated Resources"                 10105 "006005"
     19 3804 "American Cable Systems of California" 10116 "012332"
     21 3799 "Black Hawk Holdings"                  10135 "015242"
     22 3799 "Black Hawk Holdings"                  10135 "015242"
     24 3787 "Consolidated Hydro Inc"               10106 "295785"
     25 3787 "Consolidated Hydro Inc"               10106 "295785"
     26 3787 "Consolidated Hydro Inc"               10105 "295785"
     28 3783 "Funtime Inc"                          10116 "004931"
     29 3783 "Funtime Inc"                          10116 "004931"
     30    7 "MHI Group"                            10105 "007479"
     31 3775 "Owens-Illinois Inc"                   10085 "008215"
     32 3775 "Owens-Illinois Inc"                   10085 "008215"
     33 3775 "Owens-Illinois Inc"                   10085 "008215"
     34 3775 "Owens-Illinois Inc"                   10085 "008215"
     35 3775 "Owens-Illinois Inc"                   10085 "008215"
     36 3775 "Owens-Illinois Inc"                   10085 "008215"
     37 3775 "Owens-Illinois Inc"                   10085 "008215"
    end
    format %td facilitystartdate
    I have no issue merging the Dealscan data to the link table because it has 3 identifiers I can use (facilityid borrowercompanyid facilitystartdate).

    But I can't merge properly from Compustat to the link table, or from the link table (with the Dealscan data) to the Compustat dataset.
    The only possible merge is
    Code:
    merge m:m gvkey using linktable.dta
    but obviously m:m isn't going to work out...

    So how can I merge Roberts link table to my Compustat data with regards to the single identifier GVKEY, which has duplicates (because of multiple loans[facilityid]), and my Compustat data has multiple GVKEYS because it is multi-year data for each firm?

    "How does Financing Impact Investment? The Role of Debt Covenants," with Sudheer Chava, Journal of Finance 2008, 63, 2085 - 2121. (formerly entitled "Is Financial Contracting Costly? An Empirical Analysis of Debt Covenant Violations") http://finance.wharton.upenn.edu/~mr...-12/index.html

  • #2
    You did not get a quick answer. Part of the reason may be that you leave too much to the reader's imagination. There are many members fluent in merging data who have no understanding of your datasources.

    In your example of the data-link table I see the following example of a gvkey
    Code:
    input long(facilityid borrowercompanyid) str75 company int facilitystartdate str6 gvkey
     31 3775 "Owens-Illinois Inc"                   10085 "008215"
     32 3775 "Owens-Illinois Inc"                   10085 "008215"
     33 3775 "Owens-Illinois Inc"                   10085 "008215"
     34 3775 "Owens-Illinois Inc"                   10085 "008215"
     35 3775 "Owens-Illinois Inc"                   10085 "008215"
     36 3775 "Owens-Illinois Inc"                   10085 "008215"
     37 3775 "Owens-Illinois Inc"                   10085 "008215"
    end
    What you need to do is show a dataex of the seven years of Computstat data for this gvkey and a dataex of whatever Dealscan data should match. You don't need to include all the variables in each, but do include all the identifiers like gvkey and faciltyid and borrowercompanyid, and one or two of the "data" variables in each.

    Then using this example, tell us what you want the results of your merging to look like.

    Comment


    • #3
      To add to William's comment, you almost certainly want to merge by company and year. Consequently, you should extract year from the date in the deal file. You may still have a problem if you have more than one deal for company in a year. However, this is something you got to figure out from a research standpoint – what do you do about such multiple deals? Perhaps, you would use collapse to aggregate across the deals by year and company resulting in one observation for each company in each year which can then easily be merged with Compustat.

      Comment

      Working...
      X