Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How can I merge CUSIP codes in two datasets when some companies don't have CUSIP?

    Hello,
    I'm sort of new to Stata. I want to merge two datasets, one from SDC and the other from Worldscope annuals. The SDC file only has CUSIP codes of acquirer and target companies. I have to merge it with my Worldscope file. I want to do the merging using CUSIP codes but some companies don't have CUSIP. What can I do?

  • #2
    The -merge- command does not require that all of the observations involved find matches in the other data set. I don't know what SDC and Worldscope* are, but it sounds like the Worldscope data set has CUSIPs for a large set of companies, whereas the SDC file only has CUSIP codes for a proper subset of them. But that does not stop you from -merge-ing them. Just do it. The result will be that the companies that appear in both files will have the information from both files, but those that are in Worldscope only will have only the information to be found in Worldscope, and the variables that are unique to SDC will have missing values.

    Now, if for your subsequent analyses, the SDC file variables are needed, then those companies that don't appear in the SDC file will automatically be excluded from these analyses. And if that is a severe problem that prevents you from carrying out a suitable analysis, then all that can be said is that you can't achieve your goals with this data. You will either need to find a dataset that includes everything from SDC that you need but also covers the companies in Worldscope, or you will have to revise your research goals to something feasible with the data available to you.

    *Please read the Forum FAQ, as everyone is asked to do before posting, for excellent advice on how to ask questions in ways that maximize your chances of getting timely and helpful responses. In particular, this is a multidisciplinary international Forum. It may be that in your circle everybody knows what SDC and Worldscope are, but that is not true of this Forum. It is inadvisable to use specialized jargon or abbreviations when posting here. You should use only words that would be understandable to any person with university level education in any subject/field/discipline. If it is necessary to introduce some more specialized terms to adequately explain your question, you should take a sentence or two to explain what those terms mean. It was possible for me to respond to your question only because it appears that the question is not really about those two particular data sets, but is rather a generic question about -merge- and data sets that only partially overlap.

    Comment


    • #3
      As Clyde suggests, your question is more about the databases than it is about Stata's merge command. I don't know much about these databases, but this guide says Worldscope covers firms domiciled outside the United States. I presume SDC includes non-US firms on one side of the merger or the other. I think only firms listed on North American exchanges have CUSIPs. There are probably identifiers in each database for non-American firms, but of course they may not be the same. Perhaps you can find a mapping somewhere.
      Devra Golbe
      Professor Emerita, Dept. of Economics
      Hunter College, CUNY

      Comment


      • #4
        Thanks Derva and Clyde. I started over with knowing more about the structure of dataset and now I have another question in this regard: When I generate 6-digit cusips out of 9-digit cusips, the cusips of some firms become identical. for example, there is this 9-digit cusip: 000225102 and another one: 000225109. As you can see, only the last digit is different.
        In this case, there will be false duplicates in my file and I have to keep both because they definitely belong to different firms. What is your suggestion in this matter? Highly appreciate your help.

        Comment


        • #5
          Hi Mahtab,

          Again, this is more likely a data question than a Stata question. That said, I don't know how you got two observations with CUSIPs that differ only by check digit or why you think they "definitely belong to different firms." These two CUSIPs should refer to the same firm, and the same security. The first six digits refer to the issuing firm, the next two to the particular security, and the 9th is a check digit. See this page, for example, for an explanation of CUSIPs.
          Devra Golbe
          Professor Emerita, Dept. of Economics
          Hunter College, CUNY

          Comment


          • #6
            Hi Devra,

            Thanks a lot for your response. I will re-check the codes. Maybe I made a mistake when checking them in the big file. So, I need to get back to them.

            Comment

            Working...
            X