Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge with Cusip and daydate

    Hello Stata Community,

    I would like to thank all of you for the helpful comments and posts that I have been using for the past two years and today I registered as a member.

    Issue: I am trying to merge two datasets based on cusip and daydate (MM/DD/YY). Both of my datasets have unique identifiers csuip and daydate with the same cusip length; 9digit and same day format; M/D/Y When I tried to merge based on 1:1, 1:M, OR M:1 , it shows the error message "variables cusip daydate do not uniquely identify observations in the master data".

    Please note that I have either one or many cusip in each dataset for each firm because each company has several events on different dates. So, I am trying to match that date and the company based on the cusip and date to run an event study.

    Cusip in the second dataset was retrieved from SAS date with shows the number of dates but I converted all that on Excel and I think it was done well after randomly checking.

    Below is a short data for this issue

    the first dataset

    Cusip Date
    172967424 7/8/2014
    172967424 1/25/2008
    172967424 6/19/2008
    172967424 10/14/2010
    172967424 6/9/2011
    172967424 3/13/2012
    172967424 4/18/2018
    172967424 4/2/2014
    172967424 11/13/2014
    172967424 10/2/2006
    172967424 8/9/2007
    172967424 2/24/2010
    172967424 7/17/2013
    The second dataset
    6/9/2011 172967424
    6/10/2011 172967424
    6/11/2011 172967424
    6/13/2011 172967424
    6/16/2011 172967424
    6/17/2011 172967424
    6/24/2011 172967424
    6/27/2011 172967424
    8/5/2011 172967424
    2/4/2012 172967424
    9/21/2012 172967424
    10/31/2012 172967424
    7/15/2013 172967424
    9/4/2013 172967424
    1/23/2014 172967424
    Thank you all in advance.

  • #2
    So, you need to identify the observations that have the same values of cusip and daydate and figure out what to do with them.

    The first step is to look at them, in both data sets:
    Code:
    dupliciates tag cusip daydate, gen(flag)
    browse if flag
    The simplest case will be that it is apparent that these observations are completely duplicate: they agree not just on cusip and daydate, but on all the variables. In that case, you can just get rid of the duplicates and retain only one from each such set. (BUT, remember that these purely duplicate observations really shouldn't be there--the fact that they are suggests that something went wrong in the data management that produced the data. So it behooves you to review that data management to see what went wrong--you may find other, more consequential, errors as well, and you should take the opportunity to fix them.)

    However you may find that the observations with any given cusip and daydate value do not necessarily agree on all other variables. In that case there are a few possibilities. It may be that in each such set of observations, only one (or perhaps none) are correct and the others are data errors. Your task then is to find out which of the observations has the correct data, and remove the others. (And, again, apart from purging the data set, you should also review the data management that produced it because clearly something went wrong and there may be other more serious problems still lurking.) Another possibility is that all of these observations are correct but they refer to different things: perhaps to different subdivisions of the firm, or to different kinds of events. In this situation, your task is to find, or create, one or more variables that distinguish these observations, and then add those additional variables to the merge key: then you will be able to use -merge- sensibly.

    There is one other possibility: perhaps all of the observations are correct data, even though they are not all in agreement, and there is no other data to distinguish them. In that case, these data sets cannot be merged. Think about it. If the first data set has two observations for Cusip X daydate Y and the second data set also has two observations for that cusip and daydate, how would you, or Stata, know which observation from the data set to pair with each of the observations in the other data set? Now, sometimes, the answer is that you would pair both observations of the second data set with both observations of the first. If that is what you would do, you can do that, but the command for that is not -merge-, it's -joinby-.

    Comment

    Working...
    X