Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • finding and tagging duplicate values of firms in panel data considering ascending order of year!!!!

    Hello everyone,
    I am working on panel data. I need to find and tag duplicate values in variable "date" by ISIN and also i want to tag them first or second duplicate year wise (e.g. I want "duph1"= 1 if the duplicate value of "date" is in 2006 of ISIN1 and "duph1" =2 if duplicate value of "date" is in year 2007 of firm1). I used following two codes to tag them.
    The first code is stata code which doesn't work as I want, it tags duplicates based on number of occurrence not order of occurrence: AS YOU CAN SEE IN THE DATAEX example in variable "duph".
    Code:
    duplicates tag ISIN date, gen(duph)
    the second code does what I want but it does not take into consideration the ascending order of year:
    Code:
    sort ISIN date
    quietly by ISIN date: gen duph1 = cond(_N==1,0,_n)
    e.g as you can see following observation in bold. there are two observations for ISIN(BMG7300G1096) which are repeated in variable date, 1 & 2. However, for the first 1 duph1=1 while for first 2 duph1=2. I cannot sort data ISIN, year and date wise(all three variables) in the above code as it doesn't find any duplicates that way.
    I do not want to drop duplicates.
    Kindly suggest me some solution
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 ISIN int year float date byte duph float duph1
    "BMG4209G2077" 2006 1 1 1
    "BMG4209G2077" 2007 2 1 1
    "BMG4209G2077" 2012 . . 3
    "BMG4209G2077" 2013 . . 2
    "BMG4209G2077" 2015 . . 1
    "BMG4209G2077" 2017 1 1 2
    "BMG4209G2077" 2018 2 1 2
    "BMG7300G1096" 2008 1 1 1
    "BMG7300G1096" 2009 2 1 2
    "BMG7300G1096" 2011 . . 2
    "BMG7300G1096" 2012 . . 1
    "BMG7300G1096" 2015 1 1 2
    "BMG7300G1096" 2016 2 1 1
    "CY0107130912" 2006 1 1 2
    "CY0107130912" 2007 . . 1
    "CY0107130912" 2008 . . 3
    "CY0107130912" 2009 . . 4
    "CY0107130912" 2010 . . 2
    "CY0107130912" 2012 1 1 1
    "CY0107130912" 2013 2 0 0
    "DE0005167902" 2007 1 1 1
    "DE0005167902" 2008 2 1 2
    "DE0005167902" 2011 . . 1
    "DE0005167902" 2012 . . 2
    "DE0005167902" 2013 . . 3
    "DE0005167902" 2014 . . 4
    "DE0005167902" 2015 1 1 2
    "DE0005167902" 2016 2 1 1
    "DE0005198907" 2006 1 1 2
    "DE0005198907" 2007 . . 1
    "DE0005198907" 2008 . . 3
    "DE0005198907" 2009 . . 2
    "DE0005198907" 2010 1 1 1
    "DE0005198907" 2012 2 0 0
    "DE0005632160" 2006 1 1 2
    "DE0005632160" 2007 2 1 1
    "DE0005632160" 2009 . . 0
    "DE0005632160" 2012 1 1 1
    "DE0005632160" 2013 2 1 2
    "DE0006853005" 2006 1 1 2
    "DE0006853005" 2008 2 0 0
    "DE0006853005" 2011 . . 1
    "DE0006853005" 2012 . . 2
    "DE0006853005" 2013 . . 4
    "DE0006853005" 2014 . . 3
    "DE0006853005" 2016 1 1 1
    "DE0006924400" 2007 1 1 2
    "DE0006924400" 2008 2 1 2
    "DE0006924400" 2009 . . 2
    "DE0006924400" 2010 . . 1
    "DE0006924400" 2012 1 1 1
    "DE0006924400" 2013 2 1 1
    "DE0007314007" 2006 1 1 1
    "DE0007314007" 2007 2 1 2
    "DE0007314007" 2008 . . 1
    "DE0007314007" 2011 . . 2
    "DE0007314007" 2012 1 1 2
    "DE0007314007" 2017 2 1 1
    "DE0007454209" 2010 1 1 2
    "DE0007454209" 2011 . . 0
    "DE0007454209" 2015 1 1 1
    "DE0007454209" 2016 2 0 0
    "DE0007551400" 2007 1 1 1
    "DE0007551400" 2008 2 1 2
    "DE0007551400" 2010 . . 3
    "DE0007551400" 2011 . . 2
    "DE0007551400" 2012 . . 1
    "DE0007551400" 2013 1 1 2
    "DE0007551400" 2014 2 1 1
    "DE0007830572" 2006 1 1 1
    "DE0007830572" 2007 2 1 2
    "DE0007830572" 2013 . . 2
    "DE0007830572" 2014 . . 1
    "DE0007830572" 2015 . . 3
    "DE0007830572" 2016 1 1 2
    "DE0007830572" 2018 2 1 1
    "DE0008103102" 2006 1 1 2
    "DE0008103102" 2007 2 1 1
    "DE0008103102" 2012 . . 1
    "DE0008103102" 2014 . . 2
    "DE0008103102" 2015 1 1 1
    "DE0008103102" 2016 2 1 2
    "DE000A0JKHC9" 2006 1 1 1
    "DE000A0JKHC9" 2008 . . 2
    "DE000A0JKHC9" 2009 . . 3
    "DE000A0JKHC9" 2010 . . 1
    "DE000A0JKHC9" 2011 1 1 2
    "DE000A0JKHC9" 2012 2 0 0
    "DE000A0JM2F5" 2006 1 1 2
    "DE000A0JM2F5" 2008 2 0 0
    "DE000A0JM2F5" 2009 . . 2
    "DE000A0JM2F5" 2010 . . 1
    "DE000A0JM2F5" 2011 . . 3
    "DE000A0JM2F5" 2012 . . 4
    "DE000A0JM2F5" 2013 1 1 1
    "DE000A0JQ5U3" 2006 1 1 2
    "DE000A0JQ5U3" 2008 2 0 0
    "DE000A0JQ5U3" 2016 . . 1
    "DE000A0JQ5U3" 2017 . . 2
    "DE000A0JQ5U3" 2018 1 1 1
    end

  • #2
    Just FYI, you will have more luck of getting an answer if you omit the three exclamation marks from your title. That said, I think this could be solved by some combination of sort and egen <> = tag(<>).

    Comment


    • #3
      Thanks Jesse Wursten I really did not pay attention while posting, I was in hurry actually. and thanks for reply. unfortunately now I cannot edit the title because it cannot be edited after an hour of posting. So, my apologies to all.

      Comment

      Working...
      X