Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Jorrit, I think that we are talking pass each other. My point from the get go is that there are no easy solutions if there are more than one changes of identifier per product. Here's the same problem as with #1 with an extra line added on top and with the following one modified. The observations made in #1 hold in the sense that AAAAA and D27AF are to be grouped since they both share the same value for id2 in observations 1 and 2. This is no different from the case where B78 spans id values of A12AB and A74ZF. The code D27AF matches two id2 codes: A00 and A12 the same way as the code 112 groups 3 id2 codes: BD18, BD32, and BD64. Given this setup, group_id will correctly group observations under a new identifier whereas your solution will leave AAAAA orphaned:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte cluster int year str5 id str4 id2 int price int quant
    1 2007 "AAAAA" "A00"   11 666
    1 2008 "D27AF" "A00"   12 700
    1 2009 "D27AF" "A12"   14 724
    1 2011 "D27AF" "A12"    8 824
    1 2013 "D27AF" "A12"    6 900
    1 2015 "D27AF" "A12"    4 1024
    1 2007 "A12AB" "B78"   75 84
    1 2009 "A12AB" "B78"   24 88
    1 2011 "A12AB" "B78"  132 120
    1 2013 "A12AB" "B78"   75 400
    1 2015 "A74ZF" "B78"   12 280
    9 2004 "112"   "BD18"  89 45
    9 2006 "112"   "BD64"  89 78
    9 2008 "112"   "BD18"  89 800
    9 2010 "112"   "BD18"  67 824
    9 2012 "112"   "BD32"  50 825
    9 2014 "112"   "BD18"  34 890
    9 2016 "112"   "BD18"  23 1000
    end
    
    * Jorrit's solution
    gen idnew = id
    bys id2 (year): replace idnew = id[_N]
    gen id2new = id2
    bys id (year): replace id2new = id2[_N]
    gen id3 = idnew + "/" + id2new
    
    * redo using group_id
    egen newid = group(cluster id)
    group_id newid, matchby(cluster id2)
    
    sort cluster newid id3 year
    list, sepby(cluster newid)
    and the results:
    Code:
    . list, sepby(cluster newid)
    
         +------------------------------------------------------------------------------------+
         | cluster   year      id    id2   price   quant   idnew   id2new         id3   newid |
         |------------------------------------------------------------------------------------|
      1. |       1   2007   A12AB    B78      75      84   A74ZF      B78   A74ZF/B78       1 |
      2. |       1   2009   A12AB    B78      24      88   A74ZF      B78   A74ZF/B78       1 |
      3. |       1   2011   A12AB    B78     132     120   A74ZF      B78   A74ZF/B78       1 |
      4. |       1   2013   A12AB    B78      75     400   A74ZF      B78   A74ZF/B78       1 |
      5. |       1   2015   A74ZF    B78      12     280   A74ZF      B78   A74ZF/B78       1 |
         |------------------------------------------------------------------------------------|
      6. |       1   2007   AAAAA    A00      11     666   D27AF      A00   D27AF/A00       3 |
      7. |       1   2008   D27AF    A00      12     700   D27AF      A12   D27AF/A12       3 |
      8. |       1   2009   D27AF    A12      14     724   D27AF      A12   D27AF/A12       3 |
      9. |       1   2011   D27AF    A12       8     824   D27AF      A12   D27AF/A12       3 |
     10. |       1   2013   D27AF    A12       6     900   D27AF      A12   D27AF/A12       3 |
     11. |       1   2015   D27AF    A12       4    1024   D27AF      A12   D27AF/A12       3 |
         |------------------------------------------------------------------------------------|
     12. |       9   2004     112   BD18      89      45     112     BD18    112/BD18       5 |
     13. |       9   2006     112   BD64      89      78     112     BD18    112/BD18       5 |
     14. |       9   2008     112   BD18      89     800     112     BD18    112/BD18       5 |
     15. |       9   2010     112   BD18      67     824     112     BD18    112/BD18       5 |
     16. |       9   2012     112   BD32      50     825     112     BD18    112/BD18       5 |
     17. |       9   2014     112   BD18      34     890     112     BD18    112/BD18       5 |
     18. |       9   2016     112   BD18      23    1000     112     BD18    112/BD18       5 |
         +------------------------------------------------------------------------------------+
    When there are more than one change, you must group the first change first, go over the results and note that you must further group for the second change, and so on. Again, these types of problems are tricky and group_id will correctly group identifiers within groups defined by the matchby() option.

    Comment


    • #17
      Ah, I see. And this example does indeed give better results with group_id.

      Comment

      Working...
      X