Jorrit, I think that we are talking pass each other. My point from the get go is that there are no easy solutions if there are more than one changes of identifier per product. Here's the same problem as with #1 with an extra line added on top and with the following one modified. The observations made in #1 hold in the sense that AAAAA and D27AF are to be grouped since they both share the same value for id2 in observations 1 and 2. This is no different from the case where B78 spans id values of A12AB and A74ZF. The code D27AF matches two id2 codes: A00 and A12 the same way as the code 112 groups 3 id2 codes: BD18, BD32, and BD64. Given this setup, group_id will correctly group observations under a new identifier whereas your solution will leave AAAAA orphaned:
and the results:
When there are more than one change, you must group the first change first, go over the results and note that you must further group for the second change, and so on. Again, these types of problems are tricky and group_id will correctly group identifiers within groups defined by the matchby() option.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte cluster int year str5 id str4 id2 int price int quant 1 2007 "AAAAA" "A00" 11 666 1 2008 "D27AF" "A00" 12 700 1 2009 "D27AF" "A12" 14 724 1 2011 "D27AF" "A12" 8 824 1 2013 "D27AF" "A12" 6 900 1 2015 "D27AF" "A12" 4 1024 1 2007 "A12AB" "B78" 75 84 1 2009 "A12AB" "B78" 24 88 1 2011 "A12AB" "B78" 132 120 1 2013 "A12AB" "B78" 75 400 1 2015 "A74ZF" "B78" 12 280 9 2004 "112" "BD18" 89 45 9 2006 "112" "BD64" 89 78 9 2008 "112" "BD18" 89 800 9 2010 "112" "BD18" 67 824 9 2012 "112" "BD32" 50 825 9 2014 "112" "BD18" 34 890 9 2016 "112" "BD18" 23 1000 end * Jorrit's solution gen idnew = id bys id2 (year): replace idnew = id[_N] gen id2new = id2 bys id (year): replace id2new = id2[_N] gen id3 = idnew + "/" + id2new * redo using group_id egen newid = group(cluster id) group_id newid, matchby(cluster id2) sort cluster newid id3 year list, sepby(cluster newid)
Code:
. list, sepby(cluster newid) +------------------------------------------------------------------------------------+ | cluster year id id2 price quant idnew id2new id3 newid | |------------------------------------------------------------------------------------| 1. | 1 2007 A12AB B78 75 84 A74ZF B78 A74ZF/B78 1 | 2. | 1 2009 A12AB B78 24 88 A74ZF B78 A74ZF/B78 1 | 3. | 1 2011 A12AB B78 132 120 A74ZF B78 A74ZF/B78 1 | 4. | 1 2013 A12AB B78 75 400 A74ZF B78 A74ZF/B78 1 | 5. | 1 2015 A74ZF B78 12 280 A74ZF B78 A74ZF/B78 1 | |------------------------------------------------------------------------------------| 6. | 1 2007 AAAAA A00 11 666 D27AF A00 D27AF/A00 3 | 7. | 1 2008 D27AF A00 12 700 D27AF A12 D27AF/A12 3 | 8. | 1 2009 D27AF A12 14 724 D27AF A12 D27AF/A12 3 | 9. | 1 2011 D27AF A12 8 824 D27AF A12 D27AF/A12 3 | 10. | 1 2013 D27AF A12 6 900 D27AF A12 D27AF/A12 3 | 11. | 1 2015 D27AF A12 4 1024 D27AF A12 D27AF/A12 3 | |------------------------------------------------------------------------------------| 12. | 9 2004 112 BD18 89 45 112 BD18 112/BD18 5 | 13. | 9 2006 112 BD64 89 78 112 BD18 112/BD18 5 | 14. | 9 2008 112 BD18 89 800 112 BD18 112/BD18 5 | 15. | 9 2010 112 BD18 67 824 112 BD18 112/BD18 5 | 16. | 9 2012 112 BD32 50 825 112 BD18 112/BD18 5 | 17. | 9 2014 112 BD18 34 890 112 BD18 112/BD18 5 | 18. | 9 2016 112 BD18 23 1000 112 BD18 112/BD18 5 | +------------------------------------------------------------------------------------+
Comment