I have a dataset with the start and end dates of different events across regions. Within a region, a major event consists of different events happening in sequence. I treat two events as happening in sequence if one event starts on the day another event ends (i.e., same day) OR on the day after it ends (i.e., next day). I am trying to write code that identifies major events (within regions).
In another post, Andrew Musau helped me find a solution to this exercise for data at a single level (no region grouping) and where subsequent events are only those whose start date matches the end date exactly (I added two layers of complexity here: the region grouping and the next-day match): https://www.statalist.org/forums/for...-any-other-row
I have created a toy example of a dataset with three variables: region, start and end (the last two are named var1 and var2 in the code).
I would like to give each row, within a region, an id that is kept constant if the value of `start` coincides with `end` OR `end+1` in *any other row*.
Code:
clear
input str1 region float(var1 var2)
"A" 1 10
"A" 5 9
"A" 6 11
"A" 10 16
"A" 16 17
"A" 16 18
"B" 1 30
"B" 2 29
"B" 30 32
"B" 31 38
"B" 33 33
"B" 35 38
"C" 7 9
"C" 2 7
"C" 7 11
"C" 10 11
"C" 6 8
"C" 6 9
"C" 50 50
"C" 50 57
end
list, sepby(region)
The output I would be looking for is the following:
Code:
clear
input str1 region float(var1 var2 NEW)
"A" 1 10 1
"A" 10 16 1
"A" 16 17 1
"A" 16 18 1
"A" 5 9 2
"A" 6 11 3
"B" 1 30 1
"B" 30 32 1
"B" 31 38 1
"B" 33 33 1
"B" 2 29 2
"B" 35 38 3
"C" 2 7 1
"C" 7 9 1
"C" 7 11 1
"C" 10 11 1
"C" 6 8 2
"C" 6 9 3
"C" 50 50 4
"C" 50 57 4
end
list, sepby(region)
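To make the rule concrete, here is a rough sketch of one way it might be coded. It is not a tested or efficient solution. It rests on assumptions read off the expected output rather than stated in the rule: rows are processed in order of start date (then end date) within region; the first row without an id opens a new major event; that event then absorbs every still-unassigned row whose start equals the end, or the end + 1, of a row already in it; and a row claimed by an earlier major event is never reassigned. It also assumes that end is never earlier than start and that the expected-output data above (including NEW) are in memory. The variable name `major` is made up for illustration.

```stata
* Sketch only: brute-force loops over observations, fine for a toy dataset
sort region var1 var2
gen long major = .
local N = _N

forvalues i = 1/`N' {
    * The first row still without an id opens a new major event in its region
    if missing(major[`i']) {
        quietly summarize major if region == region[`i'], meanonly
        local id = cond(r(N) == 0, 1, r(max) + 1)
        quietly replace major = `id' in `i'

        * Walk forward: each row already in this major event claims the
        * unassigned rows that start on its end date or on the next day
        forvalues j = `i'/`N' {
            if major[`j'] == `id' & region[`j'] == region[`i'] {
                quietly replace major = `id' if missing(major) ///
                    & region == region[`j'] ///
                    & inlist(var1, var2[`j'], var2[`j'] + 1)
            }
        }
    }
}

* Check against the hand-made ids in the toy data
assert major == NEW
sort region major var1 var2
list, sepby(region)
```

A single forward pass per major event is enough here because, after sorting, any event that follows another starts no earlier than it, so it appears later in the data. The nested loops scale poorly, so for a large dataset a different approach would be needed.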