Tom Guidry posted this on Sandbox:
Some small points:
1. The example data are helpful, but use dataex (SSC) to make it even better. Dates are a pain to re-create unless you do that.
2. Below I use numdate (SSC) to create the daily dates.
The big idea is to create a running tally of how often you hit a new date for each identifier. Score 1 for a date unlike that in the previous observation and then get a cumulative sum.
The calculation reproduces what you want, modulo a small slip in your example.
I am using Stata 12 on Windows 8.1. I would like to create the variable seq in the sample data below. For each date, the sequence should be the same, 1 for the first date, 2 for the second date, etc for each ptid. The sequence should start over with each new ptid. i have tried generating seq with egen and grouping ptid and date, but the sequence does not start over with ptid and by can't be used with egen in version 12. What would be the best way to create seq?
ptid date seq
1 01/01/2016 1
1 01/01/2016 1
1 01/02/2016 2
1 01/02/2016 2
2 02/01/2016 1
2 02/01/2016 1
2 02/02/2016 2
2 02/02/2016 2
2 03/01/2016 3
2 03/01/2016 3
2 03/01/2016 3
3 04/01/2016 1
3 04/02/2016 1
ptid date seq
1 01/01/2016 1
1 01/01/2016 1
1 01/02/2016 2
1 01/02/2016 2
2 02/01/2016 1
2 02/01/2016 1
2 02/02/2016 2
2 02/02/2016 2
2 03/01/2016 3
2 03/01/2016 3
2 03/01/2016 3
3 04/01/2016 1
3 04/02/2016 1
1. The example data are helpful, but use dataex (SSC) to make it even better. Dates are a pain to re-create unless you do that.
2. Below I use numdate (SSC) to create the daily dates.
Code:
clear input ptid str10(s_date) wanted 1 "01/01/2016" 1 1 "01/02/2016" 2 1 "01/02/2016" 2 2 "02/01/2016" 1 2 "02/01/2016" 1 2 "02/02/2016" 2 2 "02/02/2016" 2 2 "03/01/2016" 3 2 "03/01/2016" 3 2 "03/01/2016" 3 3 "04/01/2016" 1 3 "04/02/2016" 1 end numdate daily date = s_date, pattern(MDY) bysort ptid (date) : gen seq = sum(date != date[_n-1]) capture noisily assert wanted == seq list, sepby(ptid) +----------------------------------------------+ | ptid s_date wanted date seq | |----------------------------------------------| 1. | 1 01/01/2016 1 01jan2016 1 | 2. | 1 01/02/2016 2 02jan2016 2 | 3. | 1 01/02/2016 2 02jan2016 2 | |----------------------------------------------| 4. | 2 02/01/2016 1 01feb2016 1 | 5. | 2 02/01/2016 1 01feb2016 1 | 6. | 2 02/02/2016 2 02feb2016 2 | 7. | 2 02/02/2016 2 02feb2016 2 | 8. | 2 03/01/2016 3 01mar2016 3 | 9. | 2 03/01/2016 3 01mar2016 3 | 10. | 2 03/01/2016 3 01mar2016 3 | |----------------------------------------------| 11. | 3 04/01/2016 1 01apr2016 1 | 12. | 3 04/02/2016 1 02apr2016 2 | +----------------------------------------------+
The calculation reproduces what you want, modulo a small slip in your example.
Comment