Dear all,
I am working on a merged dataset containing multiple lines per patient with different start/end dates.
When there are overlaps in dates, I would like to add rows in order to get chronological and sequential observations.
In the example below, the patient was seen from 3-oct-08 to 25-jan-12 (record_id=1) and 8 times between these dates (8 consecutives rows with source=0).
What I would like to achieve is as below
I started with the code below:
(1) generate the min & max dates to get the boundaries of dates
(2) generate the variable toexpand
(3) expand
Sorry for the non-Stataish syntax, I believe there is a smarter way to generate the variable toexpand.
Form there, I am stuck, and can't figure out how to continue.
Any help will be appreciated,
Thank you
I am working on a merged dataset containing multiple lines per patient with different start/end dates.
When there are overlaps in dates, I would like to add rows in order to get chronological and sequential observations.
In the example below, the patient was seen from 3-oct-08 to 25-jan-12 (record_id=1) and 8 times between these dates (8 consecutives rows with source=0).
Code:
id start end source record_id min max toexpand 804-32 03-Oct-08 25-Jan-12 1 1 03-Oct-08 25-Jan-12 9 804-32 16-Jan-09 30-Jan-09 0 2 03-Oct-08 25-Jan-12 1 804-32 27-Aug-09 01-Sep-09 0 3 03-Oct-08 25-Jan-12 1 804-32 03-Oct-09 06-Oct-09 0 4 03-Oct-08 25-Jan-12 1 804-32 19-Aug-10 23-Aug-10 0 5 03-Oct-08 25-Jan-12 1 804-32 12-Sep-10 15-Sep-10 0 6 03-Oct-08 25-Jan-12 1 804-32 04-May-11 06-May-11 0 7 03-Oct-08 25-Jan-12 1 804-32 21-Jul-11 24-Jul-11 0 8 03-Oct-08 25-Jan-12 1 804-32 19-Aug-11 21-Aug-11 0 9 03-Oct-08 25-Jan-12 1 804-32 25-Jan-12 03-Oct-13 1 10 03-Oct-08 03-Oct-13 .
What I would like to achieve is as below
Code:
id start end source 804-32 03-Oct-08 16-Jan-09 1 804-32 16-Jan-09 30-Jan-09 0 804-32 30-Jan-09 27-Aug-09 1 804-32 27-Aug-09 01-Sep-09 0 804-32 01-Sep-09 03-Oct-09 1 804-32 03-Oct-09 06-Oct-09 0 804-32 06-Oct-09 19-Aug-10 1 804-32 19-Aug-10 23-Aug-10 0 804-32 23-Aug-10 12-Sep-10 1 804-32 12-Sep-10 15-Sep-10 0 804-32 15-Sep-10 04-May-11 1 804-32 04-May-11 06-May-11 0 804-32 06-May-11 21-Jul-11 1 804-32 21-Jul-11 24-Jul-11 0 804-32 24-Jul-11 19-Aug-11 1 804-32 19-Aug-11 21-Aug-11 0 804-32 21-Aug-11 25-Jan-12 1 804-32 25-Jan-12 03-Oct-13 1
(1) generate the min & max dates to get the boundaries of dates
(2) generate the variable toexpand
(3) expand
Code:
by id : gen record_id=_n sort idrecord_id ***generate minimum & maximum dates of the previous observation from source=1 by id : gen min = start by id : gen max = end qui forval i = 1/10 { by id: replace min = min(min, start[_n-`i']) by id: replace max = max(max, end[_n-`i']) } format min max %d ***generate the variable toexpand in order to expand the line from source=1 bysort id: gen toexpand = 1 + (end>start[_n+1] & end>end[_n+1]) if id==id[_n+1] replace toexpand=2+(end>start[_n+2] & end>end[_n+2]) if id==id[_n+2] & end>start[_n+2] & end>end[_n+2] replace toexpand=3+(end>start[_n+3] & end>end[_n+3]) if id==id[_n+3] & end>start[_n+3] & end>end[_n+3] replace toexpand=4+(end>start[_n+4] & end>end[_n+4]) if id==id[_n+4] & end>start[_n+4] & end>end[_n+4] replace toexpand=5+(end>start[_n+5] & end>end[_n+5]) if id==id[_n+5] & end>start[_n+5] & end>end[_n+5] replace toexpand=6+(end>start[_n+6] & end>end[_n+6]) if id==id[_n+6] & end>start[_n+6] & end>end[_n+6] replace toexpand=7+(end>start[_n+7] & end>end[_n+7]) if id==id[_n+7] & end>start[_n+7] & end>end[_n+7] replace toexpand=8+(end>start[_n+8] & end>end[_n+8]) if id==id[_n+8] & end>start[_n+8] & end>end[_n+8] replace toexpand=9+(end>start[_n+9] & end>end[_n+9]) if id==id[_n+9] & end>start[_n+9] & end>end[_n+9] ***expand rows expand toexpand drop toexpand sort id start end duplicates tag id start end, gen (dup)
Form there, I am stuck, and can't figure out how to continue.
Any help will be appreciated,
Thank you
Comment