How to estimate number of days between the earliest and last dates with gaps and overlaps

HeeJu Jang-Paulsen

Join Date: Dec 2021

Posts: 4
#1

How to estimate number of days between the earliest and last dates with gaps and overlaps

10 Dec 2021, 14:32

Hello all,
This is my first time posting a question on Statalist, and I really need your help!

I have a data like below in which each 'id' repeats for every 'activity' (e.g. job search and readiness, employment) he or she participated.
The variables 'begin_dt' and 'end_dt' correspond to the dates when each activities began and ended.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(id begin_dt end_dt activity) str63 type 1 21171 21201 1 "Individual" 1 21564 21608 1 "Individual" 1 21082 21234 2 "Full Time" 1 21171 21201 2 "Part Time" 1 21200 21380 2 "Full Time" 1 21581 21761 2 "Part Time" 1 21586 21601 2 "Full Time" 1 21602 21632 2 "Full Time" 1 21634 21817 2 "Full Time" end format %td begin_dt format %td end_dt label values activity componentlabel label def componentlabel 1 "Job Search and readiness", modify label def componentlabel 2 "Employment", modify

I need to calculate how many days a person participated in each activity, but I do not know how to account for the overlaps and gaps between the earliest begin date and the last end date.

In this example, the earliest begin date for 'employment' was on 20-sep-2017, and the last date, '25-sep-2019'.
However, he or she had gap of about 7 months between 15-jul-2018 and 01-feb-2019. In addition, some begin and end dates overlap.

For this reason, I can't simply subtract the difference between 20-sep-2017 and 25-sep-2019.
Could you please advise me on what to do?

Thanks in advance!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

10 Dec 2021, 15:35

First, thank you for using -dataex- to show example data on your very first post! Much appreciated. And it would not have been possible for me to solve this without that.

If there is any way at all to do this in wide layout (I don't think so), I imagine it would be enormously complicated. So first we go long. Then we sort the data by id and activity, and within that chronologically. The problem of overlap is taken care of by counting up with beginnings and down with endings. A new spell of sustained activity begins with a total of 1 and eventually counts down to 0. So we mark each spell of events and calculate the length of the spell. Note that I take the length of the spell to be the ending date - beginning date + 1. (If I begin today and end tomorrow, that counts as 2 days, for example.) Then for each id and activity we add up the lengths of the spells to get the total number of days in that id spent in that activity. This allows for gaps and does not double-count overlapped days.

Code:

assert begin_dt < end_dt gen long obs_no = _n reshape long @_dt, i(obs_no) j(event) string by id activity (_dt), sort: gen depth = sum(event == "begin") - sum(event == "end") by id activity (_dt): gen spell = sum(depth == 1 & inlist(depth[_n-1], 0, .)) by id activity spell (_dt), sort: gen length = _dt[_N]-_dt[1] + 1 if _n == 1 by id activity: egen duration = total(length)

The code assumes that you just want this duration as a separate variable and you need to retain the original data as well. If you do not need to retain all of the original data and just want a data set with one observation per id#activity, then instead of the final -egen- command, you can

Code:

collapse (sum) duration = length, by(id activity)
1 like
Comment
HeeJu Jang-Paulsen

Join Date: Dec 2021

Posts: 4
#3

10 Dec 2021, 16:05

This is absolutely brilliant.
Thank you so much!!!!
Comment

Announcement

How to estimate number of days between the earliest and last dates with gaps and overlaps

Comment

Comment