Hello everyone,
I am working with daily survey data (measured over a one month period) that contains information on the time someone started and stopped working their shift at their job. My goal is to clean/process these data such that they can be eventually collapsed to the person-day of week level, with each row corresponding to the number of unique (ie, non-overlapping) hours someone worked across a given day of the week.
The simplest example of how to think about unique hours would be: if someone worked from 9:00 til 15:00 on a Monday, and then the following Monday they worked from 13:00 til 17:00, this would represent 8 unique hours worked across the two Mondays, because we would not want to double-count the 2 overlapping hours (13:00-15:00) across the days.
Below is some initial code I've written for this. I am running into trouble once I start trying to wrangle >2 days (within a given day of the week). The number of potential min/max comparisons gets unwieldy quickly.
Perhaps there is a more concise strategy compared to what I've started. Any ideas or guidance would be much appreciated!
My code up to this point is here:
Example data here:
I am working with daily survey data (measured over a one month period) that contains information on the time someone started and stopped working their shift at their job. My goal is to clean/process these data such that they can be eventually collapsed to the person-day of week level, with each row corresponding to the number of unique (ie, non-overlapping) hours someone worked across a given day of the week.
The simplest example of how to think about unique hours would be: if someone worked from 9:00 til 15:00 on a Monday, and then the following Monday they worked from 13:00 til 17:00, this would represent 8 unique hours worked across the two Mondays, because we would not want to double-count the 2 overlapping hours (13:00-15:00) across the days.
Below is some initial code I've written for this. I am running into trouble once I start trying to wrangle >2 days (within a given day of the week). The number of potential min/max comparisons gets unwieldy quickly.
Perhaps there is a more concise strategy compared to what I've started. Any ideas or guidance would be much appreciated!
My code up to this point is here:
Code:
* Counter for day of week bysort person_id dayofwk (survey_date): gen day_count = _n sort person_id dayofwk survey_date ///// First day (within a given day of week) * Shift length gen shift_length = job1_actualstopnum_time - job1_actualstartnum_time * This is var that will eventually be collapsed and summed gen unique_shift_hours = shift_length if day_count == 1 ///// Second day (within a given day of week) * Apply stop time for day 1 across the other dates within day of week bysort person_id dayofwk: gen stop_time_day1 = job1_actualstopnum_time if day_count == 1 bysort person_id dayofwk: replace stop_time_day1 = stop_time_day1[_n-1] if missing(stop_time_day1) * When day 2 stop time is larger than day 1 stop time, add the resulting gap to unique hours var replace unique_shift_hours = job1_actualstopnum_time - stop_time if day_count == 2 & stop_time < job1_actualstopnum_time * When day 2 start time is smaller than day 1 start time, add the resulting gap to unique hours var bysort person_id dayofwk: gen start_time_day1 = job1_actualstartnum_time if day_count == 1 bysort person_id dayofwk: replace start_time_day1 = start_time_day1[_n-1] if missing(start_time_day1) replace unique_shift_hours = start_time_day1 - job1_actualstartnum_time if day_count == 2 & start_time_day1 > job1_actualstartnum_time & unique_shift_hours == . * Create second unique shift hours var for when day 2 start time is smaller than day 1 start time AND day 2 stop time is larger than day 1 stop time gen unique_shift_hours2 = start_time_day1 - job1_actualstartnum_time if day_count == 2 & (start_time_day1 > job1_actualstartnum_time) & (stop_time_day1 < job1_actualstopnum_time) replace unique_shift_hours = unique_shift_hours + unique_shift_hours2 if unique_shift_hours2 != . & day_count == 2 ///// Third day????
Example data here:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte dayofwk int(person_id survey_date) double(job1_actualstartnum_time job1_actualstopnum_time) 0 600 21772 6 10 0 600 21779 6 14 0 600 21786 17 20 0 600 21793 10 16 1 600 21773 14 17 1 600 21787 11 19 1 600 21794 9 15 2 600 21774 11 19 2 600 21781 6 14 2 600 21788 5.5 15 2 600 21795 10 19 0 655 21793 7 10 0 655 21800 12 18 0 655 21807 9 16 0 655 21814 8 15 0 655 21821 13 19 1 655 21794 6 10 1 655 21801 9 14 1 655 21808 13.5 17 1 655 21815 12.5 17 1 655 21822 . . 2 655 21795 7 10 2 655 21802 12.5 18.5 2 655 21809 6 14 2 655 21816 12 18 end format %tdnn/dd/CCYY survey_date
Comment