Processing hours variables in daily survey data

Garrett Baker

Join Date: Jun 2024

Posts: 26
#1

Processing hours variables in daily survey data

13 Jun 2024, 14:11

Hi Statalist, I have survey data organized at the ‘person-day’ level over the course of about 1 month, with information on their start and stop times at their job.

My goal is to produce a dataset at the ‘person-day of week’ level, with a column capturing what hours they worked at least once during that month on a given day of the week. To take a simple example, if someone worked from 09:00-15:00 on the first Friday we observe them, and then the next Friday they worked from 13:00-17:00, then in the new ‘person-day of week’ dataset, that person would have a value of 10 hours (09:00-17:00 inclusive) as the set of hours they were ever observed working on Fridays.

For more context, daily surveys were sent to respondents at 19:00 and respondents had 24 hours to answer that survey. These surveys asked people to report the start and stop times of the shift they worked that day. We have the date each survey was sent out (survey_date) and date-time stamps for when they actually responded (survey_complete). For example, they might respond at 19:05 on day t that they worked 10:00-16:00, in which case we know that those hours were worked on the same day that the survey responses were given (day t). Most of the time, this is the way it works and is mostly straightforward.

Here's a more detailed example of the ‘unique hours’ calculation (based on the data below). On Sundays (dayofwk ==0), person_id 600 worked:
On 8/11, from 06:00-10:00

On 8/18, from 06:00-14:00

On 8/25, from 17:00-20:00

On 9/1, from 10:00-16:00

This person would have 06:00-16:00 and 17:00-20:00 as times ever worked on Sundays, for a total of 13 unique hours worked. My first question is, how can I create create a new dataset that contains unique hours worked for each day of the week?

However, one significant challenge arises when one’s start and stop times aren’t contained within the same day, which can happen in two ways:

(1) In the example data below, person_id 600 indicates on 8/12 (a Monday) that they worked from 22:00-04:00, in which case 22:00-23:59 was actually worked on day t-1 (on 8/11, Sunday) and 00:00-04:00 was worked on day t (8/12). This would mean we’d add 22:00-23:59 as time worked on Sunday, making the unique Sunday times (from the above example) now 06:00-16:00, 17:00-20:00, and 22:00-23:59. It would also mean Monday’s times ever worked start with 00:00-04:00.

(2) In the example data below, person_id 600 indicates on 9/3 (a Tuesday), which is day t+1 relative to the survey_date of 9/2, that they worked 21:00-05:00. In this case, we’d consider their shift to have started on Monday 9/2, and add 21:00-23:59 to Mondays, bringing Monday’s total hours ever worked to 00:00-04:00 and 22:00-23:59. 00:00-05:00 would be applied to Tuesday’s times ever worked.

My second question is, how can I recode these times into Stata date-time format in a way that makes sure the start/stop times that occur on days other than the survey date get coded correctly, and then incorporate them into the new dataset?

So to repeat, in the end I’d ideally like a dataset that is organized at the (person)X(day-of-week) level, that tells me what unique range(s) of times on that day the person ever worked. Thank you very much for any help!

My example data:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte dayofwk int(person_id survey_date) double(survey_complete job1_actualstartnum_time) byte job1_actualstopnum_time 0 600 21772 1881169199103.9995 6 10 0 600 21779 1881831243776 6 14 0 600 21786 1882378993663.9998 17 20 0 600 21793 1882984415231.9998 10 16 0 655 21793 1882987298816.0002 6 14 0 655 21800 1883595603968 6 14 0 655 21807 1884193816576 6 14 0 655 21814 1884805267455.9998 . . 0 655 21821 1885410426880 . . 1 600 21773 1881256099840.0002 22 4 1 600 21787 1882466942975.9995 . . 1 600 21794 1883106530000.0002 21 5 1 655 21794 1883070267391.9995 . . 1 655 21801 1883675295743.9998 . . 1 655 21808 1884285829120.0002 6 14 1 655 21815 1884884828160 6 14 1 655 21822 1885489987583.9998 6 14 2 600 21774 1881349292000.0002 11 19 2 600 21781 1881947242495.9998 6 14 2 600 21788 1882560397311.9998 5.5 15 2 600 21795 1883165818879.9998 10 19 2 655 21795 1883156643840 . . 2 655 21802 1883761672192.0002 . . 2 655 21809 1884399730688 6 14 2 655 21816 1884971597824 . . 3 600 21775 1881429508096.0005 7 16 3 600 21782 1882083688448.0002 7 15 3 600 21789 1882647166975.9998 . . 3 600 21796 1883286142976.0002 6 14 3 655 21796 1883245903872.0002 6 14 3 655 21803 1883848835072.0002 6 14 3 655 21810 1884452814848.0005 6 14 3 655 21817 1885060464640 6 14 4 600 21769 1880931725000 6 14 4 600 21776 1881566216192 7 15 4 600 21783 1882173734912 . . 4 600 21790 1882724892672 9 17 4 600 21797 1883337129984 6 14 4 655 21797 1883329527808.0002 6 14 4 655 21804 1883934425087.9998 9 17 4 655 21811 1884546662400.0002 9 17 4 655 21818 1885144743936.0002 9 17 5 600 21770 1880996577279.9998 . . 5 600 21777 1881614450688.0002 11 19 5 600 21784 1882257752064.0002 6 14 5 600 21791 1882811531264.0002 9 19 5 600 21798 1883416821760 9 19 5 655 21798 1883420491776 6 14 5 655 21805 1884028534784.0002 9 17 5 655 21812 1884650515000 . . 5 655 21819 1885230596096.0002 6 14 6 600 21771 1881086623744.0002 . . 6 600 21778 1881687851008.0002 23 3 6 600 21785 1882292617216 . . 6 655 21799 1883506606080.0002 6 14 6 655 21806 1884107177984 9 17 6 655 21813 1884712075264 . . 6 655 21820 1885317758976.0002 22 3 end format %tdnn/dd/CCYY survey_date format %tcnn/dd/ccYY_hh:MM survey_complete
Tags: None

Announcement

Processing hours variables in daily survey data