Hi all,
I am not new to Stata, but have never posted on Statalist before, so please forgive if I neglect to include important info. I'm using Stata 15.1 to analyze the American Time Use Data, in a rectangularized form that includes all activities conducted by respondents during the course of a 24-hour period (starting 4 a.m., and finishing 4 a.m. the following day). Each row represents one person's activity, with designated start time, stop time and duration. Each person can have multiple activities (rows) -- there are unique identifiers for person, activity and combined person-activity.
I would like to generate distinct variables for every half-hour increment during the day that represent the duration of time spent in the activity during that interval. Here's an example from the data:
. dataex caseid actline caseactid duration start stop, count(30)
----------------------- copy starting from the next line -----------------------
------------------ copy up to and including the previous line ------------------
Listed 30 out of 644891 observations
I initially managed to write a loop that generated binary variables for the presence of the activity, as follows:
local hours1 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 // first 20 hours of the 24-hour period
foreach hr in `hours1' {
gen time1`hr'00=1 if start<=tc(01Jan1960 `hr':00) & stop>tc(01Jan1960 `hr':00)
replace time1`hr'00=0 if time1`hr'00==.
label var time1`hr'00 "Activity (y/n) at `hr':00"
gen time1`hr'30=1 if start<=tc(01Jan1960 `hr':30) & stop>tc(01Jan1960 `hr':30)
replace time1`hr'30=0 if time1`hr'30==.
label var time1`hr'30 "Activity (y/n) at `hr':30"
}
local hours2 00 01 02 03 // these are the last four hours of the 24-hour period
foreach hr in `hours2' {
gen time2`hr'00=1 if start<=tc(02Jan1960 `hr':00) & stop>tc(02Jan1960 `hr':00)
replace time2`hr'00=0 if time2`hr'00==.
label var time2`hr'00 "Activity (y/n) at `hr':00"
gen time2`hr'30=1 if start<=tc(02Jan1960 `hr':30) & stop>tc(02Jan1960 `hr':30)
replace time2`hr'30=0 if time2`hr'30==.
label var time2`hr'30 "Activity (y/n) at `hr':30"
}
This code works, yielding 48 variables (e.g., time10400, time10430), with binary indicator for the activity. However, I did not manage to figure out how to capture duration in distinct interval variables -- i.e., the number of minutes within the interval that were taken up by the activity (e.g., if eating starts at 12:10 and ends at 12:20, then I would want a new variable -- act11200, for example -- to have a value of 10.
I then thought I might be able to use stsplit to yield multiple rows per interval (defining failure as "event=1" for every activity) and tried the following code, with the variable "intval" to capture the duration during the interval, but it yielded an inconsistent number of intervals per activity:
stset stop, id(caseactid) origin(time start) failure(event==1)
stsplit intval, at(0(1800000)86400000)
----------------------- copy starting from the next line -----------------------
------------------ copy up to and including the previous line ------------------
Listed 30 out of 1806113 observations
Finally, I am able to reshape to long using the binary variables above, and I suspect I could try a brute-force calculation in that form based on start/stop time (yielding only one additional variable for each interval, of course). That's what I will try next, but thought I'd send my query out into stata-world to see if there's something I'm missing here.
Thoughts on the best way to do this?
Leanne
I am not new to Stata, but have never posted on Statalist before, so please forgive if I neglect to include important info. I'm using Stata 15.1 to analyze the American Time Use Data, in a rectangularized form that includes all activities conducted by respondents during the course of a 24-hour period (starting 4 a.m., and finishing 4 a.m. the following day). Each row represents one person's activity, with designated start time, stop time and duration. Each person can have multiple activities (rows) -- there are unique identifiers for person, activity and combined person-activity.
I would like to generate distinct variables for every half-hour increment during the day that represent the duration of time spent in the activity during that interval. Here's an example from the data:
. dataex caseid actline caseactid duration start stop, count(30)
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double caseid byte actline str16 caseactid int duration float(start stop) 20140101140007 1 "201401011400071" 540 1.44e+07 4.68e+07 20140101140007 2 "201401011400072" 30 4.68e+07 4.86e+07 20140101140007 3 "201401011400073" 60 4.86e+07 5.22e+07 20140101140007 4 "201401011400074" 10 5.22e+07 5.28e+07 20140101140007 5 "201401011400075" 30 5.28e+07 5.46e+07 20140101140007 6 "201401011400076" 20 5.46e+07 5.58e+07 20140101140007 7 "201401011400077" 90 5.58e+07 6.12e+07 20140101140007 8 "201401011400078" 25 6.12e+07 6.27e+07 20140101140007 9 "201401011400079" 30 6.27e+07 6.45e+07 20140101140007 10 "2014010114000710" 605 6.45e+07 1.008e+08 20140101140011 1 "201401011400111" 150 1.44e+07 2.34e+07 20140101140011 2 "201401011400112" 30 2.34e+07 2.52e+07 20140101140011 3 "201401011400113" 30 2.52e+07 2.70e+07 20140101140011 4 "201401011400114" 360 2.70e+07 4.86e+07 20140101140011 5 "201401011400115" 15 4.86e+07 4.95e+07 20140101140011 6 "201401011400116" 105 4.95e+07 5.58e+07 20140101140011 7 "201401011400117" 20 5.58e+07 5.70e+07 20140101140011 8 "201401011400118" 40 5.70e+07 5.94e+07 20140101140011 9 "201401011400119" 120 5.94e+07 6.66e+07 20140101140011 10 "2014010114001110" 20 6.66e+07 6.78e+07 20140101140011 11 "2014010114001111" 60 6.78e+07 7.14e+07 20140101140011 12 "2014010114001112" 30 7.14e+07 7.32e+07 20140101140011 13 "2014010114001113" 40 7.32e+07 7.56e+07 20140101140011 14 "2014010114001114" 30 7.56e+07 7.74e+07 20140101140011 15 "2014010114001115" 60 7.74e+07 8.10e+07 20140101140011 16 "2014010114001116" 20 8.10e+07 8.22e+07 20140101140011 17 "2014010114001117" 40 8.22e+07 8.46e+07 20140101140011 18 "2014010114001118" 270 8.46e+07 1.008e+08 20140101140028 1 "201401011400281" 45 1.44e+07 1.71e+07 20140101140028 2 "201401011400282" 315 1.71e+07 3.60e+07 end format %tcHH:MM start format %tcHH:MM stop
Listed 30 out of 644891 observations
I initially managed to write a loop that generated binary variables for the presence of the activity, as follows:
local hours1 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 // first 20 hours of the 24-hour period
foreach hr in `hours1' {
gen time1`hr'00=1 if start<=tc(01Jan1960 `hr':00) & stop>tc(01Jan1960 `hr':00)
replace time1`hr'00=0 if time1`hr'00==.
label var time1`hr'00 "Activity (y/n) at `hr':00"
gen time1`hr'30=1 if start<=tc(01Jan1960 `hr':30) & stop>tc(01Jan1960 `hr':30)
replace time1`hr'30=0 if time1`hr'30==.
label var time1`hr'30 "Activity (y/n) at `hr':30"
}
local hours2 00 01 02 03 // these are the last four hours of the 24-hour period
foreach hr in `hours2' {
gen time2`hr'00=1 if start<=tc(02Jan1960 `hr':00) & stop>tc(02Jan1960 `hr':00)
replace time2`hr'00=0 if time2`hr'00==.
label var time2`hr'00 "Activity (y/n) at `hr':00"
gen time2`hr'30=1 if start<=tc(02Jan1960 `hr':30) & stop>tc(02Jan1960 `hr':30)
replace time2`hr'30=0 if time2`hr'30==.
label var time2`hr'30 "Activity (y/n) at `hr':30"
}
This code works, yielding 48 variables (e.g., time10400, time10430), with binary indicator for the activity. However, I did not manage to figure out how to capture duration in distinct interval variables -- i.e., the number of minutes within the interval that were taken up by the activity (e.g., if eating starts at 12:10 and ends at 12:20, then I would want a new variable -- act11200, for example -- to have a value of 10.
I then thought I might be able to use stsplit to yield multiple rows per interval (defining failure as "event=1" for every activity) and tried the following code, with the variable "intval" to capture the duration during the interval, but it yielded an inconsistent number of intervals per activity:
stset stop, id(caseactid) origin(time start) failure(event==1)
stsplit intval, at(0(1800000)86400000)
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double caseid byte actline str16 caseactid int duration float(start stop event) byte(_st _d) long(_origin _t _t0) float intval 20140101140007 1 "201401011400071" 540 1.44e+07 1.62e+07 . 1 0 14400000 1800000 0 0 20140101140007 1 "201401011400071" 540 1.44e+07 1.80e+07 . 1 0 14400000 3600000 1800000 1800000 20140101140007 1 "201401011400071" 540 1.44e+07 1.98e+07 . 1 0 14400000 5400000 3600000 3600000 20140101140007 1 "201401011400071" 540 1.44e+07 2.16e+07 . 1 0 14400000 7200000 5400000 5400000 20140101140007 1 "201401011400071" 540 1.44e+07 2.34e+07 . 1 0 14400000 9000000 7200000 7200000 20140101140007 1 "201401011400071" 540 1.44e+07 2.52e+07 . 1 0 14400000 10800000 9000000 9000000 20140101140007 1 "201401011400071" 540 1.44e+07 2.70e+07 . 1 0 14400000 12600000 10800000 1.08e+07 20140101140007 1 "201401011400071" 540 1.44e+07 2.88e+07 . 1 0 14400000 14400000 12600000 1.26e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.06e+07 . 1 0 14400000 16200000 14400000 1.44e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.24e+07 . 1 0 14400000 18000000 16200000 1.62e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.42e+07 . 1 0 14400000 19800000 18000000 1.80e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.60e+07 . 1 0 14400000 21600000 19800000 1.98e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.78e+07 . 1 0 14400000 23400000 21600000 2.16e+07 20140101140007 1 "201401011400071" 540 1.44e+07 3.96e+07 . 1 0 14400000 25200000 23400000 2.34e+07 20140101140007 1 "201401011400071" 540 1.44e+07 4.14e+07 . 1 0 14400000 27000000 25200000 2.52e+07 20140101140007 1 "201401011400071" 540 1.44e+07 4.32e+07 . 1 0 14400000 28800000 27000000 2.70e+07 20140101140007 1 "201401011400071" 540 1.44e+07 4.50e+07 . 1 0 14400000 30600000 28800000 2.88e+07 20140101140007 1 "201401011400071" 540 1.44e+07 4.68e+07 1 1 1 14400000 32400000 30600000 3.06e+07 20140101140007 2 "201401011400072" 30 4.68e+07 4.86e+07 1 1 1 46800000 1800000 0 0 20140101140007 3 "201401011400073" 60 4.86e+07 5.04e+07 . 1 0 48600000 1800000 0 0 20140101140007 3 "201401011400073" 60 4.86e+07 5.22e+07 1 1 1 48600000 3600000 1800000 1800000 20140101140007 4 "201401011400074" 10 5.22e+07 5.28e+07 1 1 1 52200000 600000 0 0 20140101140007 5 "201401011400075" 30 5.28e+07 5.46e+07 1 1 1 52800000 1800000 0 0 20140101140007 6 "201401011400076" 20 5.46e+07 5.58e+07 1 1 1 54600000 1200000 0 0 20140101140007 7 "201401011400077" 90 5.58e+07 5.76e+07 . 1 0 55800000 1800000 0 0 20140101140007 7 "201401011400077" 90 5.58e+07 5.94e+07 . 1 0 55800000 3600000 1800000 1800000 20140101140007 7 "201401011400077" 90 5.58e+07 6.12e+07 1 1 1 55800000 5400000 3600000 3600000 20140101140007 8 "201401011400078" 25 6.12e+07 6.27e+07 1 1 1 61200000 1500000 0 0 20140101140007 9 "201401011400079" 30 6.27e+07 6.45e+07 1 1 1 62700000 1800000 0 0 20140101140007 10 "2014010114000710" 605 6.45e+07 6.63e+07 . 1 0 64500000 1800000 0 0 end format %tcHH:MM start format %tcHH:MM stop
Listed 30 out of 1806113 observations
Finally, I am able to reshape to long using the binary variables above, and I suspect I could try a brute-force calculation in that form based on start/stop time (yielding only one additional variable for each interval, of course). That's what I will try next, but thought I'd send my query out into stata-world to see if there's something I'm missing here.
Thoughts on the best way to do this?
Leanne
Comment