I am unsure of the best time variable to use in a time series declaration (tsset) for panel data from personal air pollution exposure monitoring. There are 64 monitoring sessions that should each have lasted 48 hours, but 21 of them ended early when the pollution monitor ran out of batteries. I would still like the tsset for each panel to reflect the full 48-hour period, because I will examine ambient air pollution during the time when personal monitoring was missing. To that end, I have added faux endpoint observations for the 21 sessions that ended early. The data have been collapsed to 1-minute resolution; some monitoring sessions included data reported at 30-second frequency, while others reported a pollution level at 2-minute frequency. Thank you in advance for your thoughts on this!
I tried creating a unique value for minutes time ("minute") but it does not reflect the gap in time.
I also created the variable obs_tot by ID_Sess to capture the number of observations in each monitoring session.
I am adding a chunk of my data below; note the gap in time between the second-to-last and the last observation listed.
Code:
gen minute=_n
Code:
bysort ID_Sess (seq): gen s_seq = _n
bysort ID_Sess: generate int obs_tot = _N
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID_Sess year doy hr min minute s_seq) int obs_tot
1 2019 278 18  3  431 431 441
1 2019 278 18  5  432 432 441
1 2019 278 18  7  433 433 441
1 2019 278 18  9  434 434 441
1 2019 278 18 11  435 435 441
1 2019 278 18 13  436 436 441
1 2019 278 18 15  437 437 441
1 2019 278 18 17  438 438 441
1 2019 278 18 19  439 439 441
1 2019 278 18 21  440 440 441
1 2019 279 17  2 1440 441 441
end
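One possibility I am considering, sketched here on the last three rows of the example above, is to build the time variable from the clock fields (year, doy, hr, min) rather than from _n, so that the gap before the faux endpoint shows up as a real gap in time. The variable names dt and t_min are hypothetical, and this assumes that doy is the day of the year counted from 1 January and that each ID_Sess has at most one observation per minute.

```stata
clear
input float(ID_Sess year doy hr min)
1 2019 278 18 19
1 2019 278 18 21
1 2019 279 17  2
end

* Build a Stata datetime (%tc, milliseconds) from year, day of year, hour, minute.
* Datetimes are stored as double to avoid rounding.
gen double dt = mdyhms(1, 1, year, hr, min, 0) + (doy - 1) * msofhours(24)
format dt %tc

* Minutes elapsed since the first observation of each session
bysort ID_Sess (dt): gen long t_min = round((dt - dt[1]) / msofminutes(1))

* Declare the panel; missing minutes now appear as gaps in t_min
xtset ID_Sess t_min
```

If this is a reasonable approach, xtset ID_Sess dt, delta(1 minute) might be an alternative that keeps the clock time itself as the time variable, and tsfill could then add empty rows for the unmonitored minutes.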