
  • Unevenly spaced time series: Help with Descriptive Statistics

    I am analysing an unevenly spaced time series, and am hoping to generate descriptive statistics on the cumulative time spent below a specific threshold value during that time series.

    I imagine any function would need to model the "gaps" between measurement times using linear interpolation. Is there an easy way of doing this in Stata?

    My real-world application is a series of randomly measured systolic blood pressures on the same individual over time.
    My question is: During our observation period, over what period of collective/cumulative time did the individual have a blood pressure less than 100?

    This individual may have had several dips in blood pressure below 100, before returning above 100 during a series of measures.
    Being above 100 does not "make up" for times below 100 (i.e. it's not an averaging problem).

    Disclosure: I'm a clinician by background, not a statistician. No doubt there will be a command that does this. I've searched the archives under a number of terms to no avail. Thanks for your assistance.
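    For later readers, the "gaps" idea can be made concrete with linear interpolation: find where the straight line joining two consecutive readings crosses the threshold, and add up the sub-intervals spent below it. A minimal sketch of that arithmetic, written in Python rather than Stata purely for illustration (the function name and example numbers are made up):

    ```python
    def time_below(times, values, threshold):
        """Cumulative time spent below `threshold`, assuming the series
        moves linearly between consecutive measurements."""
        total = 0.0
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:]):
            if y0 < threshold and y1 < threshold:
                # whole segment below the threshold
                total += t1 - t0
            elif y0 < threshold or y1 < threshold:
                # segment crosses the threshold once: locate the crossing time
                tc = t0 + (threshold - y0) / (y1 - y0) * (t1 - t0)
                total += (tc - t0) if y0 < threshold else (t1 - tc)
        return total

    # e.g. readings at 0, 10, 30 and 45 minutes
    print(time_below([0, 10, 30, 45], [120, 95, 90, 110], 100))  # → 29.5
    ```

    The same logic carries over to any threshold and any irregular spacing; only the segments that cross or sit below the threshold contribute.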






  • #2
    A data example, please: https://www.statalist.org/forums/help#stata

    More than one individual?

    Irregularly spaced daily dates? Time of day relevant or not? Or some other time framework?



    • #3
      Thanks for getting back to me so quickly.
      My data is imported from REDCap.

      Here is my dataex output:

      [CODE]
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(record_id redcap_repeat_instance) str16 vitals_time int sbp

      2 1 "2022-01-12 05:18" 161
      2 2 "2022-01-12 05:20" 158
      2 3 "2022-01-12 05:23" 109
      2 4 "2022-01-12 05:25" 133
      2 5 "2022-01-12 05:28" 122
      2 6 "2022-01-12 05:57" 135
      2 7 "2022-01-12 06:00" 129
      2 8 "2022-01-12 06:05" 127
      2 9 "2022-01-12 06:11" 131
      2 10 "2022-01-12 06:26" 116
      2 11 "2022-01-12 07:00" 113
      2 12 "2022-01-12 07:13" 113
      3 1 "2022-01-10 21:51" 149
      3 2 "2022-01-10 22:06" 113
      3 3 "2022-01-10 22:11" 136
      3 4 "2022-01-10 23:32" 99
      3 5 "2022-01-10 23:56" 95
      3 6 "2022-01-11 00:10" 102
      3 7 "2022-01-11 00:47" 97
      end
      [/CODE]
      There are 3 patients here.
      Patient 1 has no measurements taken.
      Patient 2 has 12 measurements taken.
      Patient 3 has 7 measurements taken.
      The final column is the systolic blood pressure (sbp).

      This is how my data is currently formatted.
      I have a total of 30 patients.
      If needed I could just import data for an individual patient at a time and work on that.

      For this dataset...
      Patient 2: never falls below 100.
      Patient 3: Falls twice below 100.
      The first dip occurs "somewhere" between repeat measurements 3 and 4, and corrects somewhere between repeat measurements 5 and 6.
      The second dip occurs "somewhere" between repeat measurements 6 and 7, and persists until the end of observation at repeat measurement 7.
      I think the solution might have something to do with ipolate? I just can't figure it out.

      My datetimes have been formatted after importing using this:

      [CODE]
      tostring vitals_time, replace
      gen double _temp_ = Clock(vitals_time,"YMDhm")
      drop vitals_time
      rename _temp_ vitals_time
      format vitals_time %tCMonth_dd,_CCYY_HH:MM
      [/CODE]

      Thanks.
      Last edited by Martin Dutch; 15 Mar 2023, 05:29.



      • #4
        I wouldn't buy into what you want to do. You rely on a monotonicity assumption which, in my opinion, does not apply to movements in blood pressure. The following graph from your data illustrates my point:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(id measurement) double time int bp
        2  1 1957583880000 161
        2  2  1.957584e+12 158
        2  3 1957584180000 109
        2  4 1.9575843e+12 133
        2  5 1957584480000 122
        2  6 1957586220000 135
        2  7 1.9575864e+12 129
        2  8 1.9575867e+12 127
        2  9 1957587060000 131
        2 10 1957587960000 116
        2 11   1.95759e+12 113
        2 12 1957590780000 113
        3  1 1957470660000 149
        3  2 1957471560000 113
        3  3 1957471860000 136
        3  4 1957476720000  99
        3  5 1957478160000  95
        3  6  1.957479e+12 102
        3  7 1957481220000  97
        end
        format %tc time
        
        set scheme s1mono
        tw (line bp time if id==2, sort) (scatter bp time if id==2, mc(red)), leg(off) xlab(, labsize(vsmall)) xtitle("")
        Res.: [Graph.png: line plot with scatter overlay of bp against time for id==2]
        Now, let us take away the second, third, fourth and fifth data points. Your model would say that systolic blood pressure never went below 135. These points are clustered, but you also want to predict further away on the right-hand side, where data points are sparse. That's my two cents.
        Last edited by Andrew Musau; 15 Mar 2023, 07:36.



        • #5
          I note the recording of time to the nearest minute, so I would interpolate on a grid of every minute.


          Code:
          bysort record_id (redcap_repeat) : gen long mytime = (clock(vitals_time, "YMD hm") - clock(vitals_time[1], "YMD hm")) / 60000 
          tsset record_id mytime 
          tsfill 
          sort record_id mytime
          ipolate sbp mytime , gen(sbp2) by(record_id)
          egen wanted = total(sbp2 < 100), by(record_id)
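          To spell out what that pipeline computes, here is a sketch in Python, purely for illustration and not Stata, using record_id 3 from the dataex in #3 with times converted to minutes since the first reading: tsfill expands the panel to one row per minute, ipolate fills sbp linearly between observations, and the total counts grid minutes with interpolated sbp below 100.

          ```python
          # record_id 3 from the dataex above: minutes since the first reading,
          # and the matching sbp values.
          times = [0, 15, 20, 101, 125, 139, 176]
          sbp = [149, 113, 136, 99, 95, 102, 97]

          def interp(t, xs, ys):
              """Linear interpolation of ys over xs at point t (xs must be sorted)."""
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
                  if x0 <= t <= x1:
                      return y0 + (t - x0) * (y1 - y0) / (x1 - x0)
              raise ValueError("t outside the observed range")

          grid = range(times[0], times[-1] + 1)         # tsfill: one row per minute
          sbp2 = [interp(t, times, sbp) for t in grid]  # ipolate: linear fill
          wanted = sum(v < 100 for v in sbp2)           # minutes below 100
          print(wanted)  # → 59
          ```

          So "wanted" here is cumulative minutes below the threshold at 1-minute resolution, not a count of dips.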



          • #6
            Thanks Nick, that was extremely helpful.
            I note that your 'wanted' column now calculates the number of times the systolic blood pressure is less than 100.

            I wondered if you might suggest an elegant way to expand out the mytime variable in 1-minute increments for each record_id.
            If I could do this, I could repeat the ipolate statement and, in doing so, count the minutes below 100.



            • #7
              I think what you're asking for in #6 is already provided in #5. tsfill and ipolate as given here automatically work on each identifier separately.

              Here is your recent code (fixed slightly), my code repeated, and an extra graph command that shows you what was done.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte(record_id redcap_repeat_instance) str16 vitals_time int sbp
              2 1 "2022-01-12 05:18" 161
              2 2 "2022-01-12 05:20" 158
              2 3 "2022-01-12 05:23" 109
              2 4 "2022-01-12 05:25" 133
              2 5 "2022-01-12 05:28" 122
              2 6 "2022-01-12 05:57" 135
              2 7 "2022-01-12 06:00" 129
              2 8 "2022-01-12 06:05" 127
              2 9 "2022-01-12 06:11" 131
              2 10 "2022-01-12 06:26" 116
              2 11 "2022-01-12 07:00" 113
              2 12 "2022-01-12 07:13" 113
              3 1 "2022-01-10 21:51" 149
              3 2 "2022-01-10 22:06" 113
              3 3 "2022-01-10 22:11" 136
              3 4 "2022-01-10 23:32" 99
              3 5 "2022-01-10 23:56" 95
              3 6 "2022-01-11 00:10" 102
              3 7 "2022-01-11 00:47" 97
              end 
              
              bysort record_id (redcap_repeat) : gen long mytime = (clock(vitals_time, "YMD hm") - clock(vitals_time[1], "YMD hm")) / 60000 
              tsset record_id mytime 
              tsfill 
              sort record_id mytime
              ipolate sbp mytime , gen(sbp2) by(record_id)
              egen wanted = total(sbp2 < 100), by(record_id)
              
              twoway line sbp2 mytime, by(record_id)  || scatter sbp mytime, ms(Oh) msize(large)



              • #8
                Brilliant. My apologies. When executing your code on my extended dataset, I hadn't realised it had generated an error (a repeated observation caused a repeated time value, and hence an error with tsfill). After cleaning my data (with the duplicates reporting functions), it works brilliantly. Thank you.
