
  • Unevenly spaced time series: Help with Descriptive Statistics

    I am analysing an unevenly spaced time series, and am hoping to generate descriptive statistics on the cumulative time spent below a specific threshold value during that time series.

    I imagine any function would need to model the "gaps" between measurement times using linear interpolation. Is there an easy way of doing this in Stata?

    My real-world application is a series of randomly measured systolic blood pressures on the same individual over time.
    My question is: During our observation period, over what period of collective/cumulative time did the individual have a blood pressure less than 100?

    This individual may have had several dips in blood pressure below 100, before returning above 100 during a series of measures.
    Being above 100 does not "make up" for times below 100 (i.e. it's not an averaging problem).

    Disclosure: I'm a clinician by background, not a statistician. No doubt there will be a command that does this. I've searched the archives under a number of terms to no avail. Thanks for your assistance.
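    For later readers, the "gaps" idea can be made concrete with linear interpolation: find where the straight line joining two consecutive readings crosses the threshold, and add up the sub-intervals spent below it. A minimal sketch of that arithmetic, written in Python rather than Stata purely for illustration (the function name and example numbers are made up):

    ```python
    def time_below(times, values, threshold):
        """Cumulative time spent below `threshold`, assuming the series
        moves linearly between consecutive measurements."""
        total = 0.0
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:]):
            if y0 < threshold and y1 < threshold:
                # whole segment below the threshold
                total += t1 - t0
            elif y0 < threshold or y1 < threshold:
                # segment crosses the threshold once: locate the crossing time
                tc = t0 + (threshold - y0) / (y1 - y0) * (t1 - t0)
                total += (tc - t0) if y0 < threshold else (t1 - tc)
        return total

    # e.g. readings at 0, 10, 30 and 45 minutes
    print(time_below([0, 10, 30, 45], [120, 95, 90, 110], 100))  # → 29.5
    ```

    The same logic carries over to any threshold and any irregular spacing; only the segments that cross or sit below the threshold contribute.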






  • #2
    A data example, please: https://www.statalist.org/forums/help#stata

    More than one individual?

    Irregularly spaced daily dates? Time of day relevant or not? Or some other time framework?



    • #3
      Thanks for getting back to me so quickly.
      My data is imported from REDCap.

      Here is my dataex output:

      [CODE]
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(record_id redcap_repeat_instance) str16 vitals_time int sbp

      2 1 "2022-01-12 05:18" 161
      2 2 "2022-01-12 05:20" 158
      2 3 "2022-01-12 05:23" 109
      2 4 "2022-01-12 05:25" 133
      2 5 "2022-01-12 05:28" 122
      2 6 "2022-01-12 05:57" 135
      2 7 "2022-01-12 06:00" 129
      2 8 "2022-01-12 06:05" 127
      2 9 "2022-01-12 06:11" 131
      2 10 "2022-01-12 06:26" 116
      2 11 "2022-01-12 07:00" 113
      2 12 "2022-01-12 07:13" 113
      3 1 "2022-01-10 21:51" 149
      3 2 "2022-01-10 22:06" 113
      3 3 "2022-01-10 22:11" 136
      3 4 "2022-01-10 23:32" 99
      3 5 "2022-01-10 23:56" 95
      3 6 "2022-01-11 00:10" 102
      3 7 "2022-01-11 00:47" 97
      end
      [/CODE]
      There are 3 patients here.
      Patient 1 has no measurements taken.
      Patient 2 has 12 measurements taken.
      Patient 3 has 7 measurements taken.
      The final column is the systolic blood pressure (sbp).

      This is how my data is currently formatted.
      I have a total of 30 patients.
      If needed I could just import data for an individual patient at a time and work on that.

      For this dataset...
      Patient 2: never falls below 100.
      Patient 3: Falls twice below 100.
      The first dip occurs "somewhere" between repeat measurements 3 and 4, and corrects somewhere between repeat measurements 5 and 6.
      The second dip occurs "somewhere" between repeat measurements 6 and 7, and persists until the end of observation at repeat measurement 7.
      I think the solution might have something to do with ipolate? I just can't figure it out.

      My datetimes have been formatted after importing using this:

      [CODE]
      tostring vitals_time, replace
      gen double _temp_ = Clock(vitals_time,"YMDhm")
      drop vitals_time
      rename _temp_ vitals_time
      format vitals_time %tCMonth_dd,_CCYY_HH:MM
      [/CODE]

      Thanks.
      Last edited by Martin Dutch; 15 Mar 2023, 05:29.



      • #4
        I wouldn't buy into what you want to do. You rely on a monotonicity assumption which, in my opinion, does not apply to movements in blood pressure. The following graph from your data illustrates my point:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(id measurement) double time int bp
        2  1 1957583880000 161
        2  2  1.957584e+12 158
        2  3 1957584180000 109
        2  4 1.9575843e+12 133
        2  5 1957584480000 122
        2  6 1957586220000 135
        2  7 1.9575864e+12 129
        2  8 1.9575867e+12 127
        2  9 1957587060000 131
        2 10 1957587960000 116
        2 11   1.95759e+12 113
        2 12 1957590780000 113
        3  1 1957470660000 149
        3  2 1957471560000 113
        3  3 1957471860000 136
        3  4 1957476720000  99
        3  5 1957478160000  95
        3  6  1.957479e+12 102
        3  7 1957481220000  97
        end
        format %tc time
        
        set scheme s1mono
        tw (line bp time if id==2, sort) (scatter bp time if id==2, mc(red)), leg(off) xlab(, labsize(vsmall)) xtitle("")
        Res.: [Graph.png: line plot with scatter overlay of bp against time for id==2]
        Now, let us take away the second, third, fourth and fifth data points. Your model would say that systolic blood pressure never went below 135. These points are clustered, but you also want to predict further away on the right-hand side, where data points are sparse. That's my two cents.
        Last edited by Andrew Musau; 15 Mar 2023, 07:36.



        • #5
          I note the recording of time to the nearest minute, so I would interpolate on a grid of every minute.


          Code:
          bysort record_id (redcap_repeat) : gen long mytime = (clock(vitals_time, "YMD hm") - clock(vitals_time[1], "YMD hm")) / 60000 
          tsset record_id mytime 
          tsfill 
          sort record_id mytime
          ipolate sbp mytime , gen(sbp2) by(record_id)
          egen wanted = total(sbp2 < 100), by(record_id)
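          To spell out what that pipeline computes, here is a sketch in Python, purely for illustration and not Stata, using record_id 3 from the dataex in #3 with times converted to minutes since the first reading: tsfill expands the panel to one row per minute, ipolate fills sbp linearly between observations, and the total counts grid minutes with interpolated sbp below 100.

          ```python
          # record_id 3 from the dataex above: minutes since the first reading,
          # and the matching sbp values.
          times = [0, 15, 20, 101, 125, 139, 176]
          sbp = [149, 113, 136, 99, 95, 102, 97]

          def interp(t, xs, ys):
              """Linear interpolation of ys over xs at point t (xs must be sorted)."""
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
                  if x0 <= t <= x1:
                      return y0 + (t - x0) * (y1 - y0) / (x1 - x0)
              raise ValueError("t outside the observed range")

          grid = range(times[0], times[-1] + 1)         # tsfill: one row per minute
          sbp2 = [interp(t, times, sbp) for t in grid]  # ipolate: linear fill
          wanted = sum(v < 100 for v in sbp2)           # minutes below 100
          print(wanted)  # → 59
          ```

          So "wanted" here is cumulative minutes below the threshold at 1-minute resolution, not a count of dips.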



          • #6
            Thanks Nick, that was extremely helpful.
            I note that your 'wanted' column now calculates the number of times the systolic blood pressure is less than 100.

            I wondered if you might suggest an elegant way to expand out the mytime variable in 1-minute increments for each record_id.
            If I could do this, I could repeat the ipolate statement and, in doing so, count the minutes below 100.



            • #7
              I think what you're asking for in #6 is already provided in #5. tsfill and ipolate as given here automatically work on each identifier separately.

              Here is your recent code (fixed slightly), my code repeated, and an extra graph command that shows you what was done.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte(record_id redcap_repeat_instance) str16 vitals_time int sbp
              2 1 "2022-01-12 05:18" 161
              2 2 "2022-01-12 05:20" 158
              2 3 "2022-01-12 05:23" 109
              2 4 "2022-01-12 05:25" 133
              2 5 "2022-01-12 05:28" 122
              2 6 "2022-01-12 05:57" 135
              2 7 "2022-01-12 06:00" 129
              2 8 "2022-01-12 06:05" 127
              2 9 "2022-01-12 06:11" 131
              2 10 "2022-01-12 06:26" 116
              2 11 "2022-01-12 07:00" 113
              2 12 "2022-01-12 07:13" 113
              3 1 "2022-01-10 21:51" 149
              3 2 "2022-01-10 22:06" 113
              3 3 "2022-01-10 22:11" 136
              3 4 "2022-01-10 23:32" 99
              3 5 "2022-01-10 23:56" 95
              3 6 "2022-01-11 00:10" 102
              3 7 "2022-01-11 00:47" 97
              end 
              
              bysort record_id (redcap_repeat) : gen long mytime = (clock(vitals_time, "YMD hm") - clock(vitals_time[1], "YMD hm")) / 60000 
              tsset record_id mytime 
              tsfill 
              sort record_id mytime
              ipolate sbp mytime , gen(sbp2) by(record_id)
              egen wanted = total(sbp2 < 100), by(record_id)
              
              twoway line sbp2 mytime, by(record_id)  || scatter sbp mytime, ms(Oh) msize(large)



              • #8
                Brilliant. My apologies. When executing your code on my extended dataset, I hadn't realised it had generated an error (a repeated observation caused a repeated time value, and hence an error with tsfill). After cleaning my data (with the duplicates reporting functions), it works brilliantly. Thank you.
