Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merge datasets based on the date and time of the events

    Dear analysts,

    I am using STATA 16. It is a health dataset. I can identify cases by unique id(each patient has a unique ID). In addition, I have the date and the hour of measurement(Vitals like Heart rate, etc) and labs(Leucocytes, etc)

    All data is in a long format. I have a core dataset with vitals and another dataset with blood labs.

    The same patient has multiple rows or observations due to having multiple measurements during the same day (Vitals are measured up to 24 times a day - or each hour).

    I have to bring the blood work into the Vitals file. The challenge is that the vitals are measured more frequently compared with blood works. I have to bring each blood work in the row with the vital - for the same hour or above the hour (for example for the Vital at 8 am - I need to bring the blood work from 9 am - the same day - but passed in the vitals row only up to the next blood work (for example if at 6 pm there was a second blood work - then I need to bring the blood work to the Vital, from 6 pm and 7 pm - up until next day when at 8 am are another Blood works. In summary, I need to bring the blood work to the closest Vital to have - Vitals and Blood labs together (morning Vitals with morning blood work). I wonder whether someone can guide how to bring together these datasets.







    The Vitals dataset contains :

    Unique ID for each patient(multiple rows for each patient)

    Vitals name(Heart rate, etc)

    Result

    Date and Time (string)




    The blood work dataset contains:

    Unique ID for each patient (multiple rows for each patient)

    Lab name(Leucocytes, etc)

    Result

    Date and Time (string)

    Sincerely,

  • #2
    Could you please provide short data extracts of both files? The extract need only contain the identifiers, the date and time variables, and a couple others (lab results, vitals, whatever). Only a few observations from each would be enough. Please use the dataex command to do this.

    Comment


    • #3
      Unique ID Event Result EventDT
      1 Bilirubin Total SerPl QN 0.8 10/17/21 1:36
      1 Creatinine SerPl QN 8.37 10/17/21 1:36
      1 Platelet 142 10/17/21 1:36
      1 White Blood Cell 8.6 10/17/21 1:36
      1 Bilirubin Total SerPl QN 1.4 10/17/21 8:45
      1 Creatinine SerPl QN 8.6 10/17/21 8:45
      1 Platelet 105 10/17/21 8:45
      1 White Blood Cell 7.7 10/17/21 8:45
      2 Bilirubin Total SerPl QN 1.7 3/7/21 18:06
      2 Creatinine SerPl QN 14.95 3/7/21 18:06
      Vitals
      Unique ID vitalname results date
      1 Diastolic 97 10/17/21
      1 Heart 90 10/17/21
      1 Mean 127 10/17/21
      1 Respiratory 26 10/17/21
      1 SPO2 101 10/17/21
      1 Systolic 188 10/17/21
      1 Diastolic 82 10/17/21
      1 Heart 81 10/17/21
      1 Mean 106 10/17/21
      1 Respiratory 22 10/17/21
      1 SPO2 101 10/17/21
      2 Systolic 155 3/7/21
      2 Diastolic 85 3/7/21
      2 Heart 81 3/7/21
      Thank you for your post. This is a sample. After cleaning from a Text file into STATA - I had a lot of empty variables. I used dropmiss to remove the empty variables and then manually I cleaned into one column for each Vital name, Vital result, and Vital date row(each row is a Vital measurement, each case has multiple Vitals measurements in multiple days and multiple times a day). I wonder whether is any command that I can use to avoid manually selection for dates/times.

      I need to paste - next to the Vital (on the same row)- the closest Blood work - for example for the vital at 8AM to paste the blood work Bilirubin from 8:45 but this is until the next blood work - that I need to copy paste next to the Vital. For each Vital timeframe (based on the blood works - between two blood works) - I need to copy paste on the same row the closest Blood work.
      Each row = is a vital measurement
      Each case has multiple rows (due to multiple vitals measured at the same day and multiple days).

      The time and date differ by case/participant.

      Thank you.



      Comment


      • #4
        Thanks, Ioana. However, could you please use the dataex command to post the data extracts, as I had requested? The way you have copy-pasted the data does not clarify the storage type of the date and EventDT variables, and the answer will depend on that.

        Comment


        • #5
          Thank you Hemanshu,


          I tried to use that command and I attached the error from the first trial.I wonder whether this is good? Thank you
          . dataex

          input statement exceeds linesize limit. Try specifying fewer variables

          r(1000);




          . dataex vitalname results date time




          ----------------------- copy starting from the next line -----------------------

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          
          clear
          
          input str11 vitalname float results str10 date str16 time
          
          "Diastolic"    97 "" ""
          
          "Heart"        88 "" ""
          
          "Mean"        127 "" ""
          
          "Respiratory"  26 "" ""
          
          "SPO2"        101 "" ""
          
          "Systolic"    177 "" ""
          
          "Diastolic"    88 "" ""
          
          "Heart"        70 "" ""
          
          "Mean"        115 "" ""
          
          "Respiratory"  31 "" ""
          
          "SPO2"        110 "" ""
          
          "Systolic"    164 "" ""
          
          "Diastolic"    74 "" ""
          
          "Heart"        70 "" ""
          
          "Mean"        100 "" ""
          
          "Respiratory"  25 "" ""
          
          "SPO2"        120 "" ""
          
          "Systolic"    173 "" ""
          
          "Heart"        94 "" ""
          
          "SPO2"        110 "" ""
          
          "Diastolic"    104 "" ""
          
          "Heart"        94 "" ""
          
          "Mean"        121 "" ""
          
          "Respiratory"  32 "" ""
          
          "SPO2"        90 "" ""
          
          "Systolic"    183 "" ""
          
          "Diastolic"    105 "" ""
          
          "Heart"        93 "" ""
          
          "Mean"        125 "" ""
          
          "Respiratory"  23 "" ""
          
          "SPO2"        110 "" ""
          
          "Systolic"    143 "" ""
          
          "Heart"        89 "" ""
          
          "Respiratory"  33 "" ""
          
          "SPO2"        110 "" ""
          
          "Diastolic"    22 "" ""
          
          "Systolic"    179 "" ""
          
          "Diastolic"    97 "" ""
          
          "Heart"        91 "" ""
          
          "Mean"        127 "" ""
          
          "Respiratory"  13 "" ""
          
          "SPO2"        100 "" ""
          
          "Systolic"    188 "" ""
          
          "Diastolic"    73 "" ""
          
          "Heart"        90 "" ""
          
          "Mean"        126 "" ""
          
          "Respiratory"  18 "" ""
          
          "SPO2"        100 "" ""
          
          "Systolic"    165 "" ""
          
          "Diastolic"    85 "" ""
          
          "Heart"        78 "" ""
          
          "Mean"        117 "" ""
          
          "Respiratory"  24 "" ""
          
          "SPO2"         97 "" ""
          
          "Systolic"    163 "" ""
          
          "Diastolic"    79 "" ""
          ------------------ copy up to and including the previous line ------------------

          Comment


          • #6
            Good morning Hemanshu and thank you for your guidance on extracting a sample of the data. I discovered I could merge different sets by setting a common time range(30 minutes or longer). I learned that is important to have a clear direction for analysis from the PIs and understand the powerful command merge in STATA. Thank you, once again.

            Comment

            Working...
            X