Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Placing dates/time values into same row: code assistant

    Dear Stata users,

    For the example dataset below, I would like to move the examdatetime values for a specific patient when the event_type is 2, so that they appear in the same row as when their event_type is 1.


    Code:
    * Example generated by -dataex-. For more info, type    help    dataex
    clear
    input str15 p_ID float event_type str22 examdatetime
    "100175" 1 ""                    
    "100175" 2 "30/03/2020 6:17:00 PM"
    "100175" 2 "30/03/2020 6:17:00 PM"
    "100175" 2 "30/03/2020 6:17:00 PM"
    "100175" 2 "30/03/2020 6:48:00 PM"
    "100175" 3 ""                    
    end


    I would want my dataset to look like below

    p_ID event_type examdatetime examdatetime1 examdatetime2 examdatetime3 examdatetime4
    map_code_100175 1 1 31/03/2020 18:17 30/03/2020 18:17 30/03/2020 18:17 30/03/2020 18:48
    map_code_100175 2 2 31/03/2020 18:17
    map_code_100175 2 2 30/03/2020 18:17
    map_code_100175 2 2 30/03/2020 18:17
    map_code_100175 2 2 30/03/2020 18:48
    map_code_100175 3 3

    My dataset consists of 500,000 observations and 13 variables, so reshaping the dataset might not be feasible?, but I'm unsure of the best approach.

    Any advice would be greatly appreciated!

    Regards,
    Fouziah
    Last edited by Fouziah Almouqati; 05 Sep 2024, 07:19.

  • #2
    My advice is not to do this unless you can explain why you need it.

    The reason is that with data of this kind almost everything you might want to do in Stata is easier in this so-called long layout than in the wide layout you seek.

    Recently elsewhere this was posted

    I love when people ask how to do something on Statalist and people in the replies just tell them to do something else
    https://x.com/andrew_avit/status/1827123355046359219

    This is, or should be, a feature, not a bug. My family and friends would be the first to tell you about areas of extreme incompetence where I too need and deserve advice to do something different. On points statistical and Stata-istical, a comment not to do that from experienced people is intended in the poster's best interest. And if they're wrong, that too can and should be spelled out.

    Comment


    • #3
      Thank you, Nick, for your advice and prompt response. I may not have been clear earlier (apologies). I'm actually trying to avoid reshaping the entire dataset from long to wide. Instead, my goal is to move the date/time values so that they appear in the same row as event type 1.

      This approach is necessary because I need to calculate the time difference between each exam date and a specific presentation time (not shown in the example).

      Comment


      • #4
        You are saying that you want a different layout but want to avoid a reshape -- in the sense of a reshape command. Noted, but what you're asking for is a hybrid long-wide layout.

        Time differences are easy enough to calculate in long layout. Please use dataex to give us a real(istic) example, including observations that illustrate what you want.

        Comment


        • #5
          Thanks Nick. Please see the example below.

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str15 p_ID float(event_type event_date_time)
          "map_code_100175" 1  1.901202e+12
          "map_code_100175" 2 1.9012114e+12
          "map_code_100175" 2 1.9012114e+12
          "map_code_100175" 2 1.9012114e+12
          "map_code_100175" 2 1.9012133e+12
          "map_code_100175" 3  1.901222e+12
          "map_code_100219" 1 1.8929998e+12
          "map_code_100219" 2  1.893002e+12
          "map_code_100219" 2  1.893002e+12
          "map_code_100219" 2  1.893002e+12
          "map_code_100219" 3  1.893008e+12
          "map_code_100353" 1 1.8594705e+12
          "map_code_100353" 2  1.859475e+12
          "map_code_100353" 2  1.859475e+12
          "map_code_100353" 2  1.859475e+12
          end
          format %tc event_date_time
          I need to calculate the time difference for each record with event type 2 from the time of the previous event type 1. However, this method places the time difference in the cell next to event type 2. Since my final dataset will only include event type 1 records, I want to align the date/time values with event type 1 in the same row. I am considering creating an ID for each block and saving each event type 2 record individually, then merging them into the master file using that ID, but this approach seems cumbersome.





          Comment


          • #6
            Thanks for the example. It shows a problem as you have created your date-time variable as a float. But it should be a double. See

            Code:
            help date_time
            for repeated advice to use a double. Fix that, and then consider this technique, We just record when events of type 1 occurred and copy downwards.

            See e.g.

            http://www.stata.com/support/faqs/da...missing-values

            https://journals.sagepub.com/doi/pdf...6867X231196519

            and for your specific problem

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str15 p_ID float(event_type event_date_time)
            "map_code_100175" 1  1.901202e+12
            "map_code_100175" 2 1.9012114e+12
            "map_code_100175" 2 1.9012114e+12
            "map_code_100175" 2 1.9012114e+12
            "map_code_100175" 2 1.9012133e+12
            "map_code_100175" 3  1.901222e+12
            "map_code_100219" 1 1.8929998e+12
            "map_code_100219" 2  1.893002e+12
            "map_code_100219" 2  1.893002e+12
            "map_code_100219" 2  1.893002e+12
            "map_code_100219" 3  1.893008e+12
            "map_code_100353" 1 1.8594705e+12
            "map_code_100353" 2  1.859475e+12
            "map_code_100353" 2  1.859475e+12
            "map_code_100353" 2  1.859475e+12
            end
            format %tc event_date_time
            
            bysort p_ID (event_date_time) : gen double previous1 = event_date_time if event_type == 1
            by p_ID : replace previous1 = previous1[_n-1] if missing(previous1)
            
                 +------------------------------------------------------------------------+
                 |            p_ID   event~pe      event_date_time   previous1     wanted |
                 |------------------------------------------------------------------------|
              1. | map_code_100175          1   30mar2020 15:39:41   1.901e+12          . |
              2. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              3. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              4. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              5. | map_code_100175          2   30mar2020 18:47:33   1.901e+12   11272192 |
              6. | map_code_100175          3   30mar2020 21:13:55   1.901e+12          . |
                 |------------------------------------------------------------------------|
              7. | map_code_100219          1   26dec2019 17:15:57   1.893e+12          . |
              8. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
              9. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
             10. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
             11. | map_code_100219          3   26dec2019 19:33:35   1.893e+12          . |
                 |------------------------------------------------------------------------|
             12. | map_code_100353          1   03dec2018 15:34:51   1.859e+12          . |
             13. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
             14. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
             15. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
                 +------------------------------------------------------------------------+
            
            gen double wanted = event_date_time - previous1 if event_type ==2 
            
            list, sepby(p_ID) 
            
            
                +------------------------------------------------------------------------+
                 |            p_ID   event~pe      event_date_time   previous1     wanted |
                 |------------------------------------------------------------------------|
              1. | map_code_100175          1   30mar2020 15:39:41   1.901e+12          . |
              2. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              3. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              4. | map_code_100175          2   30mar2020 18:16:58   1.901e+12    9437184 |
              5. | map_code_100175          2   30mar2020 18:47:33   1.901e+12   11272192 |
              6. | map_code_100175          3   30mar2020 21:13:55   1.901e+12          . |
                 |------------------------------------------------------------------------|
              7. | map_code_100219          1   26dec2019 17:15:57   1.893e+12          . |
              8. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
              9. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
             10. | map_code_100219          2   26dec2019 17:53:06   1.893e+12    2228224 |
             11. | map_code_100219          3   26dec2019 19:33:35   1.893e+12          . |
                 |------------------------------------------------------------------------|
             12. | map_code_100353          1   03dec2018 15:34:51   1.859e+12          . |
             13. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
             14. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
             15. | map_code_100353          2   03dec2018 16:49:08   1.859e+12    4456448 |
                 +------------------------------------------------------------------------+
            If your original date-times are reported to the nearest second, then in the units of milliseconds used by Stata for date-times, all date-times and all differences between date-times should be multiples of 1000.

            You should now be better placed.

            Comment


            • #7
              Thanks you Nick. I also read the reference you provided, and I was able to move the values to another cell. I used
              Code:
              replace exam_d_t_1 = exam_d_t_1[_n+1] if missing(exam_d_t_1)
              It gave me exactly what I needed.

              Comment

              X