
  • Calculating durations by ID

    Dear Stata users,

    I have data in long format with the dates of specific events for each unique identifier. The goal is to calculate the duration between certain events. Below is an example of the data.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(ID step) int date
    101 100 20577
    101 200 20682
    101 300 20759
    101 400 20767
    101 700 21947
    102 200 20361
    102 300 20445
    102 400 20577
    102 500 20682
    102 700 20759
    103 100 20767
    103 200 21945
    103 300 21054
    103 400 21059
    103 500 21101
    103 600 21131
    103 700 21145
    104 100 21166
    104 200 21549
    104 300 21153
    104 400 21157
    104 500 21159
    104 600 21161
    104 700 21165
    105 100 21165
    105 300 21585
    105 400 21527
    105 500 21528
    105 600 21529
    105 700 21530
    end
    format %tdnn/dd/CCYY date

    ID is the unique identifier, step is the event, and date is the date on which the event occurred.

    For example, I would like to calculate the duration between steps 100 and 500, and between steps 200 and 700.

    Note that not all IDs have all steps: some steps may be missing for a given ID.

    Thanks in advance!

  • #2
    Here is a first suggestion:

    Code:
    * Computing the days to the following date (rows ordered by step within ID)
    bysort ID (step): gen diff1 = datediff(date, date[_n+1], "day")
    label var diff1 "Days to the following event"
    
    * Computing the time between steps 100 and 500
    gen t1 = date if step == 100
    gen t2 = date if step == 500
    
    bysort ID: egen t1x = max(t1)
    bysort ID: egen t2x = max(t2)
    gen diff_100_500 = datediff(t1x, t2x, "day")
    I kept the auxiliary variables so you can check what is going on.
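
    The same pattern can be repeated for the second pair (steps 200 and 700). This is only a sketch: the names t3, t4, t3x, t4x and diff_200_700 are made up for illustration, and it assumes each ID has at most one row per step. It uses plain subtraction, which works because daily dates are stored as integer day counts, so it does not depend on datediff() being available in your Stata version.

    ```stata
    * Sketch: days between step 200 and step 700 (illustrative variable names)
    gen t3 = date if step == 200
    gen t4 = date if step == 700

    * Spread each date to every row of the same ID
    bysort ID: egen t3x = max(t3)
    bysort ID: egen t4x = max(t4)

    * Daily dates are integers, so the difference is a number of days;
    * it is missing for IDs that lack either step
    gen diff_200_700 = t4x - t3x
    ```

    For ID 101 in the example data this gives 21947 - 20682 = 1265 days, and IDs without a step 200 or a step 700 get a missing value.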
    Best wishes

    (Stata 16.1 MP)



    • #3
      You have panel data, so you can use time-series operators:

      I would like for example to calculate the duration between step 100 - 500, 200 - 700.
      Code:
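      * Declare step as the "time" variable, measured in units of 100
      xtset ID step, delta(100)
      * L4. looks back 4 periods (step 100); L5. looks back 5 periods (step 200)
      gen wanted1=(date-L4.date) if step==500
      gen wanted2=(date-L5.date) if step==700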
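
      One way to cross-check these results is to reshape the data to wide form, so that each step has its own date column. This is a sketch rather than part of the suggestion above: the names d_100_500 and d_200_700 are made up, and it assumes ID and step jointly identify the rows and that every step used in a difference occurs at least once in the data.

      ```stata
      * Sketch: cross-check in wide form, leaving the long data untouched
      preserve
      keep ID step date

      * Fails if any ID has more than one row for the same step
      isid ID step

      * One row per ID, with date100, date200, ..., date700
      reshape wide date, i(ID) j(step)

      * Missing when an ID lacks either step
      gen d_100_500 = date500 - date100
      gen d_200_700 = date700 - date200

      list ID d_100_500 d_200_700
      restore
      ```

      The wide layout makes it easy to see which IDs lack a step, since the corresponding date column is missing for them.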



      • #4
        Felix Bittmann, thanks so much for the approach. I sincerely appreciate it.



        • #5
          Andrew Musau, thanks a bunch! The trick of treating the data as a panel is super helpful.
