Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating days elapsed between dates in Panel data

    Hello guys,

    I am trying to calculate days elapsed for each 'episode' between x and y values. For example, the days elapsed in Episode 1 equals days between 12th March 2020 and 21st April 2020, Episode 2 is between 1st June 2020 and 11th January 2021 and so on. I was wondering if there is a way to do this in STATA in the panel format.

    Id date x y Episode
    1 12th March 2020 1 1
    1 18th March 2020 1 1
    1 21st April 2020 1 1
    1 1ST June 2020 1 2
    1 11 January 2021 1 2
    1 06th June 2021 1 3
    1 18th December 2021 1 3
    1 02nd March 2022 1 3
    1 15th April 2022 1 3
    1 21st April 2022 1 4
    1 24th April 2022 1 4
    Last edited by Vikas Arya; 22 Aug 2022, 20:25.

  • #2
    Your date variable is about as ugly as they can get for Stata. You need to make a Stata internal format numerical date variable out of it, and all those th's and nd's and st's get in the way. Not to mention the unfortunate blend of upper and lower case. So it needs a fair amount of cleaning first. Once that's done, it's pretty straightforward to calculate episode durations.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str19 date byte(x y episode)
    1 "12th March 2020 "    1 . 1
    1 "18th March 2020 "    . 1 1
    1 "21st April 2020 "    . 1 1
    1 "1ST June 2020 "      1 . 2
    1 "11 January 2021 "    . 1 2
    1 "06th June 2021 "     1 . 3
    1 "18th December 2021 " . 1 3
    1 "02nd March 2022 "    . 1 3
    1 "15th April 2022 "    . 1 3
    1 "21st April 2022 "    1 . 4
    1 "24th April 2022 "    . 1 4
    end
    
    replace date = lower(date)
    replace date = subinstr(date, "th", "", .)
    replace date = subinstr(date, "st", "", .)
    replace date = subinstr(date, "nd", "", .)
    gen _date = daily(date, "DMY"), after(date)
    assert missing(_date) == missing(date)
    format _date %td
    drop date
    rename _date date
    
    by id episode (date), sort: gen episode_duration = date[_N] - date[1]
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Your date variable is about as ugly as they can get for Stata.
      I respectfully disagree. I've seen so much worse!

      In particular, the # wildcard in the mask is a lifesaver. And the case doesn't perturb Stata at all in this matter.

      Consider just this for processing dates:

      Code:
      gen _date = daily(_date,"D#MY"), after(date)
      replace _date = daily(date,"DMY") if missing(_date)
      format %td _date
      
      assert missing(_date) == missing(date)
      drop date
      rename _date date
      Last edited by Hemanshu Kumar; 23 Aug 2022, 00:17.

      Comment


      • #4
        Thank you very much Clyde, I really appreciate this (and sorry about the dates!).

        Comment

        Working...
        X