Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Recode date into consecutive days from first date

    Hi there,

    I am trying to generate a new variable (called daynumber) that recodes a date (currently stored in date [string] and datetime5sec [double with display DMYhms]) to an integer, starting with 1 for the earliest unique value of date, and consecutively numbering up until the last unique value of date.

    An example from my dataset looks like this (but in reality there are thousands of lines for each unique value of date):
    Code:
    . list date time VAcategory datetime5sec
    
         +--------------------------------------------------------+
         |       date       time   VAcateg~y         datetime5sec |
         |--------------------------------------------------------|
      1. |  4/04/2018   10:00:50    moderate   04apr2018 10:00:50 |
      2. |  4/04/2018   10:03:05       light   04apr2018 10:03:05 |
      3. |  5/04/2018   07:15:15    moderate   05apr2018 07:15:15 |
      4. |  5/04/2018   07:28:15    moderate   05apr2018 07:28:15 |
      5. |  6/04/2018   07:46:05   sedentary   06apr2018 07:46:05 |
         |--------------------------------------------------------|
      6. |  6/04/2018   07:46:15    moderate   06apr2018 07:46:15 |
      7. |  7/04/2018   07:45:00    moderate   07apr2018 07:45:00 |
      8. |  7/04/2018   07:45:25    moderate   07apr2018 07:45:25 |
      9. |  8/04/2018   07:42:05       light   08apr2018 07:42:05 |
     10. |  8/04/2018   07:44:55    moderate   08apr2018 07:44:55 |
         |--------------------------------------------------------|
     11. |  9/04/2018   05:16:15   sedentary   09apr2018 05:16:15 |
     12. |  9/04/2018   08:01:20    moderate   09apr2018 08:01:20 |
     13. | 10/04/2018   06:02:05       light   10apr2018 06:02:05 |
     14. | 10/04/2018   06:08:35    moderate   10apr2018 06:08:35 |
         +--------------------------------------------------------+
    I can use levelsof date and see that the list of distinct values is stored in r(levels), and I can use distinct date and see that, in this example there are 7 distinct values (days).

    Code:
    
    . levelsof date
    `"10/04/2018"' `"4/04/2018"' `"5/04/2018"' `"6/04/2018"' `"7/04/2018"' `"8/04/2018"' `"9/04/2018"'
    
    . distinct date
    
    -----------------------------
          |     total   distinct
    ------+----------------------
     date |        14          7
    -----------------------------
    In this example, "4/04/2018" is the first day, so daynumber == 1, "5/04/2018 is the second day, so daynumber == 2, "6/04/2018" is the third day, so daynumber == 3 and so on.

    I am looking for some resources to assist me in writing code that will generate the following:
    Code:
    
         +-------------------------------------------------------------------+
         |       date       time   VAcateg~y         datetime5sec   daynum~r |
         |-------------------------------------------------------------------|
      1. |  4/04/2018   10:00:50    moderate   04apr2018 10:00:50          1 |
      2. |  4/04/2018   10:03:05       light   04apr2018 10:03:05          1 |
      3. |  5/04/2018   07:15:15    moderate   05apr2018 07:15:15          2 |
      4. |  5/04/2018   07:28:15    moderate   05apr2018 07:28:15          2 |
      5. |  6/04/2018   07:46:05   sedentary   06apr2018 07:46:05          3 |
         |-------------------------------------------------------------------|
      6. |  6/04/2018   07:46:15    moderate   06apr2018 07:46:15          3 |
      7. |  7/04/2018   07:45:00    moderate   07apr2018 07:45:00          4 |
      8. |  7/04/2018   07:45:25    moderate   07apr2018 07:45:25          4 |
      9. |  8/04/2018   07:42:05       light   08apr2018 07:42:05          5 |
     10. |  8/04/2018   07:44:55    moderate   08apr2018 07:44:55          5 |
         |-------------------------------------------------------------------|
     11. |  9/04/2018   05:16:15   sedentary   09apr2018 05:16:15          6 |
     12. |  9/04/2018   08:01:20    moderate   09apr2018 08:01:20          6 |
     13. | 10/04/2018   06:02:05       light   10apr2018 06:02:05          7 |
     14. | 10/04/2018   06:08:35    moderate   10apr2018 06:08:35          7 |
         +-------------------------------------------------------------------+
    Note: for each participant, daynumber == 1 is a different day (i.e. not 4/04/2018). Furthermore, occasionally, the dates are not consecutive (i.e. one day is missing, e.g. day 1 is 5/04/2018 and day 2 is 7/04/2018).

    Thank you for any assistance you can provide.
    Sarah

  • #2
    Code:
    egen wanted= group(date)

    Ensure that "date" is a valid Stata date variable to get the correct ordering.

    Comment


    • #3
      Andrew Musau's excellent advice understates the hazard to you here. The results of levelsof show that Stata is treating date as a string variable, because it is one, and so regards 10/4/2018 as coming before 4/4/2018 because that is the way that strings sort,

      So for this purpose and just about any other Stata purpose you need a conversion

      Code:
      gen ddate = daily(date, "DMY") 
      format ddate %td
      and then you can push those daily dates through egen, group()

      Alternatively, your date-time looks well set up and if so dofc(datetime5sec) will pull the daily date out of it. But if it's another string that won't work.

      These difficulties are why showing results of a
      list can help but using dataex such as we ask in FAQ Advice #12 helps even more.

      Comment


      • #4
        Thank you for your advice Andrew Musau and Nick Cox . Implementing both of your code solved the problem:

        Code:
        . generate ddate = dofC(datetime5sec)
        
        . format ddate %td
        
        . egen daynumber = group(ddate)
        
        . list ddate daynumber
        Resulting in the following:

        Code:
        . list ddate daynumber
        
             +----------------------+
             |     ddate   daynum~r |
             |----------------------|
          1. | 04apr2018          1 |
          2. | 04apr2018          1 |
          3. | 05apr2018          2 |
          4. | 05apr2018          2 |
          5. | 06apr2018          3 |
             |----------------------|
          6. | 06apr2018          3 |
          7. | 07apr2018          4 |
          8. | 07apr2018          4 |
          9. | 08apr2018          5 |
         10. | 08apr2018          5 |
             |----------------------|
         11. | 09apr2018          6 |
         12. | 09apr2018          6 |
         13. | 10apr2018          7 |
         14. | 10apr2018          7 |
             +----------------------+
        Thanks again for your time.
        Sarah

        Comment

        Working...
        X