Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sorting data in chronological order

    Hello,

    This post is similar to what I previously posted here: https://www.statalist.org/forums/for...9-sorting-data
    The suggestions provided in the link above do not appear to work for every group. Many of the groups do not sort in the correct order. I have made some adjustments in my post below:

    I have the following data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double(group start end) long width
    10 1 18080 18747 200
    10 1 20702 21335 300
    20 1 18748 23035 100
    15 1 21336 23035 450
    4 2 17148 19196 280
    6 2 19263 21943 114
    8 2 19199 21943 170
    6 2 21924 21943 90
    2 2 22267 23035 110
    2 2 21944 23035 170
    end
    format %td start
    format %td end
    I want to put this data in chronological order based on the start and end dates by group. By group, the start date should be one day after the previous end date. I want the observations that fit this criteria to form a subgroup. If observations in the group do not fit this criteria, then they should form their own individual, separate subgroups. I also want to create every possible combination if there is a start date that can follow several end dates.

    I want the final result to look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double(group start end subgroup) long width
    10 1 18080 18747 1 200
    20 1 18748 23035 1 100
    10 1 20702 21335 2 300
    15 1 21336 23035 2 450
    4 2 17148 19196 1 280
    6 2 19263 21943 2 114
    2 2 21944 23035 2 170
    8 2 19199 21943 3 170
    2 2 21944 23035 3 170
    6 2 21924 21943 4 90
    2 2 21944 23035 4 170
    2 2 22267 23035 5 110
    end
    format %td start
    format %td end
    I would appreciate any assistance with this!

    Best,
    Anoush K.

  • #2
    I can follow some of this but not all.

    The value of width appears immaterial here so no more on that. To think concretely about this, I imagine that people may hold two or more jobs simultaneously, so that one job can start before another has ended. Very likely the data are not about jobs at all but the interpretation can help.

    Whenever this is true, there is no way to sort observations in chronological order without a different data layout. So, each observation defines two events, a start and end. Set up a counter so that each start adds 1 and each end subtracts 1, so the counter is the number of jobs at this moment.

    Then I get this.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double(group start end) long width
    10 1 18080 18747 200
    10 1 20702 21335 300
    20 1 18748 23035 100
    15 1 21336 23035 450
    4 2 17148 19196 280
    6 2 19263 21943 114
    8 2 19199 21943 170
    6 2 21924 21943 90
    2 2 22267 23035 110
    2 2 21944 23035 170
    end
    format %td start
    format %td end
    
    gen long obsno = _n
    expand 2 , gen(is_new)
    bysort obsno : gen date = cond(_n == 1, start[1], end[2])
    by obsno : gen change = cond(_n == 1, 1, -1)
    
    
    gsort ID date -change
    by ID : gen status = sum(change)
    by ID : gen marker = inlist(status[_n-1], 0, .)
    by ID : gen spell = sum(marker)
    
    
    . list ID group date-spell, sepby(ID)
    
         +-------------------------------------------------------+
         | ID   group    date   change   status   marker   spell |
         |-------------------------------------------------------|
      1. |  2       2   21944        1        1        1       1 |
      2. |  2       2   22267        1        2        0       1 |
      3. |  2       2   23035       -1        1        0       1 |
      4. |  2       2   23035       -1        0        0       1 |
         |-------------------------------------------------------|
      5. |  4       2   17148        1        1        1       1 |
      6. |  4       2   19196       -1        0        0       1 |
         |-------------------------------------------------------|
      7. |  6       2   19263        1        1        1       1 |
      8. |  6       2   21924        1        2        0       1 |
      9. |  6       2   21943       -1        1        0       1 |
     10. |  6       2   21943       -1        0        0       1 |
         |-------------------------------------------------------|
     11. |  8       2   19199        1        1        1       1 |
     12. |  8       2   21943       -1        0        0       1 |
         |-------------------------------------------------------|
     13. | 10       1   18080        1        1        1       1 |
     14. | 10       1   18747       -1        0        0       1 |
     15. | 10       1   20702        1        1        1       2 |
     16. | 10       1   21335       -1        0        0       2 |
         |-------------------------------------------------------|
     17. | 15       1   21336        1        1        1       1 |
     18. | 15       1   23035       -1        0        0       1 |
         |-------------------------------------------------------|
     19. | 20       1   18748        1        1        1       1 |
     20. | 20       1   23035       -1        0        0       1 |
         +-------------------------------------------------------+
    .
    The result you say you want is quite different, and looks very jumbled to me, so evidently I can't follow your logic.

    For more on the simple trick here see https://www.stata-journal.com/articl...article=dm0068

    Detail: one criterion, two or more criteria.

    Comment


    • #3
      The criterion that spells as defined above such that

      the start of one spell is one day after the end of the previous spell

      should be combined is something that could follow the analysis above. That is, my code attempts to treat overlapping spells before anything is done about combining spells.

      Comment


      • #4
        Nick Cox sorry for the delayed response. Your code works perfectly. Thank you for your help!

        Best,
        Anoush

        Comment

        Working...
        X