Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sorting data

    Hello,

    I have the following data below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(ID start end)
    2 18080 18747
    2 20702 21335
    2 18748 23011
    2 21336 23011
    end
    format %td start
    format %td end
    Within ID’s, I want to properly order the data so that the previous end date is one day less than the next start date. The data should look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(ID start end)
    2 18080 18747
    2 18748 23011
    2 20702 21335
    2 21336 23011
    end
    format %td start
    format %td end
    If this cannot be achieved within ID, then I would like to create separate groups for those cases.

    I would appreciate any assistance.

    Thanks,
    Anoush K.
    Last edited by Anoush Khachatryan; 23 Jan 2023, 14:42.

  • #2
    Code:
    sort ID start
    ?

    Comment


    • #3
      Andrew Musau not quite. That will not work in every instance. Take this data, for example:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(ID start end)
      1 15918 18716
      1 18717 20543
      1 18720 20543
      1 20544 23011
      1 22902 23011
      end
      format %td start
      format %td end

      Comment


      • #4
        And what do you want the result to be after sorting the data shown in post #3?

        Comment


        • #5
          William Lisowski I want the start value of _n to be one day greater than the previous end value _n-1. If that cannot happen, then I want to create separate groups for those observations.

          Comment


          • #6
            I do not understand what you mean by "create separate groups".

            Comment


            • #7
              William Lisowski separate groups like this:

              Before:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double(ID start end)
              2 18080 18747
              2 21329 23014 
              2 20702 21335
              2 18748 23011
              2 21336 23011
              end
              format %td start
              format %td end
              After:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double(ID start end group)
              2 18080 18747 1
              2 18748 23011 1
              2 20702 21335 2
              2 21336 23011 2
              2 21329 23014 3
              end
              format %td start
              format %td end

              Comment


              • #8
                You can use joinby to create groups of consecutive start-end observations, then merge back with the original to recoup the unmatched observations. Here is a start:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double(ID start end)
                2 18080 18747 
                2 18748 23011 
                2 20702 21335 
                2 21336 23011 
                2 21329 23014 
                end
                format %td start
                format %td end
                
                preserve
                rename (start end) (start2 end2)
                tempfile holding
                save `holding'
                restore, preserve
                joinby ID using `holding'
                keep if end==start2-1
                rename (start end) (start1 end1)
                bys ID (start1): gen seq=_n
                reshape long start end, i(ID seq) j(which)
                by ID: gen group=sum(start[_n+1]== end+1)
                keep ID start end group
                The unmatched observations will each be in their own group.

                Res.:

                Code:
                . l
                
                     +------------------------------------+
                     | ID       start         end   group |
                     |------------------------------------|
                  1. |  2   02jul2009   30apr2011       1 |
                  2. |  2   01may2011   01jan2023       1 |
                  3. |  2   05sep2016   31may2018       2 |
                  4. |  2   01jun2018   01jan2023       2 |
                     +------------------------------------+

                Comment


                • #9
                  Andrew Musau thank you for your response. When I run this code and run -br- to browse the results, I only see the original data and I am unable to see the new changes/variables. Why does this occur?

                  Comment


                  • #10
                    It is something peculiar to do-files. If you have a preserve command active, it restores the dataset once you are done running the do-file. Replace

                    restore, preserve
                    in the fifth line of the code to


                    Code:
                    restore
                    The -preserve- option is not needed.

                    Comment


                    • #11
                      Works perfectly — thank you so much!

                      Comment

                      Working...
                      X