Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cumulative count of unique observations within ID and date

    Hi,

    There is probably a very easy solution to this, but I simply cannot figure out how to do it. I have searched the forum without luck.
    I need to generate a variable (cycle_number) that counts numbers of new chemotherapy cycles within each patient (defined by a change in the date).

    This is an example of what I have:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id date)
    1 18597
    1 18598
    1 18598
    1 18599
    1 18600
    2 20259
    2 20259
    2 20259
    2 20266
    2 20266
    end
    format %td date
    This is what I want to obtain:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id date cycle_number)
    1 18597 1
    1 18598 2
    1 18598 2
    1 18599 3
    1 18600 4
    2 20259 1
    2 20259 1
    2 20259 1
    2 20266 2
    2 20266 2
    end
    format %td date
    Thank you for the help.



  • #2
    Code:
    bysort id (date) : gen wanted = sum(date != date[_n-1])
    The most common keywords here are those of identifying spells. There was a paper on this in the Stata Journal in 2007 that remains pertinent.

    Comment


    • #3
      Code:
      . bysort id date : gen wanted = _n == 1
      
      . list , sepby(id)
      
           +-------------------------+
           | id        date   wanted |
           |-------------------------|
        1. |  1   01dec2010        1 |
        2. |  1   02dec2010        1 |
        3. |  1   02dec2010        0 |
        4. |  1   03dec2010        1 |
        5. |  1   04dec2010        1 |
           |-------------------------|
        6. |  2   20jun2015        1 |
        7. |  2   20jun2015        0 |
        8. |  2   20jun2015        0 |
        9. |  2   27jun2015        1 |
       10. |  2   27jun2015        0 |
           +-------------------------+
      
      . by     id        : replace wanted = sum(wanted)
      (8 real changes made)
      
      . list , sepby(id)
      
           +-------------------------+
           | id        date   wanted |
           |-------------------------|
        1. |  1   01dec2010        1 |
        2. |  1   02dec2010        2 |
        3. |  1   02dec2010        2 |
        4. |  1   03dec2010        3 |
        5. |  1   04dec2010        4 |
           |-------------------------|
        6. |  2   20jun2015        1 |
        7. |  2   20jun2015        1 |
        8. |  2   20jun2015        1 |
        9. |  2   27jun2015        2 |
       10. |  2   27jun2015        2 |
           +-------------------------+
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      Comment


      • #4
        Thank you Nick Cox and Maarten Buis,

        Both solutions worked perfectly.

        Best

        Comment


        • #5
          The allusion in #2 was to Stata Journal | Article (stata-journal.com)

          Note that the two answers are really the same method, as the first observation in a spell necessarily has a different date from the previous observation. (This works for the first observation too, as then the previous value is date[0] which isn't in the dataset, but Stata returns missing as saying "I don't know what that is", which is fine too for this purpose.

          Comment

          Working...
          X