Cumulative count of unique observations within ID and date

Lau Amdi

Join Date: Apr 2019

Posts: 23
#1

Cumulative count of unique observations within ID and date

02 Feb 2023, 03:43

Hi,

There is probably a very easy solution to this, but I simply cannot figure out how to do it. I have searched the forum without luck.
I need to generate a variable (cycle_number) that counts numbers of new chemotherapy cycles within each patient (defined by a change in the date).

This is an example of what I have:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(id date) 1 18597 1 18598 1 18598 1 18599 1 18600 2 20259 2 20259 2 20259 2 20266 2 20266 end format %td date

This is what I want to obtain:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(id date cycle_number) 1 18597 1 1 18598 2 1 18598 2 1 18599 3 1 18600 4 2 20259 1 2 20259 1 2 20259 1 2 20266 2 2 20266 2 end format %td date

Thank you for the help.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35433
#2

02 Feb 2023, 03:54

Code:

bysort id (date) : gen wanted = sum(date != date[_n-1])

The most common keywords here are those of identifying spells. There was a paper on this in the Stata Journal in 2007 that remains pertinent.
2 likes
Comment

Maarten Buis

Join Date: Mar 2014
Posts: 3426

02 Feb 2023, 03:55

Code:

. bysort id date : gen wanted = _n == 1

. list , sepby(id)

     +-------------------------+
     | id        date   wanted |
     |-------------------------|
  1. |  1   01dec2010        1 |
  2. |  1   02dec2010        1 |
  3. |  1   02dec2010        0 |
  4. |  1   03dec2010        1 |
  5. |  1   04dec2010        1 |
     |-------------------------|
  6. |  2   20jun2015        1 |
  7. |  2   20jun2015        0 |
  8. |  2   20jun2015        0 |
  9. |  2   27jun2015        1 |
 10. |  2   27jun2015        0 |
     +-------------------------+

. by     id        : replace wanted = sum(wanted)
(8 real changes made)

. list , sepby(id)

     +-------------------------+
     | id        date   wanted |
     |-------------------------|
  1. |  1   01dec2010        1 |
  2. |  1   02dec2010        2 |
  3. |  1   02dec2010        2 |
  4. |  1   03dec2010        3 |
  5. |  1   04dec2010        4 |
     |-------------------------|
  6. |  2   20jun2015        1 |
  7. |  2   20jun2015        1 |
  8. |  2   20jun2015        1 |
  9. |  2   27jun2015        2 |
 10. |  2   27jun2015        2 |
     +-------------------------+

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Comment

Lau Amdi

Join Date: Apr 2019

Posts: 23
#4

02 Feb 2023, 04:06

Thank you Nick Cox and Maarten Buis,

Both solutions worked perfectly.

Best
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#5

02 Feb 2023, 06:56

The allusion in #2 was to Stata Journal | Article (stata-journal.com)

Note that the two answers are really the same method, as the first observation in a spell necessarily has a different date from the previous observation. (This works for the first observation too, as then the previous value is date[0] which isn't in the dataset, but Stata returns missing as saying "I don't know what that is", which is fine too for this purpose.
1 like
Comment

Announcement

Cumulative count of unique observations within ID and date

Comment

Comment

Comment

Comment