Hi everyone,
I am working with a list of persons, each uniquely identified by the variable "name". Each person is observed at several points in time ("date_str" / "date"). I want to generate a new variable that contains all values of the variable "classes" for a given person within the past 5 years. Example data are below.
Code:
clear
input str25 name str15 date_str str15 classes
"Lastname 1, First name 1" "June 16, 2003" "F22B H04Q F04C"
"Lastname 1, First name 1" "July 15, 2004" "B65D G01N"
"Lastname 1, First name 1" "May 3, 2006" "C12Q"
"Lastname 1, First name 1" "July 8, 2009" "C08K"
"Lastname 2, First name 2" "April 5, 1999" "F16J B06B H04R"
"Lastname 2, First name 2" "May 20, 2003" "F22B"
"Lastname 2, First name 2" "April 2, 2007" "G01N"
end
gen date = date(date_str, "MDY")
order name date_str date classes
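For instance, in observation 4 (Lastname 1, July 8, 2009), the new variable would have the value "C08K C12Q B65D G01N". That value is the focal classes plus those from May 3, 2006 and July 15, 2004. It excludes the classes from June 16, 2003, which is more than 5 years earlier. More generally, I am trying to combine string observations under two conditions: (1) the same "name" and (2) a "date" within the 5 years up to and including the focal date.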
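One possible approach, sketched below, is to sort by name and date and then walk back 1, 2, … observations within each person. At each step, the earlier "classes" are appended whenever that earlier date is on or after a 5-year cutoff. This is a sketch under stated assumptions, not a tested Statalist answer.

The assumptions are as follows:
- "within the past 5 years" means on or after the same calendar day 5 years earlier, with the boundary inclusive;
- a Feb 29 cutoff that does not exist falls back to Mar 1;
- the variable names `cutoff`, `n_obs` and `classes_5y` are hypothetical.

```stata
* Sketch. Assumptions: 5-year window = same calendar day 5 years earlier,
* boundary inclusive; Feb 29 cutoffs fall back to Mar 1.
* Variable names cutoff, n_obs, classes_5y are hypothetical.
clear
input str25 name str15 date_str str15 classes
"Lastname 1, First name 1" "June 16, 2003" "F22B H04Q F04C"
"Lastname 1, First name 1" "July 15, 2004" "B65D G01N"
"Lastname 1, First name 1" "May 3, 2006" "C12Q"
"Lastname 1, First name 1" "July 8, 2009" "C08K"
"Lastname 2, First name 2" "April 5, 1999" "F16J B06B H04R"
"Lastname 2, First name 2" "May 20, 2003" "F22B"
"Lastname 2, First name 2" "April 2, 2007" "G01N"
end
gen date = date(date_str, "MDY")
format date %td

* Earliest date that still counts as "within the past 5 years"
gen cutoff = mdy(month(date), day(date), year(date) - 5)
* mdy() returns missing for Feb 29 in a non-leap year; use Mar 1 instead
replace cutoff = mdy(3, 1, year(date) - 5) if missing(cutoff) & !missing(date)
format cutoff %td

* Start from the focal observation's own classes
sort name date
gen classes_5y = classes

* Largest number of earlier observations any one person has
by name: gen n_obs = _N
summarize n_obs, meanonly
local maxlag = r(max) - 1

* Under by, _n and subscripts are relative to each name group, so
* classes[_n - k] is the k-th earlier observation of the same person.
* Sorting by date means classes are appended from most recent to oldest.
forvalues k = 1/`maxlag' {
    by name: replace classes_5y = classes_5y + " " + classes[_n - `k'] if _n > `k' & !missing(cutoff) & date[_n - `k'] >= cutoff
}
drop n_obs cutoff
list name date classes_5y, sepby(name)
```

Looking back over every earlier observation of the same person handles the "whole group" requirement. Looking only at the previous line would not. The loop runs once per possible lag, which is cheap for short histories like these.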
This might be related to a previous post: www.statalist.org/forums/forum/general-stata-discussion/general/1295115-how-to-summarize-multiple-observations-per-id. However, I am struggling to adapt that solution. It is not enough to look at the previous observation, because the date check has to consider every earlier observation in the group defined by "name". (Repeated classes, e.g., "F22B H04Q F22B", are not an issue: I can discard them afterwards.)
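Repeated codes can be discarded afterwards. A minimal sketch of that step keeps only the first occurrence of each space-separated code in a string. It uses toy data; the variable names `combined`, `combined_u` and `n_words` are hypothetical and stand in for whatever variable holds the combined classes.

```stata
* Sketch: remove repeated class codes within one space-separated string,
* keeping the first occurrence of each. Toy data; names are hypothetical.
clear
input str40 combined
"F22B H04Q F22B"
"C08K C12Q B65D G01N"
end

gen combined_u = ""
gen n_words = wordcount(combined)
summarize n_words, meanonly
local maxw = r(max)

forvalues j = 1/`maxw' {
    * Append word j only if it is not already present as a whole word
    replace combined_u = trim(combined_u + " " + word(combined, `j')) if `j' <= n_words & strpos(" " + combined_u + " ", " " + word(combined, `j') + " ") == 0
}
drop n_words
list
```

Padding both strings with spaces makes `strpos()` match whole codes only, so a code such as "F22B" would not be mistaken for part of a longer code.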
I'm thankful for any help or suggestions!
Patrick