I am trying to create a new group identifier in a dataset, defined by id and by runs of consecutive numbers within id. For example:
Code:
clear all
input str1(id) byte(numb)
a 0
b 0
b 0
b 1
b 2
b 3
b 0
b 1
c 1
c 2
c 3
d 0
e 1
end
I'm looking to create a group variable that would look something like:
Code:
id  numb  group
a   0     1
b   0     2
b   0     3
b   1     3
b   2     3
b   3     3
b   0     4
b   1     4
c   1     5
c   2     5
c   3     5
d   0     6
e   1     7
Identifying observations that are consecutive is not the problem, but I can't figure out how to increment the group variable within an id group. E.g.
Code:
gen group = .
egen id_group = group(id)
su id_group, meanonly
local j = 1
forvalues i = 1/`r(max)' {
    replace group = `j' if (numb[_n] - numb[_n-1] == 1 & id[_n] == id[_n-1] & id_group == `i') ///
        | (numb[_n+1] - numb[_n] == 1 & id[_n+1] == id[_n] & id_group == `i')
    local j = `j' + 1
}
replace group = 10 + _n if missing(group) // replace single obs groups by some big number + _n
gets some of the way, but doesn't distinguish between the two groups of consecutive numbers within id == "b", and produces:
Code:
id  numb  id_group  group
a   0     1         11
b   0     2         12
b   0     2         2
b   1     2         2
b   2     2         2
b   3     2         2
b   0     2         2
b   1     2         2
c   1     3         3
c   2     3         3
c   3     3         3
d   0     4         22
e   1     5         23
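One possible approach, offered only as a sketch: flag every observation that starts a new run, then take the running sum of those flags. It assumes the data are already in the intended order, and the names `obs`, `newrun` and `run_group` are made up here for illustration.

```stata
* Sketch only: assumes the current order of observations is the intended one.
gen long obs = _n                       // remember the original order

* A run starts at the first observation of each id, or whenever numb is
* not exactly one more than the previous numb within the same id.
bysort id (obs): gen byte newrun = (_n == 1) | (numb != numb[_n-1] + 1)

* The running sum of the run-start flags numbers the runs 1, 2, 3, ...
gen long run_group = sum(newrun)
```

Because the counter is a single running sum rather than one value per id, two separate runs within the same id (as in id == "b") get different numbers, and single observations need no special handling. On the example data this should give the desired grouping of 1 through 7.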