Generating groups by sets of consecutive numbers

Joost Sijthoff

03 Apr 2022, 15:40

I am trying to create a new group identifier in a dataset, defined by id and by runs of consecutive numbers within each id. For example:

Code:

clear all
input str1(id) byte(numb)
a 0
b 0
b 0
b 1
b 2
b 3
b 0
b 1
c 1
c 2
c 3
d 0
e 1
end

I'm looking to create a group variable that would look something like:

Code:

id    numb    group
a    0    1
b    0    2
b    0    3
b    1    3
b    2    3
b    3    3
b    0    4
b    1    4
c    1    5
c    2    5
c    3    5
d    0    6
e    1    7

Identifying observations that are consecutive is not the problem, but I can't figure out how to increment the group variable within an id group. E.g.

Code:

gen group = .
egen id_group = group(id)
su id_group, meanonly
local j = 1
forvalues i = 1/`r(max)' {

    replace group = `j' if (numb[_n] - numb[_n-1] == 1 & id[_n] == id[_n-1] & id_group == `i') ///
        | (numb[_n+1] - numb[_n] == 1 & id[_n+1] == id[_n] & id_group == `i')
    local j = `j' + 1
    
}
replace group = 10 + _n if missing(group) // replace single obs groups by some big number + _n

This gets some of the way, but it doesn't distinguish between the two runs of consecutive numbers within id == "b", and produces:

Code:

id    numb    id_group    group
a    0    1    11
b    0    2    12
b    0    2    2
b    1    2    2
b    2    2    2
b    3    2    2
b    0    2    2
b    1    2    2
c    1    3    3
c    2    3    3
c    3    3    3
d    0    4    22
e    1    5    23

Last edited by Joost Sijthoff; 03 Apr 2022, 15:53.

Clyde Schechter

#2

03 Apr 2022, 15:55

I have stared at this for a while trying to figure out what it is you want. The only pattern I could discern is this: start at 1 and work through the observations. When the next observation either has a new value for id or has numb = 0, increment the value of group by 1, otherwise repeat the same value of group. Is that what you want? If so:

Code:

gen group = sum(id != id[_n-1] | numb == 0)
Joost Sijthoff

#3

03 Apr 2022, 16:11

Apologies for not being clearer.

Essentially, I am trying to group consecutive numbers within an id group to create a new group variable.

E.g., if there were rows like these within one id value:

Code:

id    numb
b    1
b    1

because these are not consecutive, I am looking for them to be in separate groups. In your suggestion these would get the same value.

I hadn't thought of using sum() over a condition. That's a good idea that might work in another way.

EDIT:
Using the sum, this works I think:

Code:

gen i = sum(numb[_n] != numb[_n-1] + 1)
egen group = group(id i)

I think that solves it!

Last edited by Joost Sijthoff; 03 Apr 2022, 16:15.
Clyde Schechter

#4

03 Apr 2022, 18:23

I see. Yes, that would do it. You could also do it a little more simply with:

Code:

gen i = sum(id != id[_n-1] | numb != numb[_n-1]+1)
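
For anyone who wants to check this against the example data from #1, here is a minimal self-contained sketch. It assumes the data are already in the intended sort order, and it names the result group rather than i so it can be compared directly with the desired output; that renaming is my choice for illustration, not part of the suggestion above.

```stata
clear
input str1 id byte numb
a 0
b 0
b 0
b 1
b 2
b 3
b 0
b 1
c 1
c 2
c 3
d 0
e 1
end

* A new run starts whenever id changes or numb is not the previous numb plus 1.
* In the first observation id[_n-1] is the empty string, so the condition is
* true there and the running sum starts at 1.
gen group = sum(id != id[_n-1] | numb != numb[_n-1] + 1)

list id numb group, sepby(id) noobs
```

The logical expression evaluates to 1 at the first observation of each run and 0 elsewhere, so the running sum goes up by one exactly at run boundaries. Because the id comparison is inside the same condition, no separate egen step is needed to keep runs in different ids apart.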