
  • Generating groups by sets of consecutive numbers

    I am trying to create a new group identifier in a dataset, defined by id and by runs of consecutive numbers within each id. For example:

    Code:
    clear all
    input str1(id) byte(numb)
    a 0
    b 0
    b 0
    b 1
    b 2
    b 3
    b 0
    b 1
    c 1
    c 2
    c 3
    d 0
    e 1
    end
    I'm looking to create a group variable that would look something like:

    Code:
    id    numb    group
    a    0    1
    b    0    2
    b    0    3
    b    1    3
    b    2    3
    b    3    3
    b    0    4
    b    1    4
    c    1    5
    c    2    5
    c    3    5
    d    0    6
    e    1    7
    Identifying observations that are consecutive is not the problem, but I can't figure out how to increment the group variable within an id group. E.g.

    Code:
    gen group = .
    egen id_group = group(id)
    su id_group, meanonly
    local j = 1
    forvalues i = 1/`r(max)' {
    
        replace group = `j' if (numb[_n] - numb[_n-1] == 1 & id[_n] == id[_n-1] & id_group == `i') ///
            | (numb[_n+1] - numb[_n] == 1 & id[_n+1] == id[_n] & id_group == `i')
        local j = `j' + 1
        
    }
    replace group = 10 + _n if missing(group) // replace single obs groups by some big number + _n
    This gets some of the way, but doesn't distinguish between the two groups of consecutive numbers within id == "b", and produces:

    Code:
    id    numb    id_group    group
    a    0    1    11
    b    0    2    12
    b    0    2    2
    b    1    2    2
    b    2    2    2
    b    3    2    2
    b    0    2    2
    b    1    2    2
    c    1    3    3
    c    2    3    3
    c    3    3    3
    d    0    4    22
    e    1    5    23
    Last edited by Joost Sijthoff; 03 Apr 2022, 16:53.

  • #2
    I have stared at this for a while trying to figure out what it is you want. The only pattern I could discern is this: start at 1 and work through the observations. When the next observation either has a new value for id or has numb = 0, increment the value of group by 1, otherwise repeat the same value of group. Is that what you want? If so:
    Code:
    gen group = sum(id != id[_n-1] | numb == 0)
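The one-liner above relies on sum() returning a running total of a true/false condition. A minimal sketch that splits it into two steps on the example data from #1 may make that easier to follow; the helper variable newrun is a hypothetical name, not part of the original suggestion.

```stata
clear
input str1 id byte numb
a 0
b 0
b 0
b 1
b 2
b 3
b 0
b 1
c 1
c 2
c 3
d 0
e 1
end

* newrun (hypothetical helper): 1 where a new group starts, 0 otherwise
* on the first row id[_n-1] is the empty string, so the condition is true
gen byte newrun = id != id[_n-1] | numb == 0

* sum() is a running total, so group rises by 1 at every flagged row
gen group = sum(newrun)

list id numb newrun group, sepby(group) noobs
```

Listing newrun next to group shows which rows trigger an increment, which is handy when adapting the condition to a different grouping rule.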



    • #3
      Apologies for not being clearer.

      Essentially, I am trying to group consecutive numbers within an id group to create a new group variable.

      E.g. if there were two rows like these within one id value,
      Code:
      id    numb
      b    1
      b    1
      because these are not consecutive, I am looking for them to be in separate groups. In your suggestion these would get the same value.

      I hadn't thought of using sum() on a condition. That's a good idea that could perhaps be made to work in another way.

      EDIT:
      Using the sum, this works I think:
      Code:
      gen i = sum(numb[_n] != numb[_n-1] + 1)
      egen group = group(id i)
      I think that solves it!
      Last edited by Joost Sijthoff; 03 Apr 2022, 17:15.
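One detail worth seeing in the two-step solution above is why the egen step is needed. The counter i only looks at numb, so a run of consecutive numbers that crosses an id boundary (here d 0 followed by e 1) does not increment it. A sketch on the example data from #1, assuming the data are in the order shown:

```stata
clear
input str1 id byte numb
a 0
b 0
b 0
b 1
b 2
b 3
b 0
b 1
c 1
c 2
c 3
d 0
e 1
end

* i rises whenever numb is not exactly one more than the previous row's numb
* (on the first row numb[_n-1] is missing, so the condition is true)
gen i = sum(numb != numb[_n-1] + 1)

* d 0 followed by e 1 looks consecutive, so these two rows share one value of i
list id numb i if inlist(id, "d", "e"), noobs

* crossing i with id separates runs that span an id boundary
egen group = group(id i)
list id numb i group, sepby(group) noobs
```

Note that egen's group() numbers the groups in sorted order of id and i, so the labels follow the data order only when the data are already sorted by id.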



      • #4
        I see. Yes, that would do it. You could also do it a little more simply with:
        Code:
        gen i = sum(id != id[_n-1] | numb != numb[_n-1]+1)
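Because the id test sits inside the running sum, the one-line version above yields the final group identifier directly, with no egen step. A small sketch comparing the two approaches on a made-up dataset that includes the repeated b 1 case from #3; the names group2 and the data values are assumptions for illustration only.

```stata
clear
input str1 id byte numb
b 1
b 1
b 2
c 3
c 4
end

* two-step approach from #3
gen i = sum(numb != numb[_n-1] + 1)
egen group = group(id i)

* one-step approach from #4, stored under the hypothetical name group2
gen group2 = sum(id != id[_n-1] | numb != numb[_n-1] + 1)

* with the data sorted by id, both should assign identical group numbers
assert group == group2

list id numb i group group2, sepby(group2) noobs
```

If the data are not sorted by id, the two approaches should still split the rows into the same groups, but the numbering may differ, so the assert is only expected to hold under that sorting assumption.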

