Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge and concatenate string using a group condition

    I can't seem to find a solution for my problem, which is expressed in the following MWE:

    I have data on accounts, jobs per account, date, and a string variable. e.g.

    Code:
    clear 
    input account job date str20 text
    100 200 22640 "string a"
    100 201 22645 "string b"
    100 202 22670 "string c"
    100 203 22690 "string d"
    end
    I need to merge all jobs within a specific account, for which the time range is less than a week (for example), without losing the text column.
    I also want information on how many rows were merged.
    In my example, the first two rows should be merged and concatenated (since their dates are within a week), but the last two rows do not change.
    The data needs to look like this after the loop:

    Code:
    clear 
    input account job date str20 text rows_merged
    100 200 22640 "string a, string b" 2
    100 202 22670 "string c" 1
    100 203 22690 "string d" 1
    end
    Stata/MP 15.1

  • #2
    What happens if your data looked like this:
    Code:
    clear 
    input account job date str20 text
    100 200 22640 "string a"
    100 201 22645 "string b"
    100 202 22650 "string c"
    100 203 22690 "string d"
    end
    The first job is within a week of the second job, and the second job is within a week of the third job.

    Comment


    • #3
      They should all be merged
      Stata/MP 15.1

      Comment


      • #4
        Code:
        clear
        input account job date str20 text
        100 200 22640 "string a"
        100 201 22645 "string b"
        100 202 22650 "string c"
        100 203 22690 "string d"
        end
        
        by account (date), sort: gen group = sum(date - date[_n-1] > 7 )
        
        by account group (date), sort: gen rows_merged = _N
        by account group (date): replace text = text[_n-1] ///
            + "; " + text if !missing(text, text[_n-1]) & _n > 1
        by account group (date): replace date = date[1] if _n == _N
        by account group: keep if _n == _N
        drop group
        
        list, noobs clean

        Comment


        • #5
          That is perfect Clyde. Thanks
          Stata/MP 15.1

          Comment

          Working...
          X