  • Identifying overlaps in a dataset

    Hi all,

    I am working on my MSc thesis and need to perform a kind of analysis on a large dataset that I have never done before. Here is a simplified example of the data:
    Person   Group   Year
    Marc     A       2014
    Claire   B       2015
    Sylvia   C       2015
    Marc     B       2014
    Sylvia   D       2015
    My objective is to identify people who belong to more than one group in the same year. Here, Marc belongs to groups A and B in 2014, and Sylvia to groups C and D in 2015. Because my dataset is large, I cannot check this by eye. The final goal is a new variable that looks as follows:
    Person   Group   Year   Overlap
    Marc     A       2014   2
    Claire   B       2015   1
    Sylvia   C       2015   2
    Marc     B       2014   2
    Sylvia   D       2015   2
    Note that "overlap" is not a dummy variable but a count: if, for instance, Marc belonged to 3 groups in 2014, "overlap" should be 3 for each of his 2014 observations.

    Thank you very much for your help. If I did not explain myself well, I will be very happy to explain it again.

    Kind regards!


    See and especially p.563.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 person str1 group int year
    "Marc"   "A" 2014
    "Claire" "B" 2015
    "Sylvia" "C" 2015
    "Marc"   "B" 2014
    "Sylvia" "D" 2015
    end

    * Tag one observation per distinct (person, group, year) combination
    egen tag = tag(person group year)
    * Sum the tags within each person-year to count distinct groups
    egen wanted = total(tag), by(person year)
    sort person year group
    list, sepby(person year)
         | person   group   year   tag   wanted |
      1. | Claire       B   2015     1        1 |
      2. |   Marc       A   2014     1        2 |
      3. |   Marc       B   2014     1        2 |
      4. | Sylvia       C   2015     1        2 |
      5. | Sylvia       D   2015     1        2 |
    In this example, where no person-group-year combination is repeated,

    bysort person year : gen WANTED = _N
    yields the same result.
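
    If the data did contain repeated person-group-year rows, the two approaches would no longer agree, because _N counts observations rather than distinct groups. A minimal sketch, using hypothetical data in which one of Marc's rows is entered twice (the duplicate is an assumption for illustration and is not in the original dataset):

```stata
* Hypothetical data: Marc's membership of group A in 2014 is recorded twice
clear
input str6 person str1 group int year
"Marc"   "A" 2014
"Marc"   "A" 2014
"Marc"   "B" 2014
"Claire" "B" 2015
end

* Distinct groups per person-year: tag one observation per
* (person, group, year) combination, then sum the tags
egen tag = tag(person group year)
egen wanted = total(tag), by(person year)

* Observations per person-year: this also counts the repeated row
bysort person year : gen WANTED = _N

list, sepby(person year)
```

    For Marc in 2014, wanted should be 2 (groups A and B) while WANTED is 3 (three observations), so the tag-based count is the safer choice whenever duplicates are possible.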
      Thanks a lot! I made a mistake and realized that what I need is something different, so I made a new post. Best regards

