
  • Identifying overlaps in a dataset

    Hi all,

    I am working on my MSc thesis and need to run a kind of analysis I have never done before on a large dataset. Here is a simplified example of the data:
    Person Group Year
    Marc A 2014
    Claire B 2015
    Sylvia C 2015
    Marc B 2014
    Sylvia D 2015
    My objective is to identify whether a person belongs to more than one group in the same year. Here, Marc belongs to groups A and B in 2014, and Sylvia to groups C and D in 2015. Since my dataset is large, I cannot eyeball it. My final goal is a new variable that gives, for each person, the number of groups they belong to in that year. It should look as follows:
    Person Group Year Overlap
    Marc A 2014 2
    Claire B 2015 1
    Sylvia C 2015 2
    Marc B 2014 2
    Sylvia D 2015 2
    Note that "overlap" would not be a dummy variable: if, for instance, Marc belonged to 3 groups in 2014, "overlap" should be 3.

    Thank you very much for your help. If I did not explain myself well, I will be very happy to explain it again.


    Kind regards!

    Carla

  • #2
    See https://www.stata-journal.com/articl...article=dm0042 and especially p.563.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 person str1 group int year
    "Marc"   "A" 2014
    "Claire" "B" 2015
    "Sylvia" "C" 2015
    "Marc"   "B" 2014
    "Sylvia" "D" 2015
    end
    
    egen tag = tag(person group year)
    egen wanted = total(tag), by(person year)
    sort person year group
    
    list, sepby(person year)
    
         +--------------------------------------+
         | person   group   year   tag   wanted |
         |--------------------------------------|
      1. | Claire       B   2015     1        1 |
         |--------------------------------------|
      2. |   Marc       A   2014     1        2 |
      3. |   Marc       B   2014     1        2 |
         |--------------------------------------|
      4. | Sylvia       C   2015     1        2 |
      5. | Sylvia       D   2015     1        2 |
         +--------------------------------------+
    With this example

    Code:
    bysort person year : gen WANTED = _N
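    yields the same result, because no combination of person, group and year is repeated. If the data contained duplicate observations, _N would count observations rather than distinct groups.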
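    To see how the two approaches can differ when observations are repeated, here is a minimal sketch on a hypothetical variant of the data in which Marc's group A row for 2014 appears twice. The data and the variable names n_groups and n_obs are made up for illustration and are not part of the original example.

    ```stata
    * Hypothetical data: the Marc / A / 2014 observation is duplicated
    clear
    input str6 person str1 group int year
    "Marc"   "A" 2014
    "Marc"   "A" 2014
    "Marc"   "B" 2014
    "Claire" "B" 2015
    end

    * Distinct groups per person-year:
    * tag one observation per person-group-year combination, then sum the tags
    egen tag = tag(person group year)
    egen n_groups = total(tag), by(person year)

    * Observations per person-year: this also counts the repeated row
    bysort person year : gen n_obs = _N

    list, sepby(person year)
    ```

    In this sketch n_groups is 2 for Marc in 2014 while n_obs is 3, so the tag-based count is the one to use if duplicates are possible.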
    Last edited by Nick Cox; 04 Apr 2022, 06:20.



  • #3
    Thanks a lot! I made a mistake and realized that what I need is something different, so I made a new post. Best regards
