  • how to create unique identifiers for each observation

    Hi there, I am analyzing some data at the dyadic level. There are multiple matched dyads of actors and partners. How can I create unique identifiers (IDs) for each actor, partner, and dyad?

    The raw data look like this:

    teamID  actorName  partnerName
    1       Adam       Raddi
    1       Adam       Samatha
    1       Adam       JoJo
    1       Raddi      Adam
    1       Raddi      Samatha
    1       Raddi      JoJo
    1       Samatha    Adam
    1       Samatha    Raddi
    1       Samatha    JoJo
    1       JoJo       Adam
    1       JoJo       Raddi
    1       JoJo       Samatha
    2       Nix        Kim
    2       Nix        Susan
    2       Kim        Nix
    2       Kim        Susan
    2       Susan      Nix
    2       Susan      Kim

    I would like to add three ID columns to represent actors, partners, and their dyads. There are a few requirements:

    1. Each actor and partner should have a unique ID.

    2. The actor_ID should correspond to the partner_ID. For example, when "Adam" is the actor, he is assigned "1" as his actor_ID, so wherever "Adam" appears as the partner, his partner_ID should also be "1". An example of this requirement is shown in red in the following table.

    3. Each dyad should have a unique ID. This may be the hardest part. For example, the dyad_ID should be "1" for both [ "Adam" (the actor) and "Raddi" (the partner) ] and [ "Raddi" (the actor) and "Adam" (the partner) ]. An example of this requirement is shown in blue in the following table.

    The ideal new table is below. Can anyone help me?

    teamID  actorName  partnerName  actor_ID  partner_ID  dyad_ID
    1       Adam       Raddi        1         2           1
    1       Adam       Samatha      1         3           2
    1       Adam       JoJo         1         4           3
    1       Raddi      Adam         2         1           1
    1       Raddi      Samatha      2         3           4
    1       Raddi      JoJo         2         4           5
    1       Samatha    Adam         3         1           2
    1       Samatha    Raddi        3         2           4
    1       Samatha    JoJo         3         4           6
    1       JoJo       Adam         4         1           3
    1       JoJo       Raddi        4         2           5
    1       JoJo       Samatha      4         3           6
    2       Nix        Kim          5         6           7
    2       Nix        Susan        5         7           8
    2       Kim        Nix          6         5           7
    2       Kim        Susan        6         7           9
    2       Susan      Nix          7         5           8
    2       Susan      Kim          7         6           9

  • #2
    Assuming that the number of actors/partners and dyads is less than 65,536, the following code will work:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte teamid str7(actorname partnername)
    1 "Adam"    "Raddi"  
    1 "Adam"    "Samatha"
    1 "Adam"    "JoJo"  
    1 "Raddi"   "Adam"  
    1 "Raddi"   "Samatha"
    1 "Raddi"   "JoJo"  
    1 "Samatha" "Adam"  
    1 "Samatha" "Raddi"  
    1 "Samatha" "JoJo"  
    1 "JoJo"    "Adam"  
    1 "JoJo"    "Raddi"  
    1 "JoJo"    "Samatha"
    2 "Nix"     "Kim"    
    2 "Nix"     "Susan"  
    2 "Kim"     "Nix"    
    2 "Kim"     "Susan"  
    2 "Susan"   "Nix"    
    2 "Susan"   "Kim"    
    end
    
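    * Encode both name variables with one shared value label (person_id),
    * so the same person gets the same number as actor and as partner
    encode actorname, gen(actor_id) label(person_id)
    encode partnername, gen(partner_id) label(person_id)
    * Detach the labels so the numeric codes are displayed (optional)
    label values actor_id
    label values partner_id

    * Build an order-independent dyad string: alphabetically earlier name first
    gen dyad = actorname + ", " + partnername if actorname <= partnername
    replace dyad = partnername + ", " + actorname if missing(dyad)
    encode dyad, gen(dyad_id)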
    label values dyad_id

    By the way, the -label values- commands here are optional. I've put them in so that you can inspect the numeric coding directly. But you might find it easier to work with your data if you leave them labeled, so that even though they are still 1, 2, 3,... etc. they look like Adam, JoJo, Raddi, etc. (And if you do that you can probably drop the string variables actorname, partnername, and dyad, which convey no additional information and just take up space in memory.)
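
    A minimal sketch of that labeled alternative, assuming the code above has already been run. The value labels person_id and dyad_id remain defined in memory after being detached, so they can simply be attached again. The -assert- line is an optional sanity check added here for illustration; it assumes every dyad appears exactly once in each direction, as in the example data.

    ```stata
    * Re-attach the value labels that -encode- created
    label values actor_id person_id
    label values partner_id person_id
    label values dyad_id dyad_id

    * Optional check: each dyad should cover exactly two rows (A-B and B-A).
    * Note that -bysort- re-sorts the data by dyad_id.
    bysort dyad_id: assert _N == 2

    * The string variables now carry no extra information
    drop actorname partnername dyad

    list in 1/4            // IDs display as names
    list in 1/4, nolabel   // underlying numeric codes
    ```

    If the -assert- fails, some dyad is missing one of its two directions (or is duplicated), which is worth investigating before relying on dyad_id.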

    Added: I have just noticed that this is a duplicate post, which was already asked and answered at https://www.statalist.org/forums/for...ch-observation. The solution there is slightly different, drawing on a community-contributed program -multencode-, but the basic idea and results are the same.
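
    For readers who prefer to avoid string concatenation, here is a rough sketch of a third variation on the same idea using only official commands. It is not the solution from either thread, and the names first, second, and dyad_id2 are made up for illustration. The two names are ordered within each row, and egen's group() function then numbers the distinct pairs.

    ```stata
    * Put the alphabetically earlier name first so A-B and B-A match
    gen first  = cond(actorname <= partnername, actorname, partnername)
    gen second = cond(actorname <= partnername, partnername, actorname)

    * One integer per distinct unordered pair
    egen long dyad_id2 = group(first second)
    drop first second
    ```

    If the same pair of names could occur in more than one team, adding the team identifier to the group() varlist would keep those dyads apart; the example data do not need this.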

    Also added: In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
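
    As a minimal sketch of that workflow, assuming the variable names from the example above:

    ```stata
    * Only needed if -dataex- is not already part of your installation
    ssc install dataex

    * Show the first 18 observations of all variables as pasteable -input- code
    dataex in 1/18

    * Or restrict the example to the variables relevant to the question
    dataex actorname partnername in 1/10
    ```

    The output in the Results window can then be copied and pasted directly into a forum post between code delimiters.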

    Last edited by Clyde Schechter; 22 Mar 2018, 08:26.



    • #3
      thanks Clyde! problem solved! and sorry for the duplicate post...
