
  • Partner's information in household dataset

    Hi everyone,
    I am working with a survey in which all adult members of the household participated individually. I want to generate a variable that captures the partner’s information (country of birth).
    The dataset contains two identifiers: the personal identifier (pid) and the partner’s identifier (parid). The two correspond, so it is possible to link couples. There is also a household identifier (hid). In the example below, we have a couple living in the same household, with a German-born and a foreign-born partner (germborn).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(pid parid hid) byte germborn
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    end
    When parid is missing, it means that the person does not have a partner.
    I am not sure how I can generate a variable to capture the partner’s origin.
    Thank you!

  • #2
    rangestat from SSC can help here. You can search the forum for mentions.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(pid parid hid) byte germborn
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    101    102    19 1
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    102    101    19 0
    end 
    
    rangestat germpartner=germborn, int(pid parid parid) by(hid)
    
    list, sepby(hid pid)
    
         +-----------------------------------------+
         | pid   parid   hid   germborn   germpa~r |
         |-----------------------------------------|
      1. | 101     102    19          1          0 |
      2. | 101     102    19          1          0 |
      3. | 101     102    19          1          0 |
      4. | 101     102    19          1          0 |
      5. | 101     102    19          1          0 |
      6. | 101     102    19          1          0 |
         |-----------------------------------------|
      7. | 102     101    19          0          1 |
      8. | 102     101    19          0          1 |
      9. | 102     101    19          0          1 |
     10. | 102     101    19          0          1 |
     11. | 102     101    19          0          1 |
     12. | 102     101    19          0          1 |
         +-----------------------------------------+
    The by(hid) option may be redundant, or even a nuisance if partners can be recorded as living in different households.
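    To illustrate that last point, here is a minimal sketch with made-up data in which the two partners are recorded under different hid values (19 and 20 are hypothetical). It assumes pid is unique across households, so by(hid) is left out; with by(hid) the search would be restricted to the same household and the partner would not be found. The statistic (mean) is written out explicitly and returns the partner's own value because germborn is constant within person.

    ```stata
    * rangestat is user-written: run -ssc install rangestat- once if needed
    clear
    input long(pid parid hid) byte germborn
    101 102 19 1
    101 102 19 1
    102 101 20 0
    102 101 20 0
    end

    * For each observation, average germborn over the observations whose pid
    * lies in [parid, parid], i.e. over the partner's rows; no by(hid) here
    rangestat (mean) germpartner = germborn, interval(pid parid parid)

    list, sepby(pid)
    ```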



    • #3
      The advice given by Nick in #2 is excellent, and is also how I would approach this problem. But, if you are in a situation where you can't or won't install user-written commands, there is another way to do this using only native Stata commands:
      Code:
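      * Copy the identifiers and germborn to a new frame and drop repeated rows
      frame put hid pid parid germborn, into(partners)
      frame partners: duplicates drop
      
      * Link each observation's (hid, parid) to the matching (hid, pid) in the new frame
      frlink m:1 hid parid, frame(partners hid pid)
      * Copy the partner's germborn back as partner_germborn
      frget germborn, from(partners) prefix(partner_)
      * Remove the link variable, then the helper frame
      drop partners
      frame drop partners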
      This approach is worth knowing about in any case because there are cross-referencing situations like this that -rangestat- cannot handle, such as when the id variables are non-numeric or when the identification of the partner depends on multiple variables, not just one.

      Nick's remark about the role of hid applies equally here.
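      As a minimal sketch of what that looks like with frames, the made-up data below add a single person (pid 103, hypothetical) whose parid is missing, and the link uses only the person identifiers. This assumes pid is unique across households; if it is not, hid has to stay in the link. Frames need Stata 16 or later.

      ```stata
      * Hypothetical toy data: a couple (101, 102) and a single person (103)
      clear
      input long(pid parid hid) byte germborn
      101 102 19 1
      101 102 19 1
      102 101 19 0
      102 101 19 0
      103   . 20 1
      end

      * One row per person in a helper frame
      frame put pid germborn, into(partners)
      frame partners: duplicates drop

      * Link parid to the partner's pid, ignoring hid
      frlink m:1 parid, frame(partners pid)
      * Bring the partner's germborn over under a new name
      frget partner_germborn = germborn, from(partners)

      * Clean up the link variable and the helper frame
      drop partners
      frame drop partners

      list, sepby(pid)
      ```

      The person without a partner finds no match in the helper frame, so partner_germborn stays missing for that person.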



      • #4
        I tried the first suggestion and it worked. Now I will check the second. Thank you so much for your quick and helpful responses!
