Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching Parent's ID to Children's ID

    Hi, everyone! I am currently working on panel data concerning parental unemployment to children's happiness using UKHLS. However, I have problems merging datasets while matching parents' IDs with children's IDs. I have parents and their children in the ID column. What I would like to do is to match those children with their parents, by using the father and mother identification numbers (variables fatherID and motherID). I am using Stata 17.0

    Within every row of children's sample, I have both of their parent's unique ID (e.g. children's ID = 1003, their father's ID = 1001, mother's ID = 1002). I simply don't know how to merge these so I can have parental job status within the children's sample.

    The children's sample came from a different file (the youth survey) than the parents. Furthermore, the panel data is unbalanced and some ID's are missing (e.g. children's ID = 2002, their father's ID = 2001, mother's ID = inapplicable)

    Below is an example from the youth survey, which I would like to match with the parent's survey file. I used excel to create an example, of course the data I am using from UKHLS is different. I think using these will be more intuitive



    Here is an example from the adult survey


    Here is what I hope to get


    Here are some stylized notes about the data:
    1. There are more unemployed women with children than men, so it will be unbalanced.
    2. Most unemployed people do not live with their spouse
    3. People may change their partner/spouse, so a children could have different motherID or fatherID
    4. However, the most important thing is to have fatherjobstatus and motherjobstatus in children's sample which could be gathered by matching parental ID (father ID and mother ID) with children ID
    Thank you very much!
    Last edited by ryan nugraha; 09 Jun 2022, 06:16.

  • #2
    Try this:
    Code:
    use child_data, clear
    
    frame create adults
    frame change adults
    use adult_data
    preserve
    keep if sex == "male"
    isid id year, sort
    
    frame change default
    frlink m:1 fatherid year, frame(adults id year)
    frget father_jobstatus = job_status, from(adults)
    
    frame change adults
    restore
    keep if sex == "female"
    isid id year, sort
    
    frame change default
    drop adults
    frlink m:1 motherid year, frame(adults id year)
    frget mother_jobstatus = job_status, from(adults)
    Warning: Because you posted your example data as screenshots, which cannot be imported into Stata, it could not be used to test this program. Consequently there may be typos or other errors in the code.

    If you are unable to adapt this code to work with your actual Stata data, please post new example data from both the adult and child Stata data sets, this time using the -dataex- command, which will provide usable information. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Works like a charm! I had a few adjustments to the codes that you made. I was able to create the dataset that I hoped for.

      Thank you!

      Comment


      • #4
        Hello all,

        Using IHDS data, I am trying to match the grandfather in every household to the grandchild with the help of the identification of the father and person i.e., FATHERID and PERSONID respectively. For some household there are more than one grandfather in the household.
        I have tried matching the grandfather to the grandchild by creating a variable GF and then using the forval loop but it was not giving correct relations.
        Any suggestions on how to proceed?
        Here is an example of the data where IDHH is the unique household identification number, PERSONID is the identifier for each person in the family.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str10 IDHH int(PERSONID FATHERID SEX AGE RELHHHEAD)
        "0102010101"  1 69 1 57  1
        "0102010101"  2  . 0 49  2
        "0102010101"  3  1 1 19  3
        "0102010101"  4  1 0 14  3
        "0102010101"  5  1 1 29  3
        "0102010101"  6  . 0 26  4
        "0102010101"  7  5 1  7  5
        "0102010101"  8  5 0  4  5
        "0102010101"  9  5 1  2  5
        "0102010201"  1 75 1 76  1
        "0102010201"  2  1 1 42  3
        "0102010201"  3  . 0 40  4
        "0102010201"  4  2 1 20  5
        "0102010201"  5  2 0 19  5
        "0102010201"  6  2 1 13  5
        "0102010201"  7  1 1 37  3
        "0102010201"  8  . 0 33  4
        "0102010201"  9  7 1 13  5
        "0102010201" 10  7 0  8  5
        "0102010201" 11  7 0  6  5
        "0102010201" 12  1 1 31  3
        "0102010201" 13  . 0 26  4
        "0102010201" 14 12 0  8  5
        "0102010201" 15 12 1  3  5
        "0102010201" 16  . 1 65 12
        "0102010301"  1 69 1 45  1
        "0102010301"  2  . 0 43  2
        "0102010301"  3  1 0 23  3
        "0102010301"  4  1 1 20  3
        "0102010301"  5  1 0 17  3
        "0102010301"  6  1 1 14  3
        "0102010301"  7  1 0 12  3
        "0102010401"  1 75 1 57  1
        "0102010401"  2  . 0 47  2
        "0102010401"  3  1 0 20  3
        "0102010401"  4  1 0 14  3
        "0102010401"  5  1 1 13  3
        "0102010501"  1 75 1 50  1
        "0102010501"  2  . 0 38  2
        "0102010501"  3  1 1 27  3
        "0102010501"  4  . 0 25  4
        "0102010501"  5  3 0  1  5
        "0102010501"  6  1 1 23  3
        "0102010501"  7  . 0 21  4
        "0102010501"  8  1 1 18  3
        "0102010501"  9  1 1 16  3
        "0102010501" 10  1 1 12  3
        "0102010601"  1 75 1 72  1
        "0102010601"  2  1 1 39  3
        "0102010601"  3  . 0 29  4
        "0102010601"  4  2 0  4  5
        "0102010601"  5  2 0  1  5
        "0102010601"  6  1 1 37  3
        "0102010601"  7  . 0 33  4
        "0102010601"  8  6 1  7  5
        "0102010601"  9  6 1  3  5
        "0102010601" 10  1 1 27  3
        "0102010601" 11  . 0 24  4
        "0102010701"  1 75 1 52  1
        "0102010701"  2  . 0 42  2
        "0102010701"  3  1 0 26  3
        "0102010701"  4  1 1 24  3
        "0102010701"  5  1 0 20  3
        "0102010701"  6  1 0 18  3
        "0102010701"  7  1 0 16  3
        "0102010701"  8  1 0 15  3
        "0102010801"  1  . 1 42  1
        "0102010801"  2  . 0 37  2
        "0102010801"  3  1 0 18  3
        "0102010801"  4  1 0 14  3
        "0102010801"  5  1 1 13  3
        "0102010801"  6  1 1 12  3
        "0102010801"  7  1 1  3  3
        "0102010801"  8  1 1  0  3
        "0102010901"  1  . 0 48  1
        "0102010901"  2 75 1 21  3
        "0102010901"  3  . 0 19  4
        "0102010901"  4 75 1 20  3
        "0102010901"  5 75 1 17  3
        "0102010901"  6 75 1 14  3
        "0102010901"  7 75 1 14  3
        "0102010901"  8 75 1 12  3
        "0102010902"  1 75 1 27  1
        "0102010902"  2 69 0 25  2
        "0102010902"  3  1 1  4  3
        "0102010902"  4  1 1  1  3
        "0102011001"  1 75 1 37  1
        "0102011001"  2  . 0 30  2
        "0102011001"  3  1 1  3  3
        "0102011001"  4 75 1 33  7
        "0102011001"  5 75 1 31  7
        "0102011001"  6 75 1 29  7
        "0102011001"  7 75 1 24  7
        "0102011001"  8 75 0 22  7
        "0102011001"  9  . 0 45 10
        "0102011001" 10 75 1 25  9
        "0102011001" 11 75 0 19  9
        "0102011001" 12 75 1 18  9
        "0102011001" 13  . 0 67  6
        "0102011101"  1 75 1 31  1
        end
        label values FATHERID RO9
        label def RO9 69 "IF Spouse/Parent outside for more than 6 months", modify
        label def RO9 75 "Dead", modify
        label values SEX RO3
        label def RO3 0 "Female", modify
        label def RO3 1 "Male", modify
        label values RELHHHEAD RO4
        label def RO4 1 "Head 1", modify
        label def RO4 2 "Wife 2", modify
        label def RO4 3 "Daughter 3", modify
        label def RO4 4 "Daughter-in-law 4", modify
        label def RO4 5 "Grandchild 5", modify
        label def RO4 6 "Mother 6", modify
        label def RO4 7 "Sister 7", modify
        label def RO4 9 "Niece 9", modify
        label def RO4 10 "Sister in law 10", modify
        label def RO4 12 "Servant/Others 12", modify
        Many thanks for any assistance received.

        Comment


        • #5
          It appears you are looking to trace grandparents only on the paternal line since you state specifically you want to use the FATHERID variable for this, and there is no corresponding MOTHERID variable.

          In your example data, the PERSONID variable begins at 1 and counts up to as many people as are in the household, and no numbers are skipped. If that is true in the full data set, it makes this a very simple computation:
          Code:
          by IDHH (PERSONID), sort: assert PERSONID == _n
          by IDHH (PERSONID): gen GF = FATHERID[FATHERID]
          label values GF R09
          Note: The first command verifies that the PERSONID variable actually does begin at 1 and counts up with no numbers skipped. (If that is not true, you will get an error message and no GF variable will be calculated.)

          Comment


          • #6
            Thank you Sir..

            Comment

            Working...
            X