Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching Children ID to Parental ID with Birth Orders

    Hi, I'm currently working on longitudinal data where the analysis is for parents. However, the component of children's birth date and birth order is important. I'm using UKHLS. The ideal structure is number two:

    Click image for larger version

Name:	Screenshot 2024-02-04 at 20.43.06.png
Views:	1
Size:	64.8 KB
ID:	1742074


    The parents have their dataset (but we don't know their children's ID in this dataset) and the children's dataset has parental IDs. But I'm afraid merging directly will make duplicates (like Data Structure 1).

    Here's the data for the parental dataset, which includes parental ID (pidp), household ID (hidp), and father/mother (sex). Note that this dataset has already been transformed into a longitudinal format.

    Code:
    input double pidp long hidp byte sex
      687         . .
     1367         . .
     2051         . .
     2727         . .
     3407         . .
     4091         . .
     4767         . .
     5451         . .
     6135         . .
     6807         . .
     6882         . .
     7487         . .
     8167         . .
     8847         . .
     9527         . .
    10211         . .
    10895         . .
    11579         . .
    12251         . .
    12935         . .
    13607         . .
    13722         . .
    14287         . .
    14971         . .
    15645         . .
    15655         . .
    16339         . .
    17015         . .
    17687         . .
    18371         . .
    19047         . .
    19731         . .
    20415         . .
    20482         . .
    21087         . .
    21767         . .
    22445 277059218 2
    22445 276637622 2
    22451         . .
    23127         . .
    23807         . .
    24487         . .
    25167         . .
    25847         . .
    26527         . .
    27211         . .
    27284         . .
    27322         . .
    27895         . .
    28575         . .
    29259         . .
    29947         . .
    30615         . .
    31287         . .
    31967         . .
    32651         . .
    33327         . .
    34007         . .
    34084         . .
    34122         . .
    34691         . .
    35375         . .
    36059         . .
    36727         . .
    37411         . .
    38087         . .
    38771         . .
    39455         . .
    40139         . .
    40823         . .
    40922         . .
    40924         . .
    41487         . .
    42171         . .
    42847         . .
    43527         . .
    44211         . .
    44887         . .
    45571         . .
    46247         . .
    46927         . .
    47619         . .
    47762         . .
    47764         . .
    48303         . .
    48967         . .
    49651         . .
    50335         . .
    51007         . .
    51691         . .
    52367         . .
    53051         . .
    53735         . .
    54419         . .
    54482         . .
    54524         . .
    55087         . .
    55775         . .
    56447         . .
    57131         . .
    end
    label values sex a_sex
    label def a_sex 2 "female", modify
    Here's the children's dataset. which includes children's ID (pidp), household ID (hidp), biological father (fnpid), biological mother (mnpid), and date of birth (doby_dv). Note this is an example of a single wave, not a longitudinal form:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(pidp a_hidp a_fnpid a_mnpid) int a_doby_dv
    68006139 68006123       -8 68006127 1994
    68014295 68014283       -8 68014287 2000
    68014299 68014283       -8 68014287 2004
    68017695 68017683 68017691 68017687 2000
    68017699 68017683 68017691 68017687 2007
    68019055 68019043 68019047 68019051 2005
    68026539 68026523 68026531 68026527 1998
    68026543 68026523 68026531 68026527 2006
    68028579 68028563 68028571 68028567 1993
    68028583 68028563 68028571 68028567 1997
    68028587 68028563 68028571 68028567 1999
    68029935 68029923 68029931 68029927 1996
    68029939 68029923 68029931 68029927 2003
    68030611 68030603       -8 68030607 2007
    68037411 68037403       -8 68037407 1998
    68041495 68041483 68041491 68041487 2000
    68041499 68041483 68041491 68041487 2003
    68042175 68042163 68042167 68042171 2006
    68044215 68044203 68044211 68044207 2005
    68044219 68044203 68044211 68044207 2007
    68045575 68045563       -8 68045567 1996
    68046935 68046923 68046931 68046927 2000
    68046939 68046923 68046931 68046927 2004
    68056455 68056443 68056447 68056451 1994
    68056459 68056443 68056447 68056451 1999
    68056463 68056443 68056447 68056451 2003
    68060535 68060523 68060527 68060531 2005
    68063255 68063243 68063251 68063247 1999
    68063935 68063923 68063931 68063927 2003
    68066655 68066643 68066651 68066647 2005
    68068019 68068003 68068007 68068011 1996
    68072095 68072083       -8 68072091 2000
    68072775 68072763 68072771 68072767 1999
    68072779 68072763 68072771 68072767 2002
    68073459 68073443 68073447 68073451 1994
    68073463 68073443 68073447 68073451 1996
    68086375 68086363       -8 68086371 2003
    68091131 68091123       -8 68091127 1995
    68091135 68091123       -8 68091127 1997
    68091139 68091123       -8 68091127 1998
    68093851 68093843       -8 68093847 1995
    68093855 68093843       -8 68093847 1999
    68093859 68093843       -8 68093847 2005
    68096571 68096563       -8 68096567 2006
    68102695 68102683       -8       -8 1994
    68102699 68102683       -8       -8 1998
    68107455 68107443 68107447 68107451 2001
    68107459 68107443 68107447 68107451 2006
    68108139 68108123 68108127 68108131 1994
    68112215 68112203 68112207 68112211 2007
    68114255 68114243 68114251 68114247 1993
    68115619 68115603 68115611 68115607 1996
    68115623 68115603 68115611 68115607 1999
    68116295 68116283 68116291 68116287 2001
    68116299 68116283 68116291 68116287 2006
    68121055 68121043 68121051 68121047 1995
    68121059 68121043 68121051 68121047 1999
    68122415 68122403       -8 68122407 1995
    68125135 68125123       -8 68125127 1995
    68126495 68126483 68126487 68126491 2008
    68128535 68128523 68128531 68128527 2002
    68128539 68128523 68128531 68128527 2005
    68132615 68132603 68132607 68132611 1995
    68139415 68139403 68139407 68139411 2003
    68141459 68141443 68141451 68141447 2000
    68142139 68142123       -8 68142127 1999
    68142143 68142123 68142131 68142127 2003
    68144855 68144843 68144847 68144851 2005
    68153015 68153003 68153011 68153007 1999
    68153695 68153683 68153687 68153691 2007
    68153699 68153683 68153687 68153691 2001
    68153703 68153683 68153687 68153691 2003
    68157771 68157763       -8 68157767 1993
    68158455 68158443 68158451 68158447 2002
    68158459 68158443 68158451 68158447 1998
    68159135 68159123 68159127 68159131 2004
    68167975 68167963 68167971 68167967 2002
    68167979 68167963 68167971 68167967 2005
    68180215 68180203 68180207 68180211 1998
    68180219 68180203 68180207 68180211 2000
    68180895 68180883       -8 68180887 1999
    68180899 68180883 68180891 68180887 2006
    68182935 68182923       -8 68182927 1996
    68184975 68184963 68184967 68184971 2008
    68185651 68185643       -8 68185647 1997
    68188375 68188363 68188371 68188367 1995
    68191775 68191763       -8 68191771 2002
    68197215 68197203 68197207 68197211 1997
    68197219 68197203 68197207 68197211 2000
    68197223 68197203 68197207 68197211 2001
    68197899 68197883       -8 68197887 1994
    68197903 68197883       -8 68197887 1997
    68208095 68208083 68208091 68208087 2006
    68208099 68208083 68208091 68208087 2008
    68209455 68209443 68209451 68209447 2005
    68209459 68209443 68209451 68209447 2008
    68212855 68212843 68212847 68212851 1997
    68212859 68212843 68212847 68212851 2001
    68212863 68212843 68212847 68212851 2005
    68223735 68223723 68223727 68223731 2008
    end
    label values a_fnpid a_fnpid
    label def a_fnpid -8 "inapplicable", modify
    label values a_mnpid a_mnpid
    label def a_mnpid -8 "inapplicable", modify
    label values a_doby_dv a_doby_dv
    Thank you!

    Attached Files

  • #2
    These data sets will need some attention before you can do what you want. The overall strategy is simple: merge the children's data set with the adult data set twice, once linking the fathers and again linking the mothers. This will produce what you call Data Structure 1. Then with a little tinkering, you can use -reshape wide- to get the data structure you want. The overall code will look something like this:
    Code:
    use children, clear
    rename pidp c_pidp
    rename a_hidp hidp
    tempfile holding
    save `holding'
    
    use parents
    isid pidp hidp, sort missok
    clonevar a_fnpid = pidp
    merge 1:m a_fnpid hidp using `holding', keep(match master) ///
        nogenerate keepusing(c_pidp a_doby_dv)
    drop a_fnpid
    clonevar a_mnpid = pidp
    merge 1:m a_mnpid hidp using `holding', keep(match master) ///
        nogenerate keepusing(c_pidp a_doby_dv)
    drop a_mnpid
    
    by pidp hidp sex (a_doby_dv), sort: gen int seq = _n
    reshape wide c_pidp a_doby_dv, i(pidp hidp sex) j(seq)
    But there are a few problems. The pidp variable does not uniquely identify observations in the adult data set, you need the combination of pidp and hidp. But hidp is missing in the majority of observations. While this is legal, and might work out OK (it does in the example data) it's not a good idea for missing value to be valid in an identifier variable. The variable a_fnpid has a fair number of observations coded -8. I suppose this is some sort of magic number code for missing value. While, again, not illegal, it's a bad idea in Stata and is setting you up to subsequently make mistakes. If this is indeed a missing value code, you should replace it with a real Stata missing value (either system missing or one of the extended missing values). You refer to birth order as a critical variable, but it is not present in the children's data set. You do have a variable a_doby_dv1 that looks like it might be birth year (and the code shown above assumes it is precisely that), but you can have two children born to the same parents in the same year, and this leaves their birth order indeterminate (and, if you try to calculate it, randomized). Also, I note that id numbers in the children's data set are all 8 digit numbers, whereas the pidp numbers range between 3 and 5 digits. Consequently, in the example data, no child matches up with any parent. Perhaps in the full data sets, both parent and child id numbers cover the same range. But the examples make me wonder if somehow the pidps in the adult data set have been recoded in some way that has not been applied to the childrens id variables.

    Additionally, I wonder on what basis you believe that data structure 2 is what you need, considering that for most things in Stata, what you call data structure 1 will make life much easier. Yes, there are situations where data structure 2 is preferable, but they are the minority. Unless you know specifically that what you are doing will be better with data structure 2, you might reconsider your plans.

    Finally, as an aside, it is best to avoid using initialisms like UKHLS here. This is an international, interdisciplinary forum. I have no idea what the UKHLS is, and I suspect that is true of many, perhaps most Forum participants. It doesn't appear to be crucial to solving your problem (though perhaps it would have pre-empted some of my questions about the data raised earlier). But when it's not crucial, it's best to either not identify the data source, or if you do identify it, spell out the full name and give a brief explanation of what it is. In this Forum, the only things that should be considered common knowledge are those things that any university graduate anywhere in the world would know, and perhaps some very basic understanding of Stata and statistics. It's to your own advantage to follow this guideline, because seeing something unidentifiable early in a post may well cause somebody who could and would answer your question to skip over it.

    Comment

    Working...
    X