
  • Find number of children, age at birth for women using identifiers

    Hi:

    I am working with a dataset that looks something like this:

    HHID PID MID Rel_Head Age
    10 1 3 1 56
    10 2 . 2 48
    10 3 . 7 75
    10 4 . 8 80
    10 5 2 6 18
    10 6 2 6 16
    10 7 3 5 52
    10 8 3 5 49
    12 1 . 1 25
    12 2 . 2 24


    where HHID is the household identifier; PID is the member identifier within each household; MID is the PID of the individual's mother; and Rel_Head is the relationship to the head of the household (1 if the individual is the head; 2 if spouse of the head; 6 if child of the head; 8 if father of the head; and so on).

    For instance, in the above dataset, PIDs 5 and 6 in household 10 are children of PIDs 1 and 2.

    I want to construct two variables from this dataset:

    1. The number of children for each mother in the household;
    2. The age at first birth for each mother, that is, the difference between her age and the age of her oldest child.

    Basically, I want to have two variables corresponding to the last two columns below:

    HHID PID MID Rel_Head Age N_children Age_f_birth
    10 1 3 1 56
    10 2 . 2 48 2 30
    10 3 . 7 75 3 19
    10 4 . 8 80
    10 5 2 6 18
    10 6 2 6 16
    10 7 3 5 52
    10 8 3 5 49
    12 1 . 1 25
    12 2 . 2 24



    Any help would be immensely appreciated! Thank you!

    Last edited by Devon Smith; 28 Sep 2021, 21:39.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(hhid pid mid rel_head age)
    10 1 3 1 56
    10 2 . 2 48
    10 3 . 7 75
    10 4 . 8 80
    10 5 2 6 18
    10 6 2 6 16
    10 7 3 5 52
    10 8 3 5 49
    12 1 . 1 25
    12 2 . 2 24
    end
    
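    preserve
    rename (pid-age) =_U
    keep if !missing(mid_U)    // KEEP ONLY PEOPLE WHOSE MOTHER IS IDENTIFIED
    tempfile kids
    save `kids'
    restore
    // PAIR EVERY PERSON WITH EVERY CHILD IN THE SAME HOUSEHOLD
    joinby hhid using `kids', unmatched(master)
    // COUNT THE PAIRINGS IN WHICH THIS PERSON IS THE CHILD'S MOTHER
    by hhid pid, sort: egen n_children = total(mid_U == pid)
    // OWN AGE MINUS THE AGE OF THE OLDEST CHILD
    by hhid pid: egen age_first_birth = min(cond(mid_U == pid, age-age_U, .))
    by hhid pid: keep if _n == 1
    drop *_U _merge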
    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
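A variant of the same calculation that avoids -joinby- is sketched below. It is an alternative technique, not the code above: the children's records are collapsed to one row per mother and the results merged back. It assumes, as in the example, that mid holds the pid of a mother living in the same household.

```stata
* Example data from #1
clear
input byte(hhid pid mid rel_head age)
10 1 3 1 56
10 2 . 2 48
10 3 . 7 75
10 4 . 8 80
10 5 2 6 18
10 6 2 6 16
10 7 3 5 52
10 8 3 5 49
12 1 . 1 25
12 2 . 2 24
end

* One row per mother: number of children and age of the oldest child
preserve
keep if !missing(mid)
collapse (count) n_children = pid (max) oldest_child_age = age, by(hhid mid)
rename mid pid                 // the mother's id becomes the merge key
tempfile mothers
save `mothers'
restore

* Attach the mother-level results to each person's own record
merge 1:1 hhid pid using `mothers', keep(master match) nogenerate
replace n_children = 0 if missing(n_children)   // non-mothers have no children
gen age_first_birth = age - oldest_child_age    // missing for non-mothers
drop oldest_child_age
```

In this example PID 3 in household 10 has three children (PIDs 1, 7 and 8), so she gets 3 children and an age at first birth of 75 - 56 = 19.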



    • #3
      Thank you so much! I will use dataex in the future.



      • #4
        Hi Clyde:

        I had a follow-up question. I am using the same dataset that I mentioned above but now I have two additional variables: years of schooling and a dummy variable for whether the household member is immunized.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(hhid pid mid rel_head age schooling immunized)
        10 1 3 1 56 16 1
        10 2 . 2 48 16 0
        10 3 . 7 75 8  1
        10 4 . 8 80 14 1
        10 5 2 6 18 12 1
        10 6 2 6 16 10 1
        10 7 3 5 52 15 0
        10 8 3 5 49 .  .
        12 1 . 1 25 12 1
        12 2 . 2 24 12 1
        end
        Now I want to generate the following additional variables:

        1. For each member of the household, the years of schooling for the parents;
        2. For every mother in the household, a dummy variable for whether the youngest child is immunized.

        Could you please tell me how I can do that? I would really appreciate your help!



        • #5
          Code:
          clear*
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(hhid pid mid rel_head age schooling immunized)
          10 1 3 1 56 16 1
          10 2 . 2 48 16 0
          10 3 . 7 75 8  1
          10 4 . 8 80 14 1
          10 5 2 6 18 12 1
          10 6 2 6 16 10 1
          10 7 3 5 52 15 0
          10 8 3 5 49 .  .
          12 1 . 1 25 12 1
          12 2 . 2 24 12 1
          end
          
          // GET MOTHER'S EDUCATION (FRAMES REQUIRE STATA 16 OR LATER)
          frame put hhid pid mid schooling, into(education)
          frlink m:1 hhid mid, frame(education hhid pid)
          frget m_schooling = schooling, from(education)
          frame drop education
          drop education
          
          // FOR EACH MOTHER GET YOUNGEST CHILD'S IMMUNIZATION STATUS
          preserve
          rename (pid-immunized) =_kid
          keep if !missing(mid_kid)
          gen link = mid_kid
          tempfile kids
          save `kids'
          restore
          gen link = pid
          joinby hhid link using `kids', unmatched(master)
          by hhid pid (age_kid), sort: keep if _n == 1 // YOUNGEST CHILD
          gen byte youngest_child_immunized = immunized_kid
          drop *_kid _merge link
          Note: I cannot see how to get father's education, because nothing in the data tells me who a person's father is. If the real data contain such a variable, perhaps called fid, then the code is exactly like that for mother's education, just replacing mid by fid throughout (and giving the new variable a different name, say f_schooling instead of m_schooling).
          Last edited by Clyde Schechter; 07 Oct 2021, 12:10.
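The frame commands used in #5 (frame put, frlink, frget) require Stata 16 or later. For older versions, a merge-based sketch of both steps follows. It is an alternative technique rather than the code in #5, it assumes each mid matches a pid in the same household, and when two children have the same age (twins) the one treated as youngest is chosen arbitrarily.

```stata
* Example data from #4
clear
input byte(hhid pid mid rel_head age schooling immunized)
10 1 3 1 56 16 1
10 2 . 2 48 16 0
10 3 . 7 75 8  1
10 4 . 8 80 14 1
10 5 2 6 18 12 1
10 6 2 6 16 10 1
10 7 3 5 52 15 0
10 8 3 5 49 .  .
12 1 . 1 25 12 1
12 2 . 2 24 12 1
end

* Mother's schooling: a lookup file keyed on (hhid, mid)
preserve
keep hhid pid schooling
rename (pid schooling) (mid m_schooling)
tempfile moms
save `moms'
restore
merge m:1 hhid mid using `moms', keep(master match) nogenerate

* Youngest child's immunization status, one row per mother
preserve
keep if !missing(mid)
bysort hhid mid (age): keep if _n == 1     // lowest age = youngest child
keep hhid mid immunized
rename (mid immunized) (pid youngest_child_immunized)
tempfile youngest
save `youngest'
restore
merge 1:1 hhid pid using `youngest', keep(master match) nogenerate
```

With a father identifier (the hypothetical fid mentioned above), the first step would be repeated with mid and m_schooling replaced by fid and f_schooling.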



          • #6
            Thanks, Clyde! Yes, there is a father's id which I missed.



            • #7
              Hi Clyde:

              I am dealing with a new task similar to the old one. I have an individual-level dataset in Stata in which each row corresponds to one individual.
              Variables include household id (hhold), individual id unique within a household (id), sex (sex),
              age (age), and relationship to the head of the household (relationship). relationship is a categorical variable taking values from 1 to 14.
              Here are the labels for this variable:
              1 HEAD
              2 HUSBAND / WIFE
              3 SON / DUAGHTER
              4 SPOUSE OF SON / DUAGHTER
              5 GRANCHILD
              6 FATHER / MOTHER
              7 BROTHER / SISTER
              8 NIECE / NEPHEW
              9 FATHER-IN-LAW / MOTHER-IN-LAW
              10 BROTHER-IN-LAW / SISTER-IN-LAW
              11 OTHER RELATIVE (SPECIFY)
              12 SERVANT
              13 EMPLOYEE
              14 OTHER (SPECIFY)
              If it takes the value of 1, the person is the head. If it takes the value of 7, they are the sibling of the head and so on.

              Two individuals within a household are siblings if their relationship values form one of these pairs:
              1. 1 and 7 (the head and a brother/sister of the head);
              2. 2 and 10 (the head's husband/wife and a brother-in-law/sister-in-law);
              3. 3 and 3 (two sons/daughters);
              4. 5 and 5 (two grandchildren);
              5. 8 and 8 (two nieces/nephews).


              I want to restructure the dataset so that each observation is a single sibling pair. For instance, if a household has 2 brothers and 2 sisters,
              the first row should pair the first brother with the first sister, the second row the first brother with the second sister, the third row the second brother
              with the first sister, and the fourth row the second brother with the second sister. Each row should include all of the brother's information and all of the sister's information for that pair. Is there a quick way to do this?



              • #8
                #7 See also https://www.statalist.org/forums/for...sing-unique-id



                • #9
                  Hi Nick, thanks for the reply!

                  I actually have a dataset that looks like this:

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input float hhold byte(id sex relationship) int age
                  1001 1 1 1 46
                  1001 2 2 2 38
                  1001 3 1 3 12
                  1001 4 2 3 18
                  1003 1 1 1 46
                  1003 2 2 2 40
                  1003 3 2 3 11
                  1003 4 2 3 19
                  1004 1 1 1 42
                  1004 2 2 2 36
                  1004 3 1 3 13
                  1004 4 1 3 17
                  1011 1 1 1 62
                  1011 2 2 2 55
                  1011 3 1 3 30
                  1011 4 1 3 35
                  1011 5 2 4 26
                  1011 6 1 5  8
                  end
                  label values sex S1AQ01
                  label def S1AQ01 1 "MALE", modify
                  label def S1AQ01 2 "FEMALE", modify
                  label values relationship S1AQ02
                  label def S1AQ02 1 "HEAD", modify
                  label def S1AQ02 2 "HUSBAND / WIFE", modify
                  label def S1AQ02 3 "SON / DUAGHTER", modify
                  label def S1AQ02 4 "SPOUSE OF SON / DUAGHTER", modify
                  label def S1AQ02 5 "GRANCHILD", modify
                  label def S1AQ02 6 "FATHER / MOTHER", modify
                  label def S1AQ02 7 "BROTHER / SISTER", modify
                  Essentially, I am trying to create a dataset in which each row is a sibling pair seen from the point of view of one of the two siblings, with pairs defined from the relationship-to-head variable described above. For instance, household 1001 above would yield two rows: its third and fourth members form a sibling pair (both have relationship value 3), and each of them appears once as the individual and once as the sibling. Within each row, I want the variables hhold, id, relationship, age, and sex of the individual, and the corresponding variables for the sibling. It would look something like this:

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input float hhold byte(id sex relationship) int age hhold_sibling id_sibling sex_sibling relationship_sibling age_sibling
                  1001 3 1 3 12 1001 4 2 3 18
                  1001 4 2 3 18 1001 3 1 3 12
                  end
                  There are other possible combinations of sibling-pairs that can be generated based on the relationship values (for instance 1 and 7 would be one), and more than one sibling pair for same relationship values (for instance 3 individuals with the relationship value of 3). I just do not know how to do this. Any help would be much appreciated!



                  • #10
                    This will do what you ask:
                    Code:
                    tempfile copy
                    save `copy'
                    
                    //    MATCH EACH PERSON TO ALL OTHER PERSONS IN HOUSEHOLD
                    //    DUE TO SYMMETRY OF RELATIONSHIPS, NO NEED TO REPEAT A:B AND B:A PAIRS
                    //    NOR DO WE WANT SELF-PAIRING
                    rangejoin id 1 . using `copy', by(hhold)
                    
                    //    NOW RETAIN ONLY THE "SIBLINGS"
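                    //    EACH PAIR APPEARS ONLY ONCE, SO 1:7 AND 2:10 ARE CHECKED IN BOTH ORDERS
                    keep if relationship == 1 & relationship_U == 7 ///
                            | relationship == 7 & relationship_U == 1 ///
                            | relationship == 2 & relationship_U == 10 ///
                            | relationship == 10 & relationship_U == 2 ///
                            | relationship == 3 & relationship_U == 3 ///
                            | relationship == 5 & relationship_U == 5 ///
                            | relationship == 8 & relationship_U == 8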
                    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

                    I have put "siblings" in scare quotes in my comment because some of these pairings may or may not really be siblings. To take the most obvious example, two people with relationship = 5 are both grandchildren of the household head; they are at least as likely to be cousins as siblings. In fact, of all the combinations you show, only 1:7 and 3:3 are sure to be siblings. I do not see any way to further restrict the other combinations using this data to guarantee sibship between them. The best one can really say here is that any pair of people in the same household who do not meet any of these relationship pairings are definitely not siblings.
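For readers without -rangejoin-, here is a sketch that uses only official Stata commands. It assumes id is unique within hhold: -joinby- pairs each person with every household member, and keeping id_U > id should give the same one-direction pairing as rangejoin id 1 . above. Because each pair then appears once, in whichever order the ids fall, the asymmetric pairings (1:7 and 2:10) are tested in both orders, so the result does not depend on, say, the head having a lower id than the head's sibling. The hypothetical flag sure_sibling marks the 1:7 and 3:3 pairs, the only ones described above as certain.

```stata
* Example data from #9
clear
input float hhold byte(id sex relationship) int age
1001 1 1 1 46
1001 2 2 2 38
1001 3 1 3 12
1001 4 2 3 18
1003 1 1 1 46
1003 2 2 2 40
1003 3 2 3 11
1003 4 2 3 19
1004 1 1 1 42
1004 2 2 2 36
1004 3 1 3 13
1004 4 1 3 17
1011 1 1 1 62
1011 2 2 2 55
1011 3 1 3 30
1011 4 1 3 35
1011 5 2 4 26
1011 6 1 5 8
end

* Copy of the data with every variable except hhold suffixed _U
preserve
ds hhold, not
rename (`r(varlist)') (=_U)
tempfile copy
save `copy'
restore

* Pair each person with every other household member, once per pair
joinby hhold using `copy'
keep if id_U > id

* Candidate sibling pairings, asymmetric ones checked in both orders
keep if relationship == 1 & relationship_U == 7 ///
      | relationship == 7 & relationship_U == 1 ///
      | relationship == 2 & relationship_U == 10 ///
      | relationship == 10 & relationship_U == 2 ///
      | relationship == 3 & relationship_U == 3 ///
      | relationship == 5 & relationship_U == 5 ///
      | relationship == 8 & relationship_U == 8

* After the keep, codes 1 and 7 occur only in 1:7 pairs and code 3 only in 3:3 pairs
gen byte sure_sibling = inlist(relationship, 1, 3, 7)
```

Note that -joinby- builds every within-household pair before filtering, so the intermediate dataset grows with the square of household size.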



                    • #11
                      Thanks, Clyde! It worked, and you are right about the combinations.

                      One more thing: I actually want two observations for each sibling pair, i.e., I want to distinguish A:B pairs from B:A pairs. How would I modify the above code?



                      • #12
                        To get the pairs in both directions:
                        Code:
                        preserve
                        ds hhold, not
                        rename (`r(varlist)') =_U
                        tempfile copy
                        save `copy'
                        restore
                        
                        //    MATCH EACH PERSON TO ALL OTHER PERSONS IN HOUSEHOLD
                        joinby hhold using `copy'
                        drop if id == id_U
                        
                        //    NOW RETAIN ONLY THE "SIBLINGS"
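                        //    1:7 AND 2:10 ARE CHECKED IN BOTH ORDERS SO THOSE PAIRS ALSO APPEAR TWICE
                        keep if relationship == 1 & relationship_U == 7 ///
                                | relationship == 7 & relationship_U == 1 ///
                                | relationship == 2 & relationship_U == 10 ///
                                | relationship == 10 & relationship_U == 2 ///
                                | relationship == 3 & relationship_U == 3 ///
                                | relationship == 5 & relationship_U == 5 ///
                                | relationship == 8 & relationship_U == 8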
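To get the column names shown in the desired output in #9 (id_sibling, sex_sibling, and so on), the suffix can simply be _sibling instead of _U. The sketch below does that on the #9 example data; hhold is the join key, so it is not duplicated as hhold_sibling. The asymmetric pairings (1:7 and 2:10) are checked in both orders so that those pairs also appear twice. The final -sort- is only for readability.

```stata
* Example data from #9
clear
input float hhold byte(id sex relationship) int age
1001 1 1 1 46
1001 2 2 2 38
1001 3 1 3 12
1001 4 2 3 18
1003 1 1 1 46
1003 2 2 2 40
1003 3 2 3 11
1003 4 2 3 19
1004 1 1 1 42
1004 2 2 2 36
1004 3 1 3 13
1004 4 1 3 17
1011 1 1 1 62
1011 2 2 2 55
1011 3 1 3 30
1011 4 1 3 35
1011 5 2 4 26
1011 6 1 5 8
end

* Copy with all variables except hhold suffixed _sibling
preserve
ds hhold, not
rename (`r(varlist)') (=_sibling)
tempfile copy
save `copy'
restore

* All ordered pairs of distinct members within a household
joinby hhold using `copy'
drop if id == id_sibling

* Candidate sibling pairings, in both directions
keep if relationship == 1 & relationship_sibling == 7 ///
      | relationship == 7 & relationship_sibling == 1 ///
      | relationship == 2 & relationship_sibling == 10 ///
      | relationship == 10 & relationship_sibling == 2 ///
      | relationship == 3 & relationship_sibling == 3 ///
      | relationship == 5 & relationship_sibling == 5 ///
      | relationship == 8 & relationship_sibling == 8
sort hhold id id_sibling
```

On this example each household contributes the two rows for its 3:3 pair, matching the layout shown for household 1001 in #9.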



                        • #13
                          It worked, Clyde! Can't thank you enough! God Bless!



                          • #14
                            Originally posted by Devon Smith (quoting the data example from #9)
                            Hi Clyde:

                            Using this dataset, I want to do the following: for each child in the household, I want to create a new variable, father_id, from the relationship and id variables. Any help would be greatly appreciated!



                            • #15
                              The identification of fathers can only be partially accomplished with the information available. For example, a grandchild's father would presumably be one of the people listed as a son, but there could be several of them, and there is no way to identify which one it is. The unambiguous cases are: those with relationship son or daughter have the head as father if the head is male; the head, regardless of sex, has the male person identified as father/mother as father; and the brother or sister of the head necessarily has the same father as the head. Code implementing this looks like:
                              Code:
                              gen father_id = .
                              by hhold (id), sort: egen temp = max(cond(relationship == 1 & sex == 1, ///
                                  id, .))
                              replace father_id = temp if relationship == 3
                              drop temp
                              
                              by hhold (id): egen temp = max(cond(relationship == 6 & sex == 1, id, .))
                              replace father_id = temp if inlist(relationship, 1, 7)
                              drop temp
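As a usage illustration, here is the same code run on the example data from #9, with comments added. All four heads in the example are male, so every son or daughter should get father_id = 1. Because no one in the example is coded 6 (father/mother), the heads' father_id stays missing, as does the grandchild's in household 1011.

```stata
* Example data from #9
clear
input float hhold byte(id sex relationship) int age
1001 1 1 1 46
1001 2 2 2 38
1001 3 1 3 12
1001 4 2 3 18
1003 1 1 1 46
1003 2 2 2 40
1003 3 2 3 11
1003 4 2 3 19
1004 1 1 1 42
1004 2 2 2 36
1004 3 1 3 13
1004 4 1 3 17
1011 1 1 1 62
1011 2 2 2 55
1011 3 1 3 30
1011 4 1 3 35
1011 5 2 4 26
1011 6 1 5 8
end

* Sons/daughters of the head: the father is the head, if the head is male
gen father_id = .
by hhold (id), sort: egen temp = max(cond(relationship == 1 & sex == 1, id, .))
replace father_id = temp if relationship == 3
drop temp

* The head and the head's siblings: the father is a male coded father/mother
by hhold (id): egen temp = max(cond(relationship == 6 & sex == 1, id, .))
replace father_id = temp if inlist(relationship, 1, 7)
drop temp
```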
