Find number of children, age at birth for women using identifiers

Devon Smith

Join Date: Sep 2021

Posts: 26
#1

28 Sep 2021, 21:31

Hi:

I am working with a dataset that looks something like this:

HHID PID MID Rel_Head Age
10 1 3 1 56
10 2 . 2 48
10 3 . 7 75
10 4 . 8 80
10 5 2 6 18
10 6 2 6 16
10 7 3 5 52
10 8 3 5 49
12 1 . 1 25
12 2 . 2 24

where HHID is household identifier; PID is member identifier within each household; MID is the identifier for the mother; Rel_Head is the relationship to the head of the household (it's 1 if individual is head; 2 if they are spouse of head; 6 if child of head; 8 if father of head and so on).

For instance, in the above dataset, PIDs 5 and 6 are the children of PIDs 1 and 2 in household 10.

I want to construct two variables from this dataset:

1. The number of children for each mother in the household;
2. The age at first birth for each mother. In other words, difference between her age and her oldest offspring's age.

Basically, I want to have two variables corresponding to the last two columns below:

HHID PID MID Rel_Head Age N_children Age_f_birth
10 1 3 1 56 . .
10 2 . 2 48 2 30
10 3 . 7 75 3 19
10 4 . 8 80 . .
10 5 2 6 18 . .
10 6 2 6 16 . .
10 7 3 5 52 . .
10 8 3 5 49 . .
12 1 . 1 25 . .
12 2 . 2 24 . .

Any help would be immensely appreciated! Thank you!

Last edited by Devon Smith; 28 Sep 2021, 21:39.
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

28 Sep 2021, 22:25

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(hhid pid mid rel_head age)
10 1 3 1 56
10 2 . 2 48
10 3 . 7 75
10 4 . 8 80
10 5 2 6 18
10 6 2 6 16
10 7 3 5 52
10 8 3 5 49
12 1 . 1 25
12 2 . 2 24
end

// SAVE A COPY OF EVERYONE WHO HAS A MOTHER IN THE HOUSEHOLD
preserve
rename (pid-age) =_U
keep if !missing(mid)
tempfile kids
save `kids'
restore

// PAIR EACH PERSON WITH EVERY SUCH CHILD IN THE SAME HOUSEHOLD
joinby hhid using `kids', unmatched(master)

// COUNT OWN CHILDREN; THE SMALLEST AGE GAP BELONGS TO THE OLDEST CHILD
by hhid pid, sort: egen n_children = total(mid_U == pid)
by hhid pid: egen age_first_birth = min(cond(mid_U == pid, age-age_U, .))

// BACK TO ONE OBSERVATION PER PERSON
by hhid pid: keep if _n == 1
drop *_U

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
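As a cross-check on the -joinby- approach above, the same two variables could also be built by collapsing the children's records to one row per mother and merging that back. This is only a sketch on the example data from #1: the names n_children, oldest_child_age, and age_first_birth are illustrative, and it assumes every non-missing mid points to a pid in the same household. Unlike the code above, it leaves n_children missing (rather than 0) for people with no recorded children.

```stata
clear
input byte(hhid pid mid rel_head age)
10 1 3 1 56
10 2 . 2 48
10 3 . 7 75
10 4 . 8 80
10 5 2 6 18
10 6 2 6 16
10 7 3 5 52
10 8 3 5 49
12 1 . 1 25
12 2 . 2 24
end

* One row per mother: number of children and age of the oldest child
preserve
keep if !missing(mid)
collapse (count) n_children = pid (max) oldest_child_age = age, by(hhid mid)
rename mid pid          // the mother's own identifier
tempfile mothers
save `mothers'
restore

* Attach the summaries to the mothers' own records
merge 1:1 hhid pid using `mothers', keep(master match) nogenerate
gen age_first_birth = age - oldest_child_age
drop oldest_child_age

list, sepby(hhid)
```

Here -collapse- does the counting, so no observations need to be dropped afterwards; the trade-off is that mothers whose records are absent from the household are silently ignored by keep(master match).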
Devon Smith

Join Date: Sep 2021

Posts: 26
#3

29 Sep 2021, 19:40

Thank you so much! I will use dataex in the future.
Devon Smith

Join Date: Sep 2021

Posts: 26
#4

07 Oct 2021, 11:42

Hi Clyde:

I had a follow-up question. I am using the same dataset that I mentioned above but now I have two additional variables: years of schooling and a dummy variable for whether the household member is immunized.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(hhid pid mid rel_head age schooling immunized)
10 1 3 1 56 16 1
10 2 . 2 48 16 0
10 3 . 7 75  8 1
10 4 . 8 80 14 1
10 5 2 6 18 12 1
10 6 2 6 16 10 1
10 7 3 5 52 15 0
10 8 3 5 49  . .
12 1 . 1 25 12 1
12 2 . 2 24 12 1
end

Now I want to generate three additional variables.

1. For each member of the household, the years of schooling for the parents;
2. For every mother in the household, a dummy variable for whether the youngest child is immunized.

Could you please tell me how I can do that? I would really appreciate your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#5

07 Oct 2021, 12:08

Code:

clear*
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(hhid pid mid rel_head age schooling immunized)
10 1 3 1 56 16 1
10 2 . 2 48 16 0
10 3 . 7 75 8  1
10 4 . 8 80 14 1
10 5 2 6 18 12 1
10 6 2 6 16 10 1
10 7 3 5 52 15 0
10 8 3 5 49 .  .
12 1 . 1 25 12 1
12 2 . 2 24 12 1
end

// GET MOTHER'S EDUCATION
frame put hhid pid mid schooling, into(education)
frlink m:1 hhid mid, frame(education hhid pid)
frget m_schooling = schooling, from(education)
frame drop education
drop education

// FOR EACH MOTHER GET YOUNGEST CHILD'S IMMUNIZATION STATUS
preserve
rename (pid-immunized) =_kid
keep if !missing(mid)
gen link = mid
tempfile kids
save `kids'
restore
gen link = pid
joinby hhid link using `kids', unmatched(master)
by hhid pid (age_kid), sort: keep if _n == 1 // YOUNGEST CHILD
gen byte youngest_child_immunized = immunized_kid
drop *_kid _merge link

Note: I cannot see how to get father's education variable because nothing in the data tells me who a person's father is. If in the real data there is such a variable, perhaps called fid, then the code is exactly like that for mother's education, just replacing mid by fid throughout.
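If the data did include a father identifier, that substitution might look like the sketch below. The variable fid and its values here are made up purely for illustration (they are not part of the posted data); the frame name education and the new variable f_schooling are likewise arbitrary choices. Frames require Stata 16 or later.

```stata
clear all
* ASSUMPTION: fid is a hypothetical father identifier; its values are invented
input byte(hhid pid mid fid schooling)
10 1 3 4 16
10 2 . . 16
10 3 . .  8
10 4 . . 14
10 5 2 1 12
10 6 2 1 10
10 7 3 4 15
10 8 3 4  .
end

// GET FATHER'S EDUCATION (SAME PATTERN AS FOR THE MOTHER, WITH fid IN PLACE OF mid)
frame put hhid pid schooling, into(education)
frlink m:1 hhid fid, frame(education hhid pid)   // link each person to the father's record
frget f_schooling = schooling, from(education)
frame drop education
drop education                                   // the link variable is no longer needed

list, sepby(hhid)
```

People with a missing fid are simply left unmatched by -frlink-, so f_schooling is missing for them.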

Last edited by Clyde Schechter; 07 Oct 2021, 12:10.

Devon Smith

Join Date: Sep 2021

Posts: 26
#6

07 Oct 2021, 12:11

Thanks, Clyde! Yes, there is a father's id which I missed.
Devon Smith

Join Date: Sep 2021

Posts: 26
#7

18 Feb 2024, 04:15

Hi Clyde:

I am dealing with a new task similar to the old one. I have a dataset in Stata that contains individual-level data; each row corresponds to one individual.
The variables include household id (hhold), unique individual id within a household (id), individual sex (sex), age (age), and the relationship to the head of the family (relationship). Relationship is a categorical variable taking values between 1 and 14.
Here are the labels for this variable:
1 HEAD
2 HUSBAND / WIFE
3 SON / DUAGHTER
4 SPOUSE OF SON / DUAGHTER
5 GRANCHILD
6 FATHER / MOTHER
7 BROTHER / SISTER
8 NIECE / NEPHEW
9 FATHER-IN-LAW / MOTHER-IN-LAW
10 BROTHER-IN-LAW / SISTER-IN-LAW
11 OTHER RELATIVE (SPECIFY)
12 SERVANT
13 EMPLOYEE
14 OTHER (SPECIFY)
If it takes the value of 1, the person is the head. If it takes the value of 7, they are the sibling of the head and so on.

Two individuals within a household are siblings if:
1. someone has relationship==1 (he/she is the sibling of someone whose value for relationship is 7) and someone else has relationship==7 (he/she is the sibling of someone whose value for relationship is 1);
2. someone has relationship==2 (he/she is the sibling of someone whose value for relationship is 10) and someone else has relationship==10 (he/she is the sibling of someone whose value for relationship is 2);
3. someone has relationship==3 (he/she is the sibling of someone whose value for relationship is 3) and someone else has relationship==3 (he/she is the sibling of someone whose value for relationship is 3);
4. someone has relationship==5 (he/she is the sibling of someone whose value for relationship is 5) and someone else has relationship==5 (he/she is the sibling of someone whose value for relationship is 5);
5. someone has relationship==8 (he/she is the sibling of someone whose value for relationship is 8) and someone else has relationship==8 (he/she is the sibling of someone whose value for relationship is 8);

I want to reshape the dataset so that each observation is a single sibling pair. For instance, if a household has 2 brothers and 2 sisters, the first row should be the first brother and first sister, the second row the first brother and second sister, the third row the second brother and first sister, and the fourth row the second brother and second sister. Each row should include all of the brother's information and all of the sister's information for that pair. Is there any quick way to do this?
Nick Cox

Join Date: Mar 2014

Posts: 35212
#8

18 Feb 2024, 05:21

#7 See also https://www.statalist.org/forums/for...sing-unique-id
Devon Smith

Join Date: Sep 2021

Posts: 26
#9

18 Feb 2024, 19:58

Hi Nick, thanks for the reply!

I actually have a dataset that looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float hhold byte(id sex relationship) int age
1001 1 1 1 46
1001 2 2 2 38
1001 3 1 3 12
1001 4 2 3 18
1003 1 1 1 46
1003 2 2 2 40
1003 3 2 3 11
1003 4 2 3 19
1004 1 1 1 42
1004 2 2 2 36
1004 3 1 3 13
1004 4 1 3 17
1011 1 1 1 62
1011 2 2 2 55
1011 3 1 3 30
1011 4 1 3 35
1011 5 2 4 26
1011 6 1 5  8
end
label values sex S1AQ01
label def S1AQ01 1 "MALE", modify
label def S1AQ01 2 "FEMALE", modify
label values relationship S1AQ02
label def S1AQ02 1 "HEAD", modify
label def S1AQ02 2 "HUSBAND / WIFE", modify
label def S1AQ02 3 "SON / DUAGHTER", modify
label def S1AQ02 4 "SPOUSE OF SON / DUAGHTER", modify
label def S1AQ02 5 "GRANCHILD", modify
label def S1AQ02 6 "FATHER / MOTHER", modify
label def S1AQ02 7 "BROTHER / SISTER", modify

Essentially, I am trying to create a dataset where each row would be a sibling-pair unique to one sibling only. I am trying to create a sibling-pair based off of the variable relationship with head mentioned above. So for instance, for household number 1001 above, I would have two rows of observations (the third and fourth observations would be one sibling pair because they have a relationship value of 3). Within each row, I want the variables hhold, id, relationship, age, sex of the individual, and the corresponding variables for the sibling. It would look something like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float hhold byte(id sex relationship) int age hhold_sibling id_sibling sex_sibling relationship_sibling age_sibling
1001 3 1 3 12 1001 4 2 3 18
1001 4 2 3 18 1001 3 1 3 12
end

There are other possible combinations of sibling-pairs that can be generated based on the relationship values (for instance 1 and 7 would be one), and more than one sibling pair for same relationship values (for instance 3 individuals with the relationship value of 3). I just do not know how to do this. Any help would be much appreciated!
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#10

18 Feb 2024, 20:44

This will do what you ask:

Code:

tempfile copy
save `copy'

//    MATCH EACH PERSON TO ALL OTHER PERSONS IN HOUSEHOLD
//    DUE TO SYMMETRY OF RELATIONSHIPS, NO NEED TO REPEAT A:B AND B:A PAIRS
//    NOR DO WE WANT SELF-PAIRING
rangejoin id 1 . using `copy', by(hhold)

//    NOW RETAIN ONLY THE "SIBLINGS"
keep if relationship == 1 & relationship_U == 7 ///
        | relationship == 2 & relationship_U == 10 ///
        | relationship == 3 & relationship_U == 3 ///
        | relationship == 5 & relationship_U == 5 ///
        | relationship == 8 & relationship_U == 8

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
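For anyone who does not yet have these two packages, a minimal sketch of installing them from SSC is shown here. The -capture which- guard is just one convenient way to skip the download when a package is already present; it is an illustrative choice, and running -ssc install- directly works as well.

```stata
* Install each community-contributed package from SSC only if it is not found
capture which rangestat
if _rc {
    ssc install rangestat
}
capture which rangejoin
if _rc {
    ssc install rangejoin
}
```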

I have put siblings in scare quotes in my comment because some of these pairings may or may not really be siblings. To take the most obvious example, two people with relationship == 5 are both grandchildren of the household head; they are at least as likely to be cousins as siblings. In fact, of all the combinations you show, only 1:7 and 3:3 are sure to be siblings. I do not see any way to further restrict the other combinations using this data to guarantee sibship between them. The best one can really say here is that any pair of people in the same household who do not meet any of these relationship pairings are definitely not siblings.
Devon Smith

Join Date: Sep 2021

Posts: 26
#11

18 Feb 2024, 21:15

Thanks, Clyde! It worked, and you are right about the combinations.

One additional thing: I actually want two observations for each sibling pair, i.e., to distinguish between the A:B and B:A pairs. How would I modify the above code?

Clyde Schechter

Join Date: Apr 2014
Posts: 29796

#12

18 Feb 2024, 23:28

To get the pairs in both directions:

Code:

preserve
ds hhold, not
rename (`r(varlist)') =_U
tempfile copy
save `copy'
restore

//    MATCH EACH PERSON TO ALL OTHER PERSONS IN HOUSEHOLD
joinby hhold using `copy'
drop if id == id_U

//    NOW RETAIN ONLY THE "SIBLINGS"
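//    THE 1:7 AND 2:10 PAIRINGS ARE LISTED IN BOTH ORDERS SO THAT B:A IS KEPT ALONG WITH A:B
keep if relationship == 1 & relationship_U == 7 ///
        | relationship == 7 & relationship_U == 1 ///
        | relationship == 2 & relationship_U == 10 ///
        | relationship == 10 & relationship_U == 2 ///
        | relationship == 3 & relationship_U == 3 ///
        | relationship == 5 & relationship_U == 5 ///
        | relationship == 8 & relationship_U == 8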

Devon Smith

Join Date: Sep 2021

Posts: 26
#13

19 Feb 2024, 02:56

It worked, Clyde! Can't thank you enough! God Bless!
Devon Smith

Join Date: Sep 2021

Posts: 26
#14

20 Apr 2024, 21:10

Hi Clyde:

Using the dataset I posted in #9, I want to do the following: for each child in the household, I want to create a new variable `father_id' using the `relationship' and `id' variables. Any help would be greatly appreciated!
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#15

21 Apr 2024, 11:27

The identification of fathers can only be partially accomplished with the information available. For example, a grandchild's father would presumably be one of the people listed as a son, but there could be several such and there is no way to identify which is the one. The unambiguous cases are: those with relationship son or daughter unequivocally have the head as father if the head is male, and the head, regardless of sex, unequivocally has the person identified as father/mother and male as father. Additionally, the brother or sister of the head necessarily has the same father as the head. Code implementing this looks like:

Code:

gen father_id = .
by hhold (id), sort: egen temp = max(cond(relationship == 1 & sex == 1, ///
    id, .))
replace father_id = temp if relationship == 3
drop temp
by hhold (id): egen temp = max(cond(relationship == 6 & sex == 1, id, .))
replace father_id = temp if inlist(relationship, 1, 7)
drop temp
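To see what this produces, the sketch below runs the same logic on two households (1001 and 1011) taken from the example data in #9. The -assert- lines are only an illustrative check added here; they hold for these two households and would not be expected to hold in data that include a male FATHER / MOTHER record.

```stata
clear
input float hhold byte(id sex relationship) int age
1001 1 1 1 46
1001 2 2 2 38
1001 3 1 3 12
1001 4 2 3 18
1011 1 1 1 62
1011 2 2 2 55
1011 3 1 3 30
1011 4 1 3 35
1011 5 2 4 26
1011 6 1 5  8
end

gen father_id = .

* Sons/daughters of the head: the father is the head, provided the head is male
by hhold (id), sort: egen temp = max(cond(relationship == 1 & sex == 1, id, .))
replace father_id = temp if relationship == 3
drop temp

* Head and head's siblings: the father is the male person coded FATHER / MOTHER
by hhold (id): egen temp = max(cond(relationship == 6 & sex == 1, id, .))
replace father_id = temp if inlist(relationship, 1, 7)
drop temp

* In these two households only the sons/daughters end up with a father_id
assert father_id == 1 if relationship == 3
assert missing(father_id) if relationship != 3
list hhold id sex relationship father_id, sepby(hhold)
```

The spouse of a son/daughter (id 5 in household 1011) and the grandchild (id 6) are left with a missing father_id, which reflects the ambiguity described above.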