Creating a variable based on another member of the household in panel data?

Jason Smith

Join Date: Nov 2023

Posts: 15
#1

03 Jan 2025, 15:35

Hi everyone, I have household panel data with information on everyone living in each household. My data look roughly like this:

Code:

clear
input hhid id year age head
100 1 1990 25 1
100 1 1991 26 1
100 1 1992 27 1
100 1 1993 28 1
100 1 1994 29 1
100 2 1991 0 0
100 2 1992 1 0
100 2 1993 2 0
100 2 1994 3 0
200 3 1981 27 1
200 3 1982 28 1
200 3 1983 29 1
200 3 1984 30 1
200 3 1985 31 1
200 4 1983 0 0
200 4 1984 1 0
200 4 1985 2 0
200 4 1986 3 0
200 4 1987 4 0
200 5 1985 0 0
200 5 1986 1 0
200 5 1987 2 0
200 5 1988 3 0
200 5 1989 4 0
end

Here hhid is the household ID; id is the individual identifier; and head is a binary variable indicating whether someone is the head of household (head=1) or not (head=0). In the example above, I present five individuals in two separate households (100 vs 200). In each household, you can see an adult and at least one child. The children enter the panel when they are born (age = 0).

For each child, I would like to identify the age of the household head in the year the child was born. Then, I would like to make that variable constant across all waves for that child. Thus, I would like my data to look as follows:

Code:

clear
input hhid id year age head hage
100 1 1990 25 1 .
100 1 1991 26 1 .
100 1 1992 27 1 .
100 1 1993 28 1 .
100 1 1994 29 1 .
100 2 1991 0 0 25
100 2 1992 1 0 25
100 2 1993 2 0 25
100 2 1994 3 0 25
200 3 1981 27 1 .
200 3 1982 28 1 .
200 3 1983 29 1 .
200 3 1984 30 1 .
200 3 1985 31 1 .
200 4 1983 0 0 29
200 4 1984 1 0 29
200 4 1985 2 0 29
200 4 1986 3 0 29
200 4 1987 4 0 29
200 5 1985 0 0 31
200 5 1986 1 0 31
200 5 1987 2 0 31
200 5 1988 3 0 31
200 5 1989 4 0 31
end

I am quite stumped on how to go about doing this. If anyone knows, I would greatly appreciate the help!
Tags: household data, panel data, siblings

Mike Lacy

Join Date: Apr 2014
Posts: 2391
#2

03 Jan 2025, 16:54

I suspect there are other more clever ways to do this, but here's a relatively straightforward approach. I think this does what you want.

Code:

clear
input hhid id year age head
100 1 1990 25 1
100 1 1991 26 1
100 1 1992 27 1
100 1 1993 28 1
100 1 1994 29 1
100 2 1991 0 0
100 2 1992 1 0
100 2 1993 2 0
100 2 1994 3 0
200 3 1981 27 1
200 3 1982 28 1
200 3 1983 29 1
200 3 1984 30 1
200 3 1985 31 1
200 4 1983 0 0
200 4 1984 1 0
200 4 1985 2 0
200 4 1986 3 0
200 4 1987 4 0
200 5 1985 0 0
200 5 1986 1 0
200 5 1987 2 0
200 5 1988 3 0
200 5 1989 4 0
end
// Make a file of info on just the household heads
preserve
keep if head
keep hhid year age
rename age headage
tempfile headfile
save `headfile'
restore
// Get the head's age onto each year's record
merge m:1 hhid year using `headfile', keepusing(headage)
// Set head age on irrelevant records to missing.
// First, I assume that a current head is never to be treated as a child.
replace headage = . if head == 1  
// Second, we only want head ages for birth year observations
replace headage = . if (age != 0)
// Spread headage to all waves for each id, relying on the fact that
// nonmissing observations of headage will sort to the first record
// of each id.
bysort id: replace headage = headage[1]
list


Clyde Schechter

Join Date: Apr 2014

Posts: 29691
#3

03 Jan 2025, 22:30

I think there is an error in the penultimate line of code in #2. Contrary to the comment that precedes it, the nonmissing values of headage will not necessarily sort to the first record of each id, because -bysort id:- sorts only on id and leaves the order of records within an id arbitrary. And, at least on my second run of this code, they did not, and the result was missing values for headage in every observation. To get headage to spread properly to all observations for the id, you also have to sort on headage: Stata sorts missing values after all nonmissing values, so the nonmissing value then lands in the first record. Also, the same id can occur in different households, so sorting on id alone can mingle the data from two unrelated persons. I think the correct code would be:

Code:

bysort hhid id (headage): replace headage = headage[1]
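A small toy illustration of why adding headage to the sort key matters (the three records below are made up for the demonstration, not taken from the thread). The rows are deliberately put in an order where the nonmissing value is last; sorting on headage within hhid id then moves the missing values to the end, so headage[1] picks up the nonmissing value for the whole group.

```stata
clear
input hhid id year headage
200 4 1983 29
200 4 1984 .
200 4 1985 .
end
// Put the records in descending year order, so the nonmissing value is last
gsort -year
// Sorting on headage within hhid id moves missing values last,
// so headage[1] is the nonmissing value and is copied to every record
bysort hhid id (headage): replace headage = headage[1]
list
```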

I also remark that I believe the O.P. has made an error in his listing of the desired results. id 2 in household 100 is born in 1991, and in that year the head of household 100, id 1, is 26 years old, not 25.

Finally, while I think the approach in #2 is fine, if your data set is very large, the following code might prove a bit faster:

Code:

gen match_year = year if age == 0
gen match_age = age if head
rangestat (mean) headage = match_age, by(hhid) interval(year match_year match_year)
replace headage = . if missing(match_year)
bysort hhid id (headage): replace headage = headage[1]
drop match_*

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
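For readers who would rather avoid both the merge and the SSC dependency, here is a minimal sketch of a different route (not proposed in the posts above) that uses only built-in -egen- functions. It assumes there is at most one head per household-year and that a child's birth year is the wave in which age == 0 and head == 0; the helper variables headage_yr and birth_headage are illustrative names.

```stata
clear
input hhid id year age head
100 1 1990 25 1
100 1 1991 26 1
100 1 1992 27 1
100 1 1993 28 1
100 1 1994 29 1
100 2 1991 0 0
100 2 1992 1 0
100 2 1993 2 0
100 2 1994 3 0
200 3 1981 27 1
200 3 1982 28 1
200 3 1983 29 1
200 3 1984 30 1
200 3 1985 31 1
200 4 1983 0 0
200 4 1984 1 0
200 4 1985 2 0
200 4 1986 3 0
200 4 1987 4 0
200 5 1985 0 0
200 5 1986 1 0
200 5 1987 2 0
200 5 1988 3 0
200 5 1989 4 0
end
// Head's age in each household-year, copied to every member of that household-year
// (assumes at most one head per household-year; missing if no head is observed)
bysort hhid year: egen headage_yr = max(cond(head == 1, age, .))
// Keep that value only on a non-head's birth-year record
gen birth_headage = headage_yr if age == 0 & head == 0
// Spread the single birth-year value to every wave of the same person;
// egen max() ignores missing values, so heads end up with missing hage
bysort hhid id: egen hage = max(birth_headage)
drop headage_yr birth_headage
list, sepby(hhid id)
```

With the example data this gives hage = 26 for id 2 (consistent with the correction above rather than the 25 in #1), 29 for id 4, and 31 for id 5, and leaves hage missing for the heads.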
Jason Smith

Join Date: Nov 2023

Posts: 15
#4

05 Jan 2025, 12:49

Hi Mike and Clyde, thank you very much for your help! The code works as I desired. Clyde, you are correct on all three remarks. Thank you!