  • Creating a variable based on another member of the household in panel data?

    Hi everyone, I have household panel data, where I have information on everyone living in the household. As such, my data looks somewhat like this:

    Code:
    clear 
    input hhid id year age head
    100 1 1990 25 1
    100 1 1991 26 1
    100 1 1992 27 1
    100 1 1993 28 1
    100 1 1994 29 1
    
    100 2 1991 0 0
    100 2 1992 1 0
    100 2 1993 2 0
    100 2 1994 3 0
    
    200 3 1981 27 1
    200 3 1982 28 1
    200 3 1983 29 1
    200 3 1984 30 1
    200 3 1985 31 1
    
    200 4 1983 0 0
    200 4 1984 1 0
    200 4 1985 2 0
    200 4 1986 3 0
    200 4 1987 4 0
    
    200 5 1985 0 0
    200 5 1986 1 0
    200 5 1987 2 0
    200 5 1988 3 0
    200 5 1989 4 0
    
    end
    Here hhid is the household ID; id is the individual identifier; and head is a binary variable indicating whether someone is the head of household (head=1) or not (head=0). In the example above, I present five individuals in two separate households (100 vs 200). In each household, you can see an adult and at least one child. The children enter the panel when they are born (age = 0).

    For each child, I would like to identify the age of the household head when the child was born. Then, I would like to make that variable constant across all waves. Thus, I would like my data to look as follows:

    Code:
    clear 
    input hhid id year age head hage
    100 1 1990 25 1 .
    100 1 1991 26 1 .
    100 1 1992 27 1 .
    100 1 1993 28 1 .
    100 1 1994 29 1 .
    
    100 2 1991 0 0 25
    100 2 1992 1 0 25
    100 2 1993 2 0 25
    100 2 1994 3 0 25
    
    200 3 1981 27 1 .
    200 3 1982 28 1 .
    200 3 1983 29 1 .
    200 3 1984 30 1 .
    200 3 1985 31 1 .
    
    200 4 1983 0 0 29
    200 4 1984 1 0 29
    200 4 1985 2 0 29
    200 4 1986 3 0 29
    200 4 1987 4 0 29
    
    200 5 1985 0 0 31
    200 5 1986 1 0 31
    200 5 1987 2 0 31
    200 5 1988 3 0 31
    200 5 1989 4 0 31
    
    end


    I am quite stumped on how to go about doing this. If anyone knows, I would greatly appreciate the help!

  • #2
    I suspect there are other more clever ways to do this, but here's a relatively straightforward approach. I think this does what you want.
    Code:
    clear
    input hhid id year age head
    100 1 1990 25 1
    100 1 1991 26 1
    100 1 1992 27 1
    100 1 1993 28 1
    100 1 1994 29 1
    100 2 1991 0 0
    100 2 1992 1 0
    100 2 1993 2 0
    100 2 1994 3 0
    200 3 1981 27 1
    200 3 1982 28 1
    200 3 1983 29 1
    200 3 1984 30 1
    200 3 1985 31 1
    200 4 1983 0 0
    200 4 1984 1 0
    200 4 1985 2 0
    200 4 1986 3 0
    200 4 1987 4 0
    200 5 1985 0 0
    200 5 1986 1 0
    200 5 1987 2 0
    200 5 1988 3 0
    200 5 1989 4 0
    end
    // Make a file of info on just the household heads
    preserve
    keep if head
    keep hhid year age
    rename age headage
    tempfile headfile
    save `headfile'
    restore
    // Get the head's age onto each year's record
    merge m:1 hhid year using `headfile', keepusing(headage)
    // Set head age on irrelevant records to missing.
    // First, I assume that a current head is never to be treated as a child.
    replace headage = . if head == 1  
    // Second, we only want head ages for birth year observations
    replace headage = . if (age != 0)
    // Spread headage to all waves for each id, relying on the fact that
    // nonmissing observations of headage will sort to the first record
    // of each id.
    bysort id: replace headage = headage[1]
    list

    • #3
      I think there is an error in the penultimate line of code in #2. Contrary to the comment that precedes it, the non-missing values of headage will not necessarily sort to the first record of each id. And, at least on my second run of this code, they did not, and the result was missing values for headage in every observation. To get headage to spread properly to all observations for the id, you also have to sort on headage. Also, the same id can occur in different households, so sorting on id can mingle the data from two unrelated persons. I think the correct code would be:
      Code:
      bysort hhid id (headage): replace headage = headage[1]
      I also remark that I believe O.P. has made an error in his listing of the desired results. id 2 in household 100 is born in 1991, and in that year, the head of household 100, id 1, is 26 years old, not 25.

      Finally, while I think the approach in #2 is fine, if your data set is very large the following code might prove a bit faster:
      Code:
      gen match_year = year if age == 0
      gen match_age = age if head
      rangestat (mean) headage = match_age, by(hhid) interval(year match_year match_year)
      replace headage = . if missing(match_year)
      bysort hhid id (headage): replace headage = headage[1]
      drop match_*
      -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
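
      Another possible route, offered only as a sketch, avoids both the merge and -rangestat- by using -egen- within household-year groups. It assumes each household has at most one head in any given year and that a child's record with age == 0 is the birth-year record. headage_now is a hypothetical helper variable, and the data are a shortened version of the example above.

      ```stata
      * Sketch only: assumes at most one head per household-year and that
      * age == 0 marks a child's birth-year record.
      * headage_now is a hypothetical helper name.
      clear
      input hhid id year age head
      100 1 1990 25 1
      100 1 1991 26 1
      100 1 1992 27 1
      100 2 1991 0 0
      100 2 1992 1 0
      200 3 1983 29 1
      200 4 1983 0 0
      200 4 1984 1 0
      end

      * Head's age in each household-year (missing if no head is observed that year)
      bysort hhid year: egen headage_now = max(cond(head == 1, age, .))

      * Keep it only on the child's birth-year record
      gen headage = headage_now if age == 0 & head == 0

      * Spread to all waves; nonmissing values sort before missing within each person
      bysort hhid id (headage): replace headage = headage[1]
      drop headage_now
      list, sepby(hhid id)
      ```

      With these shortened data, id 2 ends up with headage 26 and id 4 with headage 29 in every wave, while the heads' own records stay missing.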

      • #4
        Hi Mike and Clyde, thank you very much for your help! The code works as I desired. Clyde, you are correct on all three remarks. Thank you!
