

  Household panel data: how to define a binary variable using information on another member of the household

    I have panel data on individuals over 4 years. Each person has an id (avs_nbr), a gender variable, a date of birth, a year of observation and a household id that tells me who lives with whom. Each person also has variables that refer to their relatives. For example, an individual has a variable father_id whose value is the avs_nbr of their father; the same holds for mother, partners, children, etc. However, these links are not 100% reliable: some children have a missing value for their parents, some married couples have a missing value for their partner, etc.

    I am interested in births occurring between October 1st 2020 and March 31st 2021. I can easily detect whether a child was born within this window like this:
    gen born_in_range = inrange(birthday,date("1oct2020", "DMY"),date("31mar2021","DMY"))
    hashsort householdid
    by householdid : egen hh_born_in_range = max(born_in_range)
    keep if hh_born_in_range == 1
    hashsort householdid avs_nbr year
    gen birthyear = year(birthday)
    * Keep born_in_range = 1 only in the child's year of birth, not in every year the child is observed
    replace born_in_range = 0 if year != birthyear
    Here I flag all children born in the desired window, then flag the household they belong to, and drop all households not witnessing births within that window.
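    A self-contained version of these steps on invented toy data could look like the sketch below. It swaps hashsort (from the user-written gtools package) for the built-in bysort, and puts comments after // rather than ///, because in a do-file /// continues the command onto the next line. The data values, and reading the birth date as a string before converting it, are assumptions for illustration only.

```stata
* Hypothetical toy data: household 1 has a birth in the window, household 2 does not.
* Variable names follow the post; all values are invented.
clear
input long avs_nbr long householdid int year str9 bday
101 1 2020 "12may1990"
101 1 2021 "12may1990"
102 1 2021 "15jan2021"
201 2 2020 "03mar1985"
201 2 2021 "03mar1985"
end
gen birthday = date(bday, "DMY")
format birthday %td

* Flag people born between 1 Oct 2020 and 31 Mar 2021 (inrange() returns 0 for missing dates)
gen byte born_in_range = inrange(birthday, date("1oct2020", "DMY"), date("31mar2021", "DMY"))

* Household-level flag, then keep only households with such a birth
bysort householdid: egen byte hh_born_in_range = max(born_in_range)
keep if hh_born_in_range == 1

* Keep the child-level flag only in the year of birth
gen birthyear = year(birthday)
replace born_in_range = 0 if year != birthyear
```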

    What I would like to do now is create a flag for mothers and fathers that takes the value 1 if that person is the parent of a child born in the six-month window.
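    One way to do this, sketched under assumptions rather than as a definitive method, is to use the relatives' links directly instead of household membership: collect the father_id and mother_id of every child with born_in_range == 1, then flag those ids in the same year. The toy data are invented; storing the ids as long is an assumption (real avs_nbr values may need double or string storage, as long as father_id and mother_id use the same type). Parents whose link is missing, or who do not appear in the data, cannot be flagged this way.

```stata
* Hypothetical toy person-year data (one household, one year).
* father_id / mother_id hold the parent's avs_nbr, missing if unknown.
clear
input long avs_nbr long householdid int year long father_id long mother_id byte born_in_range
1 10 2021 . . 0
2 10 2021 . . 0
3 10 2021 1 2 1
4 10 2021 . . 0
end

* 1. List the parents named by children born in the window, by year.
preserve
keep if born_in_range == 1
keep year father_id mother_id
rename (father_id mother_id) (pid1 pid2)
gen long row = _n
reshape long pid, i(row) j(role)    // role 1 = father, 2 = mother
drop if missing(pid)                 // unknown parents cannot be flagged
gen byte is_father = (role == 1)
gen byte is_mother = (role == 2)
collapse (max) is_father is_mother, by(pid year)
rename pid avs_nbr
tempfile parents
save `parents'
restore

* 2. Attach the flags to the parents' own person-year records.
merge m:1 avs_nbr year using `parents', keep(master match) nogenerate
replace is_father = 0 if missing(is_father)
replace is_mother = 0 if missing(is_mother)
list avs_nbr is_father is_mother
```

    Because the flags are matched on avs_nbr and year, a parent is flagged only in the year of the birth; dropping year from the collapse and the merge would flag them in every year instead.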

    I tried the following:
    *Generate flag for children
    gen is_child =(year-birthyear<17)
    *Need to identify fathers
    hashsort householdid year
    * Household-year flag: 1 if a child born in the window was born in that household in that year
    by householdid year : egen witnessed_birth = max(born_in_range)
    hashsort householdid avs_nbr year
    gen is_father = 0
    replace is_father = 1 if witnessed_birth == 1 & sex == 1 & is_child == 0
    tab is_father,m
    My sample has 41'405 births, but this method flags 49'709 fathers, so it needs fine-tuning. I do not know whether this is the correct approach. If not, what is? If it is, how can I improve its accuracy?
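    The attempt above sets is_father = 1 for every man with is_child == 0 in a household-year with a birth, so grandfathers, adult sons or other adult men living in the same household are all counted; that is a likely reason for getting more fathers than births. Matching on the father_id/mother_id links avoids this but inherits their missing values. One hedged way to recover some missing father links is to fall back on the mother's recorded partner, as sketched below. partner_id is a hypothetical name for the partner variable, and treating the mother's partner as the father is an assumption that will misclassify step-parents.

```stata
* Hypothetical toy data. partner_id is an ASSUMED name for the partner link.
clear
input long avs_nbr int year long father_id long mother_id long partner_id byte born_in_range
1 2021 . . 2 0
2 2021 . . 1 0
3 2021 . 2 . 1
end

* Build a lookup: for each person-year, who is recorded as their partner?
preserve
keep avs_nbr year partner_id
rename (avs_nbr partner_id) (mother_id mothers_partner)
tempfile partners
save `partners'
restore

* Attach the mother's partner to each record (records without a match are kept).
merge m:1 mother_id year using `partners', keep(master match) nogenerate

* Use the mother's partner only where the father link is missing.
gen long father_id_filled = father_id
replace father_id_filled = mothers_partner if missing(father_id) & born_in_range == 1
```

    father_id_filled can then be used wherever father_id is used to identify fathers.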

    I cannot give a code snippet as the provider wishes the data to remain confidential.

    Thank you very much!

  • #2
    "I cannot give a code snippet as the provider wishes the data to remain confidential."
    We naturally respect this kind of constraint but address it directly in our FAQ Advice (12.2).

    If your dataset is confidential, then provide a fake example instead.
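    As a minimal sketch of what such a fake example could look like, the block below builds a small invented dataset with the variable names from the question (the id values, and the coding sex == 1 for men implied by the is_father line in the question, are assumptions) and then runs dataex, which prints the data in a form others can paste straight into Stata. Recent versions of official Stata include dataex; older versions can install it from SSC.

```stata
* Invented values only; variable names follow the original post.
clear
input long avs_nbr long householdid int year byte sex str9 bday long father_id long mother_id
1 10 2020 1 "05feb1988" . .
1 10 2021 1 "05feb1988" . .
2 10 2020 2 "17jul1990" . .
2 10 2021 2 "17jul1990" . .
3 10 2020 1 "20nov2020" 1 2
3 10 2021 1 "20nov2020" 1 2
end
gen birthday = date(bday, "DMY")
format birthday %td
drop bday

* Print the data as a ready-to-paste example
dataex
```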

