I have a yearly panel dataset of families covering the years 2019-2022. Each person has an individual id, and individuals living together in a household are linked by a second variable, the household id. I would like to keep all observations of every household in which a child was born between October 1, 2020 and March 31, 2021.
When I browse the data after running this, it seems that I am dropping the parents as well: many household ids appear only for the child born within that timeframe. I have 104,000 observations with born_in_range == 1 and 117,000 observations with hh_born_in_range == 1, whereas I would expect several times as many observations with hh_born_in_range == 1 as with born_in_range == 1. In addition, no observations from the year 2019 are left.
I unfortunately cannot share data as the provider wishes it to remain protected.
Here is the code I am using:
Code:
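* Flag individuals born between 1 Oct 2020 and 31 Mar 2021
gen born_in_range = inrange(birthday, date("01oct2020", "DMY"), date("31mar2021", "DMY"))
* Sort by household (hashsort and gegen are from the gtools package)
hashsort householdid
* Flag every observation in a household that contains at least one such birth
by householdid : gegen hh_born_in_range = max(born_in_range)
keep if hh_born_in_range == 1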