  • Keeping all members of a group if at least one member of the group has a certain characteristic

    I have a yearly panel dataset of families covering the years 2019-2022. Each person has an individual id, and individuals living together in a household are linked by a second variable, the household id. I would like to keep all households in which a child was born between October 1, 2020 and March 31, 2021.

    Here is the code I am using:

    Code:
    gen born_in_range = inrange(birthday, date("01oct2020", "DMY"), date("31mar2021", "DMY"))
    hashsort householdid
    by householdid : gegen hh_born_in_range = max(born_in_range)
    keep if hh_born_in_range == 1
    When I browse the data, it seems that I am dropping parents as well, because many household ids appear only for the child born within that timeframe. I have 104'000 observations with born_in_range == 1 and 117'000 observations with hh_born_in_range == 1, whereas I should have several times as many observations with hh_born_in_range == 1 as with born_in_range == 1. In addition, I have no observations from the year 2019 left.

    I unfortunately cannot share data as the provider wishes it to remain protected.

  • #2
    You're using various unexplained community-contributed commands there, but your code seems equivalent to


    Code:
    bysort householdid : egen hh_born_in_range = max(inrange(birthday, mdy(10, 1, 2020), mdy(3, 31, 2021)))
    keep if hh_born_in_range
    I can't easily diagnose a problem here. Missing dates would be mapped to 0 and could not be a source of false negatives unless all birthdays that should qualify were missing. So all I can suggest is that you look more closely at your data and results. For example, dates held as MDY, MY, YM or Y would all fail a DMY conversion.
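
    As a minimal sketch of the two points above, using only built-in functions and made-up date strings (none of these values come from the poster's data): inrange() evaluates to 0 when its first argument is missing, and daily() with the "DMY" mask is the conversion that strings in other orders would fail.

    ```stata
    * inrange() with a missing first argument evaluates to 0, not missing
    display inrange(., mdy(10, 1, 2020), mdy(3, 31, 2021))

    * A day-month-year string converts under the "DMY" mask
    display %td daily("15nov2020", "DMY")

    * Hypothetical strings in other orders (MDY, YM, Y), shown for comparison;
    * these are the kinds of values that would not convert under "DMY"
    display daily("nov 15 2020", "DMY")
    display daily("2020nov", "DMY")
    display daily("2020", "DMY")
    ```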


    A simple check is

    Code:
    count if missing(daily(birthday, "DMY"))
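
    A further check, sketched here on toy data, is to compare household size with the household-level flag. The variable names personid, birthday_str and hh_size and all values below are hypothetical; only householdid, birthday and hh_born_in_range follow the thread. If flagged households mostly have size 1, the household members were probably lost before this step rather than by it.

    ```stata
    * Toy data: two households, one with a birth in the target window
    clear
    input long householdid long personid str9 birthday_str
    1 11 "15nov2020"
    1 12 "03feb1990"
    1 13 "21jul1988"
    2 21 "01jan2019"
    2 22 "12dec1985"
    end

    * Convert the string to a daily date (assumes day-month-year strings)
    gen birthday = daily(birthday_str, "DMY")
    format birthday %td

    * Flag every member of a household with at least one qualifying birth
    bysort householdid : egen hh_born_in_range = max(inrange(birthday, mdy(10, 1, 2020), mdy(3, 31, 2021)))

    * Household size, to see whether flagged households are singletons
    bysort householdid : gen hh_size = _N
    tabulate hh_size hh_born_in_range

    keep if hh_born_in_range
    list, sepby(householdid)
    ```

    In a yearly panel, hh_size counts person-year rows per household, so the same comparison could be run within each year if that is more informative.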



    • #3
      Thank you for the reply.

      There are no missing values for the variable birthday. It is coded as "DMY", e.g. "1jan2020".

      My code is equivalent to what you wrote, and yours gave me exactly the same result as my own.

      The error comes from some manipulations I had to do due to the way the data was delivered to me.

      Thanks for the help!
