
  • Identify individuals who left between two survey waves

    Dear Statalist,

    Please consider this simplified dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 id_panel str20 status_2020 float ybirth_2020 str3 hhid_2020 int ybirth_CA_2020 str20 status_2006 float ybirth_2006 str3 hhid_2006 float ybirth_CA_2006
    "1" ""                        . ""       . "child"                1999 "BBB" .
    "2" "respondent"           1964 "AAA" 1999 "respondent"           1964 "BBB" .
    "3" "respondent's partner" 1970 "AAA" 1999 "respondent's partner" 1970 "BBB" .
    end

    This dataset represents a family of 3 individuals observed twice: in 2006 (variables ending in _2006) and in 2020 (variables ending in _2020). A unique longitudinal individual identifier, "id_panel", makes it possible to follow an individual over time. However, because household composition can change over time, it is impossible to have a unique longitudinal household ID.

    In this dataset, the first observation, a child, leaves the family between 2006 and 2020. The child is missing in 2020, but one can tell that the child left from the variable ybirth_CA_2020. CA means "Child Away".

    My goal is to create a binary variable called "departure_from_home_2020" equal to 1 if there was a child in the individual's household in the previous wave of the survey (here 2006) but that child is no longer there in 2020.

    The difficulty is that the household ID changes over time, shown here by AAA and BBB for the same individuals. One condition for the variable is that ybirth_2006 and ybirth_CA_2020 are the same. Can anyone help me achieve this? I would appreciate any help!
    Last edited by Adam Sadi; 27 May 2024, 10:17.

  • #2
    Are you working with wide data, or long?

    How many waves are there?

    A start:

    Code:
    * count children in each 2006 household, as observed in 2006 and in 2020
    egen child_2006 = total(status_2006 == "child"), by(hhid_2006)
    egen child_2020 = total(status_2020 == "child"), by(hhid_2006)



    • #3
      Professor Ford: Thank you for your answer. I work with wide data, meaning each observation corresponds to exactly one individual. I don't mind reshaping my dataset if that is more intuitive. There are 3 waves in total. There are also several ybirth_CA variables for each child living away from the family, but this is not an issue (for the moment).



      • #4
        As shown in #2, hhid_2006 acts as a consistent household identifier across time. You may be able to use that style of approach. That code counts children, so you know whether there are more or fewer in later waves.

        It may be helpful to create a dataex example that has the full array of cases that cause you problems.
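
        As a rough sketch of how the grouping idea from #2 could be extended to the birth-year condition in #1: the code below spreads the 2006 child's birth year across the 2006 household and compares it with ybirth_CA_2020. The helper variable child_ybirth_2006 is a made-up name, and the sketch assumes at most one child per 2006 household and a single ybirth_CA_2020 variable, which may not hold in the full data.

```stata
* Example data from #1
clear
input str2 id_panel str20 status_2020 float ybirth_2020 str3 hhid_2020 int ybirth_CA_2020 str20 status_2006 float ybirth_2006 str3 hhid_2006 float ybirth_CA_2006
"1" ""                        . ""       . "child"                1999 "BBB" .
"2" "respondent"           1964 "AAA" 1999 "respondent"           1964 "BBB" .
"3" "respondent's partner" 1970 "AAA" 1999 "respondent's partner" 1970 "BBB" .
end

* Birth year of the child present in the 2006 household, copied to every
* member of that household (assumption: at most one child per household;
* with several children, max() keeps only the latest birth year)
egen child_ybirth_2006 = max(cond(status_2006 == "child", ybirth_2006, .)), by(hhid_2006)

* 1 if the 2006 child's birth year matches the "child away" birth year in 2020.
* Defined only for individuals observed in 2020; the !missing() guard keeps
* two missing values from being counted as a match.
gen byte departure_from_home_2020 = (child_ybirth_2006 == ybirth_CA_2020 & !missing(ybirth_CA_2020)) if !missing(status_2020)

list id_panel status_2006 status_2020 departure_from_home_2020
```

        On the example data this flags the respondent and the partner and leaves the variable missing for the child, who is not observed in 2020. With several children or several ybirth_CA variables per household, the comparison would need to be repeated for each of them.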

