  • Only Keeping Children if Their Parents are in the Survey

    Hi All,

    I am trying to make a child file that keeps only the observations of children whose parents are also in the survey. First, here is how I made a "father file," which keeps only the IDs that appear in the "ID_F" column.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5(ID Income ID_F)
    "A1" "10" "D4"
    "B2" "11" "." 
    "C3" "12" "A1"
    "D4" "13" "B2"
    "E5" "14" "A1"
    end

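    To keep only the IDs found in the ID_F column, I used these lines:

    Code:
    expand 2, gen(ex)            // ex==1 marks the added copy of each observation
    replace ID = ID_F if ex      // the copies now carry the father's ID
    bys ID (ex): keep if ex[_N]  // keep ID groups that contain a copy, i.e. IDs that occur in ID_F
    drop if ex == 1              // discard the copies themselves
    drop ex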


    Now, however, I want to keep only the IDs that have a value in "ID_F" or "ID_M" (not necessarily both) that also appears in the ID column. Here is a data example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5(ID Revenue ID_F ID_M)
    "A1" "10" "Z4" "Z5"
    "B2" "11" "."  "C3"
    "C3" "12" "A1" "D4"
    "D4" "13" "."  "M3"
    "E5" "14" "A1" "C3"
    end

    In this case, I want to keep only observations B2, C3, and E5, since each has a father or mother who is also in the survey. Any guidance would be much appreciated.

    Best,
    Cora

  • #2
    To either set of code given in post #2 of your previous topic at

    https://www.statalist.org/forums/for...running-slowly

    add
    Code:
    keep if parents_present
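The code from that earlier thread is not reproduced here. As a hedged stand-in, a flag with the same name, parents_present, could be built with two m:1 merges against the list of surveyed IDs. This sketch assumes a parent counts as present only when the ID_F or ID_M string exactly matches some ID, so placeholders such as "." never match. The tempfile names fathers and mothers and the variables f_match and m_match are hypothetical.

```stata
* Example data from post #1
clear
input str5(ID Revenue ID_F ID_M)
"A1" "10" "Z4" "Z5"
"B2" "11" "."  "C3"
"C3" "12" "A1" "D4"
"D4" "13" "."  "M3"
"E5" "14" "A1" "C3"
end

* Build two lookup files holding every surveyed ID,
* once under the name ID_F and once under the name ID_M
preserve
keep ID
duplicates drop
rename ID ID_F
tempfile fathers
save `fathers'
rename ID_F ID_M
tempfile mothers
save `mothers'
restore

* m:1 because several children can share one parent;
* a result code of 3 means the parent's ID was found among surveyed IDs
merge m:1 ID_F using `fathers', keep(master match) generate(f_match)
merge m:1 ID_M using `mothers', keep(master match) generate(m_match)

* Hypothetical flag using the name from post #2
gen byte parents_present = f_match == 3 | m_match == 3
drop f_match m_match

keep if parents_present
list ID ID_F ID_M
```

Because each merge only looks up one key column, this avoids doubling the dataset with expand, which may matter for speed on large files.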



    • #3
      Try the code below. Please share the observed speed when you apply it to your actual data.
      Code:
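      expand 2, gen(ex)                  // ex==1 marks the added copy of each observation
      
      replace ID = ID_F if ex            // copies now carry the father's ID
      bys ID (ex): gen match_F = ex[_N]  // on original rows: 1 if this ID occurs in ID_F
      
      replace ID = ID_M if ex            // copies now carry the mother's ID
      bys ID (ex): gen match_M = ex[_N]  // on original rows: 1 if this ID occurs in ID_M
      
      keep if (match_F | match_M) & !ex  // keep original rows flagged in either step
      drop ex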
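On the example data in post #1, match_F and match_M are 1 for an original record whose own ID appears in ID_F or ID_M, so the code above keeps the surveyed parents (A1, C3, D4) rather than the children asked for (B2, C3, E5). A minimal sketch of one way to keep the same expand/bysort idea but carry the result back to each child: remember the original row number before expanding, flag copies whose ID group also contains a surveyed record, then take the maximum per row. The helper variables row, father_in, mother_in, and any_parent are hypothetical names.

```stata
* Example data from post #1
clear
input str5(ID Revenue ID_F ID_M)
"A1" "10" "Z4" "Z5"
"B2" "11" "."  "C3"
"C3" "12" "A1" "D4"
"D4" "13" "."  "M3"
"E5" "14" "A1" "C3"
end

gen long row = _n                  // remembers which record each copy came from
expand 2, gen(ex)                  // ex==1 marks the added copy

* Copies take the father's ID. Within an ID group sorted by ex, ex[1]==0
* exactly when the group contains a surveyed (original) record with that ID.
replace ID = ID_F if ex
bysort ID (ex): gen byte father_in = ex & ex[1] == 0

* Same step for mothers
replace ID = ID_M if ex
bysort ID (ex): gen byte mother_in = ex & ex[1] == 0

* Carry each copy's result back to its original record
bysort row: egen byte any_parent = max(father_in | mother_in)

keep if !ex & any_parent
drop row ex father_in mother_in any_parent
list
```

The flag is set on the copy (the child's stand-in inside the parent's ID group), which is why the extra row variable is needed to move it back onto the child's own record.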
