Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Recording School Attended in Previous Year

    I have a student dataset by grade across years. For each year, I need to create variables showing school attended in previous year. The issue is that the grade-by-year observations are not unique. Here is a dummy dataset:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(student_id school_id grade) int year
    1 23 1 2001
    1 35 2 2002
    1 36 2 2002
    1 23 3 2003
    1 56 4 2004
    1 56 5 2005
    1 35 5 2005
    1 54 5 2005
    1 49 6 2006
    1 32 6 2006
    1 23 7 2007
    1 23 8 2008
    end
    I would like the output to be like this where school_id_1 shows the first school attended in the previous year, school_id_2 shows the second school and so on:
    student_id school_id grade year school_id_1 school_id_2 school_id_3
    1 23 1 2001 . .
    1 35 2 2002 23
    1 36 2 2002 23
    1 23 3 2003 35 36
    1 56 4 2004 23
    1 56 5 2005 56
    1 35 5 2005 56
    1 54 5 2005 56
    1 49 6 2006 56 35 54
    1 32 6 2006 56 35 54
    1 23 7 2007 49 32
    1 23 8 2008 23
    This is my code so far:
    HTML Code:
    sort student_id year
    
    gen school_id_1 = .
    gen school_id_2 = .
    gen school_id_3 = .
    
    
    forval i = 2/`=_N' {
        
        if student_id[`i'] == student_id[`i'-1] & year[`i'] > year[`i'-1] {
            replace school_id_1 = school_id[`i'-1] in `i'
        }
    
        if student_id[`i'] == student_id[`i'-1] & year[`i'] == year[`i'-1] {
            replace school_id_1 = school_id_1[`i'-1] in `i'
            replace school_id_2 = school_id[`i'-1] in `i'
        }
    
        if student_id[`i'] == student_id[`i'-1] & year[`i'] == year[`i'-1] & year[`i'] == year[`i'-2] {
            replace school_id_1 = school_id_1[`i'-2] in `i'
            replace school_id_2 = school_id_2[`i'-1] in `i'
            replace school_id_3 = school_id[`i'-2] in `i'
        }
    }
    But the output this gives me is this:
    student_id school_id grade year school_id_1 school_id_2 school_id_3
    1 23 1 2001
    1 35 2 2002 23
    1 36 2 2002 23 35
    1 23 3 2003 36
    1 56 4 2004 23
    1 56 5 2005 56
    1 35 5 2005 56 56
    1 54 5 2005 56 56 56
    1 49 6 2006 54
    1 32 6 2006 54 49
    1 23 7 2007 32
    1 23 8 2008 23
    I would like school_id_2 and school_id_3 to be filled up only when there are multiple observations in the previous year. How can I fix this?

  • #2
    Code:
    frame put _all, into(previous_year_schools)
    frame previous_year_schools {
        sort student_id year, stable
        by student_id year: gen _j = _n
        reshape wide school_id, i(student_id year)  j(_j)
        replace year = year + 1
        rename school_id* school_id_*
    }
    
    frlink m:1 student_id year, frame(previous_year_schools)
    frget school_id_*, from(previous_year_schools)
    drop previous_year_schools
    frame drop previous_year_schools

    Comment


    • #3
      This worked perfectly. Thank you!

      Comment

      Working...
      X