
  • Calculating average year on year change from a dummy variable?

    Hi all,

    I'm stuck on an apparently simple problem: finding the average year-on-year change in a variable of interest. The average I get is wildly off, and I suspect I have made a mistake in my code. Below are sample data and the code I used.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 id float(year school)
    "111" 2011 221
    "111" 2012 222
    "111" 2013 221
    "112" 2011 222
    "112" 2012 221
    "112" 2013 224
    "113" 2011 224
    "113" 2012 222
    "113" 2013 222
    "114" 2011 222
    "114" 2012 221
    "114" 2013 224
    end
    Code:
    sort id year
    by id (year): g move_school=0 if _n>1
    by id (year): replace move_school=1 if school !=school[_n-1] & !missing(school) & !missing(school[_n-1])
    gcollapse(max) pupilmoved= move_school,by(id) merge

    I'm trying to find the percentage of pupils who move schools from one year to the next. I did the following:

    Code:
    sort year
    by year: egen annualmove=mean(pupilmoved)
    tab annualmove year
    tab annualmove year,col nofreq

    I suspect this is wrong because I'm getting absurdly high numbers in my original data. I think I might be misunderstanding the concept or coding it incorrectly.

    Appreciate any help.

    Thanks!

  • #2
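    What you are calculating with this code is not the proportion of students who moved in a given year. You are calculating the proportion of students who have ever changed school in the course of your study: pupilmoved is the maximum of move_school within each id, so it takes the same value in every one of a pupil's years, and averaging it by year only measures the share of ever-movers among the pupils observed in that year.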

    Also, the use of [_n-1] to refer to the preceding year's value of a variable is fine if you are sure there are no gaps in the data. But most data sets have gaps. It is safer to -xtset- the data and then use the lag operator instead.

    Code:
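    * xtset needs a numeric panel identifier, so encode the string id
    encode id, gen(n_id)
    xtset n_id year
    
    * 1 if this year's school differs from last year's, 0 if not, missing if either is unknown
    gen byte movedschool = school != L1.school if !missing(school, L1.school)
    * replaces the data with one row per year holding the proportion who moved
    collapse (mean) moved_this_year = movedschool, by(year)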
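
    As a rough check, the two measures can be put side by side on the sample data from #1. This is only a sketch: the variable evermoved is a name introduced here for illustration, and -egen, max()- stands in for the -gcollapse ..., merge- step in #1, on the assumption that both simply attach each pupil's maximum to every one of that pupil's rows.

    ```stata
    clear
    input str3 id float(year school)
    "111" 2011 221
    "111" 2012 222
    "111" 2013 221
    "112" 2011 222
    "112" 2012 221
    "112" 2013 224
    "113" 2011 224
    "113" 2012 222
    "113" 2013 222
    "114" 2011 222
    "114" 2012 221
    "114" 2013 224
    end

    encode id, gen(n_id)
    xtset n_id year

    * moved between last year and this year (missing in each pupil's first year)
    gen byte movedschool = school != L1.school if !missing(school, L1.school)

    * hypothetical "ever moved" flag: the pupil's maximum, repeated on every row
    egen byte evermoved = max(movedschool), by(n_id)

    * yearly means and counts of both measures, without collapsing the data
    tabstat movedschool evermoved, by(year) statistics(mean n)
    ```

    In this sample every pupil changes school at least once, so evermoved averages 1 in every year, whereas movedschool averages 1 in 2012 and .75 in 2013 and has no non-missing values in 2011.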



    • #3
      Originally posted by Clyde Schechter

      Thanks Clyde. One question: why would [_n-1] be wrong if there are gaps in the data? Could you elaborate, or point me to a resource on this?

      Thank you again for your time.



      • #4
        Suppose that there is a gap in the data. In that case, school[_n-1] will contain the school the student was enrolled in during the most recent earlier year in which the student appears in the data set, which is no longer the immediately preceding year but some earlier one. The lag operator never makes this mistake: it is programmed to return a missing value where it finds a gap. And in doing so the lag operator is telling the truth: if the data contain no observation for this student from the immediately preceding year, then the school the student attended that year is indeed unknown, which is appropriately represented by a missing value.
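
        A small made-up example may make the difference concrete. The ids "115" and "116" and the variables moved_sub and moved_lag are hypothetical, introduced only for this sketch; pupil 115 has no 2012 observation.

        ```stata
        clear
        input str3 id float(year school)
        "115" 2011 221
        "115" 2013 222
        "116" 2011 221
        "116" 2012 221
        "116" 2013 221
        end

        encode id, gen(n_id)
        xtset n_id year

        * subscript: compares with the previous observation, whatever its year
        bysort n_id (year): gen byte moved_sub = school != school[_n-1] if _n > 1

        * lag operator: compares with year - 1 and is missing when that year is absent
        gen byte moved_lag = school != L1.school if !missing(school, L1.school)

        list id year school moved_sub moved_lag, sepby(id)
        ```

        For pupil 115 in 2013, moved_sub is 1 because it compares 2013 with 2011, while moved_lag is missing because nothing is known about 2012. For pupil 116, who has no gaps, the two variables agree.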



        • #5
          I see! Yes, this makes sense. Thank you very much.
