Calculating average year-on-year change from a dummy variable?

Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#1


19 Jan 2024, 14:08

Hi all,

I'm stuck on an apparently simple problem: finding the average year-on-year change in a variable of interest. The average I got is wildly off, and I suspect I have made a mistake in the code. Below is a sample of the data, followed by my code.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 id float(year school)
"111" 2011 221
"111" 2012 222
"111" 2013 221
"112" 2011 222
"112" 2012 221
"112" 2013 224
"113" 2011 224
"113" 2012 222
"113" 2013 222
"114" 2011 222
"114" 2012 221
"114" 2013 224
end

Code:

sort id year
by id (year): g move_school=0 if _n>1
by id (year): replace move_school=1 if school !=school[_n-1] & !missing(school) & !missing(school[_n-1])
gcollapse (max) pupilmoved = move_school, by(id) merge

I'm trying to find the percentage of pupils who move schools from one year to the next. I did the following:

Code:

sort year
by year: egen annualmove=mean(pupilmoved)
tab annualmove year
tab annualmove year, col nofreq

I suspect this is wrong because I'm getting absurdly high numbers in my original data. I might be misunderstanding the concept or coding it incorrectly.

Appreciate any help.

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 29806
#2

19 Jan 2024, 14:59

What you are calculating with this code is not the proportion of students who moved in a given year. You are calculating the proportion of students who have ever changed school in the course of your study.
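A minimal sketch that makes the difference concrete on the -dataex- example from #1. It uses the built-in -egen max()- in place of -gcollapse (max) ..., merge- (assumed to be equivalent here, since both attach each student's maximum to all of that student's rows). Because the ever-moved flag is constant within a student, its mean by year is the same ever-moved share repeated in every year, whereas the per-observation move flag gives a genuinely year-specific share. The name ever_moved is hypothetical; [_n-1] is used only because this example has no gaps.

```stata
* -dataex- example from #1
clear
input str3 id float(year school)
"111" 2011 221
"111" 2012 222
"111" 2013 221
"112" 2011 222
"112" 2012 221
"112" 2013 224
"113" 2011 224
"113" 2012 222
"113" 2013 222
"114" 2011 222
"114" 2012 221
"114" 2013 224
end

* 1 if the school differs from the previous row (no gaps in this example)
sort id year
by id (year): gen byte move_school = school != school[_n-1] if _n > 1

* Per-student "ever moved" flag, constant across that student's rows
by id: egen pupilmoved = max(move_school)

* Compare the two per-year means
collapse (mean) ever_moved = pupilmoved moved_this_year = move_school, by(year)
list, noobs
```

On these data every student changes school at least once, so ever_moved should be 1 in all three years, while moved_this_year should be missing in 2011 (no earlier year to compare with), 1 in 2012 and .75 in 2013.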

Also, the use of [_n-1] to refer to the preceding year's value of a variable is fine if you are sure there are no gaps in the data. But most data sets have gaps. It is safer to -xtset- the data and then use the lag operator instead.

Code:

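encode id, gen(n_id)    // numeric panel identifier built from the string id
xtset n_id year         // declare the panel so that L1. means "same student, previous year"
gen byte movedschool = school != L1.school if !missing(school, L1.school)   // 1 = changed school since last year; missing if either year's school is unknown
collapse (mean) moved_this_year = movedschool, by(year)   // share of pupils observed in both years who moved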
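Applied to the -dataex- example from #1, the code above should give one row per year. If "average year-on-year change" is taken to mean the unweighted mean of these yearly shares (an assumption; a pooled share over all student-years is another reasonable reading), one more step turns it into a percentage. The variable pct_moved is a hypothetical name.

```stata
* -dataex- example from #1
clear
input str3 id float(year school)
"111" 2011 221
"111" 2012 222
"111" 2013 221
"112" 2011 222
"112" 2012 221
"112" 2013 224
"113" 2011 224
"113" 2012 222
"113" 2013 222
"114" 2011 222
"114" 2012 221
"114" 2013 224
end

* Code from #2
encode id, gen(n_id)
xtset n_id year
gen byte movedschool = school != L1.school if !missing(school, L1.school)
collapse (mean) moved_this_year = movedschool, by(year)

* Assumption: average = unweighted mean of the yearly shares, in percent
gen pct_moved = 100 * moved_this_year
summarize pct_moved
display "average % of pupils moving per year = " r(mean)
```

On the example data this should report 87.5: all pupils moved in 2012 and 75% in 2013, while 2011 drops out because there is no prior year to compare with.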
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#3

23 Jan 2024, 12:25


Thanks, Clyde. One query: I'm not sure why [_n-1] would go wrong if there are gaps in the data. Could you perhaps point me to a resource or elaborate on this?

Thank you again for your time.
Clyde Schechter

Join Date: Apr 2014

Posts: 29806
#4

23 Jan 2024, 12:40

Suppose that there is a gap in the data. In that case, school[_n-1] will contain the school the student was enrolled in during the last year for which the data set has a record of that student, which is no longer the immediately preceding year but some earlier year. The lag operator, however, never makes this mistake: it is programmed to return a missing value where it finds a gap. And in doing so the lag operator is telling the truth: if the data contain no observation of this student from the immediately preceding year, then the school the student attended that year is indeed unknown, and a missing value is the appropriate way to represent that.
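A small illustration, using a hypothetical variant of the #1 example in which student 113 has no record for 2012. In the full example, 113 was at school 222 in both 2012 and 2013, so no move happened in 2013. With 2012 absent, [_n-1] compares 2013 against 2011 and reports a move, while L1. returns missing. The names moved_n1 and moved_L1 are hypothetical.

```stata
* Hypothetical variant of the #1 example: student 113 has no 2012 record
clear
input str3 id float(year school)
"113" 2011 224
"113" 2013 222
"114" 2011 222
"114" 2012 221
"114" 2013 224
end
encode id, gen(n_id)
xtset n_id year

* [_n-1]: previous observation for the student, whatever year it comes from
bysort n_id (year): gen byte moved_n1 = school != school[_n-1] if _n > 1

* L1.: the previous calendar year; missing when that year is absent
gen byte moved_L1 = school != L1.school if !missing(school, L1.school)

list id year school moved_n1 moved_L1, sepby(id) noobs
```

For student 114, who has no gap, the two flags should agree; for student 113 in 2013, moved_n1 should be 1 (a spurious move) while moved_L1 should be missing.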
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#5

23 Jan 2024, 12:42

I see! Yes, this makes sense. Thank you very much.