PISA data, REPEST, and appending datasets

Daniel Foster

Join Date: Feb 2024

Posts: 4
#1

20 Feb 2024, 12:40

Hello,

I am using REPEST to carry out a fixed effects regression using PISA data from 2000-2018.

I have appended the 2000, 2003, 2006, ..., 2018 PISA waves and added a variable for the year of each wave (titled "pisa_cylce"). However, when I try to compute basic statistics like the mean, I get errors that read "2015.....post __00000Z not found" and "2018.....post __00000Z not found", signalling that there is an issue with the 2015 and 2018 PISA data.

repest PISA, estimate(mean pv@read) by(pisa_cycle)

I recognize that there are 10 as opposed to 5 plausible values in the 2015 and 2018 waves. I also noticed that the names of the replicate weights differ between the 2000-2012 waves and the 2015-2018 waves, as does the name of the primary sampling unit (i.e., schlid vs. cntschoolid).
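
One way to confirm where the naming changes in the appended file is to count non-missing values of a few wave-specific variables by cycle. This is only a sketch: w_fstr1 and w_fsturwt1 (first replicate weight) and pv5read and pv10read (plausible values) are assumed names for the lower-cased variables and may need adjusting to match the actual files.

```stata
* Sketch: count non-missing values of wave-specific variables by cycle.
* The variable names in the list are assumptions, not taken from the thread.
foreach v in w_fstr1 w_fsturwt1 pv5read pv10read {
    capture confirm variable `v'
    if _rc == 0 {
        * A count of 0 for a cycle means the variable is empty for that wave
        tabstat `v', by(pisa_cycle) statistics(count)
    }
    else {
        display as text "`v' not found in the appended file"
    }
}
```

If the counts for the older replicate weights are zero in 2015 and 2018 (and vice versa), a single repest call cannot find a consistent set of weights and plausible values across all cycles.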

Is this an issue with how I've structured the data?

Thank you!

Last edited by Daniel Foster; 20 Feb 2024, 12:43.
Tags: append, panel data, PISA, repest
Philip Matthews

Join Date: Apr 2014

Posts: 23
#2

22 Feb 2024, 09:26

Reading the repest help file, it seems that repest allows "by" but assumes it refers to a variable within a single PISA dataset. It does not use an external variable such as the year labelling the entire dataset. Given that you have a variable "pisa_cycle" (I presume "pisa_cylce" was a typo), you might use the method I have outlined in the code below to gather results from repest.

Also, I see from the help file that repest expects to find either "PISA" or "PISA2015" in the call. (That is also shown in the repest.ado file around lines 970 to 1000.) As far as I know, if repest finds it is using a PISA file later than 2015, it simply issues a warning about applying the 2015 method. It assumes that the labelling system for the weights and the number of PVs changed in 2015 and has remained the same since.

Of course, I don't know why you have stacked the PISA files (it does seem a strange thing to do), but I have added a little code below that might do what your post was trying to do. However, note that for fairly obvious reasons I haven't tested it on real stacked PISA files, so it is up to you to treat it with caution. Caveat emptor!

Code:

// Assume there is a single variable in the data set labelled pisa_cycle that has all the records for
// each year listed as integers 2000, 2003, 2006, ... 2022.
// Code below uses a simple example - finding the mean for science for the USA
// by year and storing the results in two matrices: one just for the mean value
// the other as in the output table provided by repest/Stata.
set more off
mat drop _all
foreach year in 2000 2006 2015 2022 { //for example
    preserve
    keep if pisa_cycle == `year'
    if `year' <= 2006 {
        local YEAR = "PISA"
    }
    else {
        local YEAR = "PISA2015"
    }
    keep if cnt == "USA"
    repest `YEAR', estimate(mean pv@scie)
    mat EB = e(b)
    mat EBCollect = (nullmat(EBCollect)\EB)
    mat list EBCollect
    mat RT = r(table)
    mat RTlong = RT' //RT is a column vector so transpose to make more readable.
    mat RTCollect = (nullmat(RTCollect)\RTlong)
    mat list RTCollect
    display "End of year " "`year'"
    display "_________________________________________"
    restore
}
Philip Matthews

Join Date: Apr 2014

Posts: 23
#3

22 Feb 2024, 11:51

Sorry, copied over wrong bit of code. The if clause should have been

Code:

if year < 2015
Philip Matthews

Join Date: Apr 2014

Posts: 23
#4

22 Feb 2024, 11:52

Code:

`year'
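
Reading #3 and #4 together, the corrected test appears to be on the loop macro with a 2015 cutoff. A minimal sketch of just that survey-name selection, which runs without any PISA data (the list of years is only an illustration):

```stata
* Sketch: choose the repest survey name from the cycle year.
* Cycles before 2015 use "PISA"; 2015 and later use "PISA2015".
foreach year in 2000 2012 2015 2018 {
    if `year' < 2015 {
        local YEAR "PISA"
    }
    else {
        local YEAR "PISA2015"
    }
    display "`year' -> `YEAR'"
}
```

In the loop from #2, this block would take the place of the original if/else, with the repest call left unchanged.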