PISA Data, using SPSS/SAS control files

Richard Williams

Join Date: Apr 2014

Posts: 4955
#16

30 May 2014, 08:40

The updated files are at

https://drive.google.com/folderview?...k0&usp=sharing

I also included the original SPSS .sav files in case anybody would prefer to read them in with -usespss-, R or some other method, or wants to convert them into another file format.

I don't know whether anybody besides Konrad wants them, so I am not sure how long they will stay, but I will leave them up for at least a little while. If you read this two years from now and the files aren't there anymore, send me a message.

I notice that the COG12 files in Stata are much, much smaller than their SPSS counterparts. Hopefully this is because Stata is a vastly superior program. As Sergiy noted earlier, it appears that the SPSS programs store many variables as strings rather than as numbers. I don't know why, but it does increase the size of the files, and I imagine you would have to convert them to numbers to use them.
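If a converted file does hold numbers as strings, Stata's built-in destring and compress commands take care of both the conversion and the storage size. A minimal sketch on made-up data; the variable names st04q01 and pv1math are used only for illustration:

```stata
clear
* Toy data: numeric content stored as strings, as in some converted files
input str4 st04q01 str8 pv1math
"1" "512.3"
"2" "498.75"
"1" "455.1"
end

* Convert string variables whose contents are all numeric text into numbers
destring st04q01 pv1math, replace

* Store every variable in the smallest type that holds its values without loss
compress

describe
```

If a variable also contains non-numeric text, destring reports it and leaves that variable as a string; the force option would turn such values into missing, so it is worth checking what those values are before using it.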

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 4955
#17

30 May 2014, 08:47

It would be nice if Pisa just provided Stata code. I notice that the first three waves of the European Social Survey only provided SAS and SPSS versions, but Stata was added in later waves:

https://drive.google.com/folderview?...k0&usp=sharing

Maybe some lobbying would help.

Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#18

02 Jun 2014, 03:56

Richard,
Thank you for the files, much appreciated. I agree with your point concerning the Stata code. Stata is commonly taught in economics departments, and OECD data sources are widely used by economists and other researchers in the field, so making the files available in this commonly used format would definitely be handy. That said, my understanding is that PISA is seen as more of a "social" data set, hence the preference for SPSS, which is more widely used by sociologists and psychologists.

Kind regards,
Konrad
Version: Stata/IC 13.1
Richard Williams

Join Date: Apr 2014

Posts: 4955
#19

02 Jun 2014, 06:54

I am not so sure that SPSS is more widely used by social scientists. I get the impression SPSS is more interested in big business now than in academia. But in any event, Stata certainly has a large following.

Richard Williams

Join Date: Apr 2014

Posts: 4955
#20

01 Oct 2014, 18:46

Reviving an old thread: I had deleted the Pisa 2012 Stata files, but somebody asked me if I could make them available again. They can be found at

https://drive.google.com/folderview?...0E&usp=sharing

I will try to leave them there, but if you read this two years from now and they are not there, just write to me.

Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#21

03 Oct 2014, 06:00

Richard,

Thank you for your help with that; it was very useful for me. It occurred to me that it might be handy to save the files on a free file-sharing service, such as FileDropper or Rapidshare.

Kind regards,
Konrad
Richard Williams

Join Date: Apr 2014

Posts: 4955
#22

03 Oct 2014, 07:22

Thanks, Konrad. I didn't even know these things existed. FileDropper says that (unless you pay) files go away if they are not downloaded every 30 days. So for now I guess I will keep them where they are. Maybe I will zip them. If anybody else has a paid account and wants to put the files there, feel free; just let people know how to access them.

Amy Dillon

Join Date: Apr 2025

Posts: 4
#23

05 Apr 2015, 15:33

Thank you for reposting those files. You have saved this student of statistics and mother (!) a lot of time.
Richard Williams

Join Date: Apr 2014

Posts: 4955
#24

06 Apr 2015, 07:31

Since I posted back in October, Notre Dame has worked out a deal under which we supposedly get unlimited storage on Google Drive. I have this mad urge to find a 100 terabyte file and see if they really mean it. In any event, the Pisa files I created probably aren't going to disappear anytime soon, if anybody else wants them. But email me if they do disappear.

Adam Bergal

Join Date: Jan 2016

Posts: 16
#25

30 Jan 2016, 05:05

Prof. Richard,
thank you for uploading these files. I really appreciate it. I just have two questions, if you don't mind:

1. Which of the variables hold the actual score each person had in Math/Reading/Science? There are many plausible values (PV1MATH etc.); should I use their average, or is there one specific value that is used when calculating the official scores? I averaged them in Excel for Sweden and Albania but was not able to replicate the official scores for either country.

2. Do you have the PISA files for the other years as well?

Sincerely
Mikael
Richard Williams

Join Date: Apr 2014

Posts: 4955
#26

30 Jan 2016, 06:15

Mikael, all I did was run the conversions. You'll have to check the documentation for information on the variables. Sorry, I only did it for 2012. The readme files on the page provide what other information I have. Good luck with it.

Philip Matthews

Join Date: Apr 2014

Posts: 23
#27

30 Jan 2016, 12:32

Mikael, to use PISA datasets you will (at the very least) need to know how to employ the weights and the BRR (balanced repeated replication) replicate weights that are included, along with the plausible values, in some of the files. (With that knowledge you will understand why your investigations with Excel failed.)
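For reference, the usual way a PISA statistic is combined across plausible values and Fay-adjusted BRR replicate weights is summarized below. Here $\hat\theta_m$ is the statistic computed with the final student weight (W_FSTUWT) and plausible value $m$, $\hat\theta_{m,g}$ is the same statistic computed with replicate weight $g$, $M$ is the number of plausible values, $G$ the number of replicate weights and $k$ the Fay factor. For PISA 2012 these are, as far as I know, $M = 5$, $G = 80$ and $k = 0.5$, which gives a BRR scaling factor of $1/(80 \times 0.25) = 1/20$; please check the OECD technical documentation before relying on these values.

```latex
\hat\theta = \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m ,
\qquad
U_m = \frac{1}{G(1-k)^2}\sum_{g=1}^{G}\left(\hat\theta_{m,g}-\hat\theta_m\right)^2 ,

\bar U = \frac{1}{M}\sum_{m=1}^{M} U_m ,
\qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\theta_m-\hat\theta\right)^2 ,

\widehat{\mathrm{SE}}(\hat\theta) = \sqrt{\bar U + \left(1+\frac{1}{M}\right) B}
```

For a simple mean, averaging each student's plausible values first and then taking the weighted mean gives the same point estimate as averaging the $M$ weighted means, because the mean is linear. An unweighted spreadsheet average therefore misses mainly the student weights; the standard error additionally needs the replicate weights and the between-PV term $B$.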

You may find it useful to investigate the user-written packages pisareg and pisatools. (Type, e.g., findit pisareg within Stata.) The help files will get you started.

To understand what is involved in using large-scale survey data you may find it helpful to refer to this book:

Steven G. Heeringa, Brady T. West, and Patricia A. Berglund; Applied Survey Data Analysis (Boca Raton, FL: Chapman & Hall, 2010).

It is particularly good at explaining the basics of how data such as PISA can and should be analysed. Of course, as Richard says, it would also be wise to consult the PISA codebooks and technical reports that can be found on the OECD website.

Best wishes, Philip
Adam Bergal

Join Date: Jan 2016

Posts: 16
#28

01 Feb 2016, 05:11

Thank you Philip!

I downloaded the Stata programs that you recommended. After converting the variable names to lowercase, I was able to replicate the basic findings for Math/Reading/Science with the command "pisastats gender, stats(mean sd) cnt(SWE) pv(math) save(test) sas", changing "math" to "read" and "scie" respectively. "gender" is just a variable I created with male=0 and female=1.
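Both preparation steps mentioned here (lowercasing the names and building a 0/1 gender variable) take only a couple of lines. A minimal sketch on made-up data, assuming that the PISA 2012 gender item is ST04Q01 coded 1 = female and 2 = male (please verify this against the codebook):

```stata
clear
* Toy data with upper-case, PISA-style variable names (values are made up)
input str3 CNT byte ST04Q01 double PV1MATH
"SWE" 1 512.3
"SWE" 2 498.1
"SWE" . 455.0
end

* Rename every variable to lower case (requires Stata 12 or later)
rename *, lower

* female = 1, male = 0; students with missing gender stay missing
* (assumes st04q01 is coded 1 = female, 2 = male)
generate byte gender = (st04q01 == 1) if !missing(st04q01)

list
```

The if !missing() condition matters: without it, students with a missing gender item would be silently counted as male.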

The problem now is that I can't get the over(varname) option to work. When I add over(gender) to the command above, I just get a message saying "SWE - math: 1 mean .....sd .....2 cnt not found" r(111). Any idea why?

The only thing I would like to do at the moment is calculate the gender differences in Math/Reading/Science, together with their SDs, for native-born students in Sweden. I was hoping the command above would do the trick. Is there any way I can get Stata to calculate the values for native-born participants only, without having to remove the others from the dataset manually? I wasn't able to use the if qualifier with pisastats.
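One way to restrict an analysis without deleting anybody is to build a 0/1 selection indicator and pass it as an if condition to every weighted estimate. The sketch below does the whole calculation by hand on simulated data: average the estimate over the plausible values, compute a Fay-adjusted BRR sampling variance for each plausible value, and add the between-PV (imputation) variance. Assumptions, all hypothetical: the data are made up, immig == 1 marks native students (check the codebook), and only 4 replicate weights are used to keep it short, whereas PISA 2012 supplies 80 (w_fstr1 to w_fstr80) with a Fay factor of 0.5. For real work, the user-written tools mentioned in this thread are the easier route.

```stata
clear
set seed 2012
set obs 400

* Simulated, PISA-like layout (all values are made up)
generate str3 cnt = cond(_n <= 300, "SWE", "ALB")
generate byte immig = cond(runiform() < 0.8, 1, 3)   // assumed: 1 = native
generate double w_fstuwt = 1 + runiform()              // final student weight
forvalues g = 1/4 {                                    // toy Fay replicate weights
    generate double w_fstr`g' = w_fstuwt * cond(mod(_n + `g', 2), 1.5, 0.5)
}
forvalues m = 1/5 {                                    // five plausible values
    generate double pv`m'math = 500 + 100*rnormal()
}

* Selection indicator: nobody is dropped from the data
generate byte sel = cnt == "SWE" & immig == 1

local M = 5      // number of plausible values
local G = 4      // number of replicate weights (80 in PISA 2012)
local k = 0.5    // Fay factor

local sum_est = 0
local sum_var = 0
forvalues m = 1/`M' {
    * Full-sample estimate for plausible value m
    quietly summarize pv`m'math if sel [aweight = w_fstuwt]
    local est`m' = r(mean)
    * Fay-adjusted BRR sampling variance for plausible value m
    local ss = 0
    forvalues g = 1/`G' {
        quietly summarize pv`m'math if sel [aweight = w_fstr`g']
        local ss = `ss' + (r(mean) - `est`m'')^2
    }
    local sum_var = `sum_var' + `ss' / (`G' * (1 - `k')^2)
    local sum_est = `sum_est' + `est`m''
}

* Combine across plausible values
local theta = `sum_est' / `M'    // final estimate
local U = `sum_var' / `M'        // average sampling variance
local B = 0                      // between-PV (imputation) variance
forvalues m = 1/`M' {
    local B = `B' + (`est`m'' - `theta')^2
}
local B = `B' / (`M' - 1)

scalar est_mean = `theta'
scalar est_se = sqrt(`U' + (1 + 1/`M') * `B')
display "Native-born SWE mean = " %7.2f est_mean "   SE = " %5.2f est_se
```

Computing the same quantities for males and females separately (by adding a gender condition to sel) gives the group means; the standard error of their difference would need the difference computed inside each replicate and plausible-value loop rather than from the two separate standard errors.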

Sincerely
Mikael
Nick Cox

Join Date: Mar 2014

Posts: 35451
#29

01 Feb 2016, 05:22

Mikael: Please change your registration to use a family name, not "STATA". http://www.statalist.org/forums/help#realnames explains our policy and how to fix this.

It would be a good idea for you to read the entire document.
Philip Matthews

Join Date: Apr 2014

Posts: 23
#30

01 Feb 2016, 12:35

Mikael, I have not used pisareg etc. for some years, and from your message I cannot be sure I know the specific problem you are having (do follow Nick's advice). However, during a quick try using Stata 14.1 on a Mac I find pisastats is not behaving properly at times. There seems to be a problem with it using gender as a 0/1 binary variable. I also find it does not always recognise the cnt variable properly. I do not have the time/energy to discover the reason(s); but if you are sure you are using the correct syntax you could contact the authors about the problems.

However, you should be able to find alternative ways of generating the results you seek. Below is a copy of a do-file that I have annotated to explain the steps in one approach. (I apologise if the comments are too trivial for you.) The key points are: (i) using pv rather than pisastats (see the top right of the pisareg help file); (ii) generating a new variable that codes males as 2 rather than 0; (iii) using preserve before temporarily dropping countries other than Sweden, then using restore to revert to the original file (with all countries, as at the start). WARNING: on the principle that "what can go wrong, will go wrong", do make sure you have a backup copy of your PISA data file, especially before relying on preserve/restore.

The example calculates the science performance means for the entire SWE sample, and then for males and females separately.

Code:

set more off

// Create a copy of gender so that the original is left unchanged.
gen newgender = gender

// Change 0 to 2.
replace newgender = 2 if newgender == 0

preserve               // Keeps original file for later re-use.
drop if cnt != "SWE"   // Temporarily drop countries other than Sweden.

// Overall results.
pv [aw=w_fstuwt], pv(pv*scie) cmd("mean") brr rw(w_fstr*) fays(0.5)

// Results for males (recoded 0 to 2).
pv if newgender==2 [aw=w_fstuwt], pv(pv*scie) cmd("mean") brr rw(w_fstr*) fays(0.5)

// Results for females (newgender = 1 as before).
pv if newgender==1 [aw=w_fstuwt], pv(pv*scie) cmd("mean") brr rw(w_fstr*) fays(0.5)

restore                // Return to original file.
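As a side note, the copy-then-replace step that creates newgender can also be written as a single recode command with the generate() option, which leaves gender itself untouched. A minimal sketch on made-up data, assuming gender is coded 0 = male and 1 = female as described earlier in the thread:

```stata
clear
* Toy data: gender coded 0 = male, 1 = female (values are made up)
input byte gender
0
1
0
.
end

* Copy gender into newgender, turning 0 into 2; other values, including
* missing, are copied unchanged and gender itself is not modified
recode gender (0 = 2), generate(newgender)

list
```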

I hope this is sufficient to get you further on.
Best wishes, Philip