Data cleaning and reshaping

filipepaula

Join Date: Oct 2014

Posts: 22
#1

Data cleaning and reshaping

Yesterday, 18:25

Dear statalists,

I have number of excel files which have a structure such as this:

student_id GH JG GH JO JG

1 9 14 13 17 18

2 5 4 13 14 18

3 12 15 17 14 15

Column headers contain the initials of the professor which gave each particular lesson (column=lesson). They usually repeat because several lessons are given by each professor.
Each row is a student (unique identifier).
Each cell is a grade given to each student by the professor during each lesson.

So my aim is to end up with a dataset in a long format, such as this:

student_id prof grade

1 GH 9

1 JG 14

1 GH 13

1 JO 17

1 JG 18

2 GH 5

(... and so on)

I just can't figure out how to do this. The limitation is that if I import the excel file in the first place, it doesn't work because professors repeat themselves, so I end up with several unnamed variables.
Can someone please shed some light here? Do you think I should learn a different language for this kind of task?

Thank you in advance,
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 34748

Yesterday, 18:45

I read your sample data as string from a copy ported to MS Excel and got this dataset: I don't show the import excel code as it contains directory details for my computer.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str10 A str2(B C D E F)
"student_id" "GH" "JG" "GH" "JO" "JG"
"1"          "9"  "14" "13" "17" "18"
"2"          "5"  "4"  "13" "14" "18"
"3"          "12" "15" "17" "14" "15"
end

Code:

rename A id 

rename (B-F) (grade#), addnumber

forval j = 1/5 {
local name`j' = grade`j'[1]
 }

drop in 1

destring *, replace

reshape long grade, i(id) j(prof)

forval j = 1/5 {
     label def prof `j' "`name`j''", add
}

label val prof prof

list

Your code may need to change F and 5.

As shown, Stata is fine for this. The only change I suggest is to give MS Excel its correct name.

Comment

filipepaula

Join Date: Oct 2014

Posts: 22
#3

Yesterday, 19:04

Thank you so much Nick Cox! I was just stuck there.
Best
Comment

student_id	GH	JG	GH	JO	JG
1	9	14	13	17	18
2	5	4	13	14	18
3	12	15	17	14	15

student_id	prof	grade
1	GH	9
1	JG	14
1	GH	13
1	JO	17
1	JG	18
2	GH	5

Announcement

Data cleaning and reshaping

Comment

Comment