Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data cleaning and reshaping

    Dear statalists,

    I have number of excel files which have a structure such as this:
    student_id GH JG GH JO JG
    1 9 14 13 17 18
    2 5 4 13 14 18
    3 12 15 17 14 15
    Column headers contain the initials of the professor which gave each particular lesson (column=lesson). They usually repeat because several lessons are given by each professor.
    Each row is a student (unique identifier).
    Each cell is a grade given to each student by the professor during each lesson.

    So my aim is to end up with a dataset in a long format, such as this:
    student_id prof grade
    1 GH 9
    1 JG 14
    1 GH 13
    1 JO 17
    1 JG 18
    2 GH 5
    (... and so on)

    I just can't figure out how to do this. The limitation is that if I import the excel file in the first place, it doesn't work because professors repeat themselves, so I end up with several unnamed variables.
    Can someone please shed some light here? Do you think I should learn a different language for this kind of task?

    Thank you in advance,

  • #2
    I read your sample data as string from a copy ported to MS Excel and got this dataset: I don't show the import excel code as it contains directory details for my computer.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 A str2(B C D E F)
    "student_id" "GH" "JG" "GH" "JO" "JG"
    "1"          "9"  "14" "13" "17" "18"
    "2"          "5"  "4"  "13" "14" "18"
    "3"          "12" "15" "17" "14" "15"
    end
    Code:
    rename A id 
    
    rename (B-F) (grade#), addnumber
    
    forval j = 1/5 {
    local name`j' = grade`j'[1]
     }
    
    drop in 1
    
    destring *, replace
    
    reshape long grade, i(id) j(prof)
    
    forval j = 1/5 {
         label def prof `j' "`name`j''", add
    }
    
    label val prof prof
    
    list
    Your code may need to change F and 5.

    As shown, Stata is fine for this. The only change I suggest is to give MS Excel its correct name.

    Comment


    • #3
      Thank you so much Nick Cox! I was just stuck there.
      Best

      Comment

      Working...
      X