
  • Reshaping to wide with a string j variable

    I am looking to reshape a dataset of the course modules completed by each individual. Each individual has completed a different set of modules, even though the goal was for everyone to do all of them. After reshaping, every course listed under *coursetitle* should become a separate variable, with missing values for the courses a person has not done. I'd like to use each course as a control in a regression. Here is the MWE data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double pid byte completion str47 coursetitle byte engagement int minutesspent
    9703090408089 100 "01. Why Work and Why YOUth Matter"   100 57
    9703090408089  97 "02. Growth Mindset"                   97 50
    9703090408089  49 "03.  Know Yourself to Grow Yourself"  49 30
    9703090408089  86 "08. Money Management I"               86 38
    9110275678088  69 "01. Why Work and Why YOUth Matter"    69 40
    9110275678088  81 "02. Growth Mindset"                   81 44
    9110275678088   7 "03.  Know Yourself to Grow Yourself"   7 58
    9110275678088  83 "04. Expectations"                     83 34
    9110275678088  97 "05. Professionalism"                  97 33
    9110275678088  96 "06. Onboarding - Getting It Right"    96 24
    9110275678088  84 "07. Succeeding in the Workplace"      84 38
    9110275678088 100 "08. Money Management I"              100 44
    9110275678088  96 "09. Money Management II"              96 46
    9110275678088   0 "A. CV Prep and Cover Letter"           0 56
    9110275678088 100 "S03. Know Your Industry"             100 23
    9401080275085 100 "01. Why Work and Why YOUth Matter"   100 97
    9401080275085 100 "02. Growth Mindset"                  100 51
    9401080275085  47 "03.  Know Yourself to Grow Yourself"  47 92
    9401080275085  83 "04. Expectations"                     83 34
    9401080275085 100 "05. Professionalism"                 100 34
    end

    I tried the following:

    Code:
    reshape wide completion engagement minutesspent, i(pid) j(coursetitle) string
    But reshaping on the *coursetitle* variable generates many wide-format variables from a single course title; for example, "02. Growth Mindset" becomes three different variables. Does anyone have suggestions for the initial steps I should take?

  • #2
    I think you need to reshape long first before you reshape wide. The would-be variable names are too long (Stata variable names are limited to 32 characters) and contain characters, such as spaces and periods, that are not allowed in Stata variable names. I address this partially, but you may need to consider renaming the variables later on.
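    To see the naming problem concretely, here is a small diagnostic sketch, assuming the -dataex- data from the question are in memory. It prints each name that the original -reshape wide- would need for the stub minutesspent, its length, and what Stata's strtoname() function would make of it. strtoname() replaces illegal characters with underscores and truncates to 32 characters, so the result is legal but hard to read, which is why shorter, hand-built names are used in the code below.

    ```stata
    * Diagnostic sketch: assumes the -dataex- data from the question are in memory
    levelsof coursetitle, local(titles)
    foreach t of local titles {
        * Name that -reshape wide- would need for the stub minutesspent
        local raw "minutesspent`t'"
        display `"`raw'"' " (" strlen(`"`raw'"') " characters) -> " strtoname(`"`raw'"')
    }
    ```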

    Code:
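    * Stack the three measures into one variable, grade, indexed by which (1-3)
    rename (completion engagement minutesspent) (grade#), addnumber(1)
    reshape long grade, i(pid coursetitle) j(which)
    lab def which 1 "completion" 2 "engagement" 3 "minutesspent"
    lab values which which
    * Build short, legal names: 10 characters of the title starting at position 4,
    * periods removed, spaces turned into underscores, plus the -encode- code for uniqueness
    encode coursetitle, gen(ct)
    replace coursetitle = subinstr(trim(subinstr(substr(coursetitle, 4, 10), ".", "", .)), " ", "_", .) + string(ct)
    drop ct
    * One column per course; -which- keeps the three measures on separate rows
    reshape wide grade, i(pid which) j(coursetitle) string
    rename (grade*) (*)
    Result: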

    Code:
     l, sepby(pid)
    
         +---------------------------------------------------------------------------------------------------------------------------------------------------+
         |       pid          which   CV_Pr~10   Expect~4   Growth~2   Know_~11   Know_Y~3   Money_~8   Money_~9   Onboar~6   Profes~5   Succee~7   Why_Wo~1 |
         |---------------------------------------------------------------------------------------------------------------------------------------------------|
      1. | 9.110e+12     completion          0         83         81        100          7        100         96         96         97         84         69 |
      2. | 9.110e+12     engagement          0         83         81        100          7        100         96         96         97         84         69 |
      3. | 9.110e+12   minutesspent         56         34         44         23         58         44         46         24         33         38         40 |
         |---------------------------------------------------------------------------------------------------------------------------------------------------|
      4. | 9.401e+12     completion          .         83        100          .         47          .          .          .        100          .        100 |
      5. | 9.401e+12     engagement          .         83        100          .         47          .          .          .        100          .        100 |
      6. | 9.401e+12   minutesspent          .         34         51          .         92          .          .          .         34          .         97 |
         |---------------------------------------------------------------------------------------------------------------------------------------------------|
      7. | 9.703e+12     completion          .          .         97          .         49         86          .          .          .          .        100 |
      8. | 9.703e+12     engagement          .          .         97          .         49         86          .          .          .          .        100 |
      9. | 9.703e+12   minutesspent          .          .         50          .         30         38          .          .          .          .         57 |
         +---------------------------------------------------------------------------------------------------------------------------------------------------+
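    If the goal is one row per pid with every course as its own regressor, a variant worth considering (a sketch, not what was posted above) is to give each course a numeric code with -encode- and use that code as j, so no course title ever has to become part of a variable name. It assumes the -dataex- data from the question are in memory, not the output above. The names completion1-completion11, engagement1-engagement11, minutesspent1-minutesspent11 and done1-done11 are my own illustrative choices; the full titles are kept as variable labels.

    ```stata
    * Sketch: assumes the -dataex- data from the question are in memory
    encode coursetitle, generate(course)     // codes 1-11, in alphabetical order of the titles
    quietly summarize course
    local ncourse = r(max)
    forvalues k = 1/`ncourse' {
        local title`k' : label (course) `k'  // remember the full title for this code
    }
    drop coursetitle                         // varies within pid, so it cannot stay for -reshape wide-
    reshape wide completion engagement minutesspent, i(pid) j(course)
    forvalues k = 1/`ncourse' {
        foreach stub in completion engagement minutesspent {
            label variable `stub'`k' `"`stub': `title`k''"'
        }
        * 1 if the pid has a record for this course, 0 otherwise
        generate byte done`k' = !missing(completion`k')
        label variable done`k' `"has record: `title`k''"'
    }
    ```

    Note that done# flags that a course record exists, not that it was fully completed: in the example, pid 9110275678088 has a record for "A. CV Prep and Cover Letter" (code 10) with completion 0, so done10 is 1 for that pid.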



    • #3
      Thanks Andrew. Honestly, I was so stuck on breaking *coursetitle* into dummy variables for each individual course that I did not notice the other variables (how long each person spent on a course and whether they completed it) also had to be restructured. This looks neat for now! I'd like to generate a set of indicator variables for whether a person has done a course (now grade*) and whether the completion rates differ. I can generate the variables one by one, but since the course variables now share a systematic stub name (i.e. they all start with grade*), is there a simple way (a loop) to generate the new set of variables in one go? Thanks a lot.



      • #4
        Do you simply want to calculate the completion rate, i.e. the percentage of the 11 courses that each pid has done? Here is a direct way, starting from the output of #2, without creating dummies:

        Code:
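        * Number of courses with a non-missing value in this row
        egen wanted = rownonmiss(CV_Prep_an10-Why_Work1)
        * Convert to a percentage of the 11 courses
        replace wanted = (wanted/11)*100
        If you have to create completion dummies (11 in total):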

        Code:
        * comp_ dummy = 1 if the course variable is non-missing, 0 otherwise
        foreach var of varlist CV_Prep_an10-Why_Work1 {
              gen comp_`var' = !missing(`var')
        }
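        Two refinements, offered as a sketch and assuming the output of #2 is in memory: count the course variables instead of hard-coding 11, and keep one row per pid before using any of this in a regression. The layout from #2 repeats each pid three times (once per value of which), and in this data every course a pid has done has all three measures, so the counts are identical across those rows. ndone and pct_done are illustrative names, not part of the code above.

        ```stata
        * Sketch: assumes the wide data from #2 are in memory (3 rows per pid)
        unab courses : CV_Prep_an10-Why_Work1    // expand the range into a list of names
        local ncourses : word count `courses'     // 11 here, counted rather than typed
        egen ndone = rownonmiss(`courses')
        generate pct_done = 100 * ndone / `ncourses'
        * Every pid has the same values in its three -which- rows, so keep one row per pid
        keep if which == 1
        ```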



        • #5
          Thanks Andrew. This is great. I was not looking for completion rates but rather for dummy variables to correlate with other psychometrics (not in this dataset), or even to use as independent course dummies in a regression. Thanks again, this works very well!



          • #6
            Thank you very much!
            Last edited by George Kariuki; 24 Feb 2020, 14:30.
