Reshaping to wide with a string j variable

George Kariuki

Join Date: Jul 2015
Posts: 93

24 Feb 2020, 02:25

I am looking to reshape a dataset of the courses completed by each individual. Each individual has done a different set of course modules, even though the goal was for everyone to do all courses. That is, after reshaping, every course that appears under *coursetitle* should become its own variable, with missing values for the courses an individual has not done. I would like to use each course as a control in a regression. Here is the MWE data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double pid byte completion str47 coursetitle byte engagement int minutesspent
9703090408089 100 "01. Why Work and Why YOUth Matter"   100 57
9703090408089  97 "02. Growth Mindset"                   97 50
9703090408089  49 "03.  Know Yourself to Grow Yourself"  49 30
9703090408089  86 "08. Money Management I"               86 38
9110275678088  69 "01. Why Work and Why YOUth Matter"    69 40
9110275678088  81 "02. Growth Mindset"                   81 44
9110275678088   7 "03.  Know Yourself to Grow Yourself"   7 58
9110275678088  83 "04. Expectations"                     83 34
9110275678088  97 "05. Professionalism"                  97 33
9110275678088  96 "06. Onboarding - Getting It Right"    96 24
9110275678088  84 "07. Succeeding in the Workplace"      84 38
9110275678088 100 "08. Money Management I"              100 44
9110275678088  96 "09. Money Management II"              96 46
9110275678088   0 "A. CV Prep and Cover Letter"           0 56
9110275678088 100 "S03. Know Your Industry"             100 23
9401080275085 100 "01. Why Work and Why YOUth Matter"   100 97
9401080275085 100 "02. Growth Mindset"                  100 51
9401080275085  47 "03.  Know Yourself to Grow Yourself"  47 92
9401080275085  83 "04. Expectations"                     83 34
9401080275085 100 "05. Professionalism"                 100 34
end

I attempted to run this code:

Code:

reshape wide  completion engagement minutesspent, i(pid) j("coursetitle") string

But reshaping on the *coursetitle* variable generates many variables in the wide format from one course title. That is, "02. Growth Mindset" becomes 3 different variables in the wide format. Does anyone have suggestions for the initial steps I should take?

Andrew Musau

Join Date: Oct 2014
Posts: 10062

24 Feb 2020, 07:27

I think you need to reshape long first before you reshape wide. The would-be variable names are too long and contain characters that are not allowed in Stata variable names. I address this partially, but you may need to consider renaming later on.

Code:

rename (completion engagement minutesspent) (grade#), addnumber(1)
reshape long grade, i(pid coursetitle) j(which)
lab def which 1 "completion" 2 "engagement" 3 "minutesspent"
lab values which which
encode coursetitle, gen(ct)
replace coursetitle= subinstr(trim(subinstr(substr(substr(coursetitle, 4, .), 1, 10), ".", "", .)), " ", "_", .) + string(ct)
drop ct
reshape wide grade, i(pid which) j(coursetitle, string)
rename (grade*) (*)

Result:

Code:

 l, sepby(pid)

     +---------------------------------------------------------------------------------------------------------------------------------------------------+
     |       pid          which   CV_Pr~10   Expect~4   Growth~2   Know_~11   Know_Y~3   Money_~8   Money_~9   Onboar~6   Profes~5   Succee~7   Why_Wo~1 |
     |---------------------------------------------------------------------------------------------------------------------------------------------------|
  1. | 9.110e+12     completion          0         83         81        100          7        100         96         96         97         84         69 |
  2. | 9.110e+12     engagement          0         83         81        100          7        100         96         96         97         84         69 |
  3. | 9.110e+12   minutesspent         56         34         44         23         58         44         46         24         33         38         40 |
     |---------------------------------------------------------------------------------------------------------------------------------------------------|
  4. | 9.401e+12     completion          .         83        100          .         47          .          .          .        100          .        100 |
  5. | 9.401e+12     engagement          .         83        100          .         47          .          .          .        100          .        100 |
  6. | 9.401e+12   minutesspent          .         34         51          .         92          .          .          .         34          .         97 |
     |---------------------------------------------------------------------------------------------------------------------------------------------------|
  7. | 9.703e+12     completion          .          .         97          .         49         86          .          .          .          .        100 |
  8. | 9.703e+12     engagement          .          .         97          .         49         86          .          .          .          .        100 |
  9. | 9.703e+12   minutesspent          .          .         50          .         30         38          .          .          .          .         57 |
     +---------------------------------------------------------------------------------------------------------------------------------------------------+
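
For comparison, a one-step alternative might also work here: give each course a short numeric code and reshape wide on that code, so the long titles never become variable names. This is only a sketch, meant to be run on the original -dataex- data rather than on the output above. The variable name course and the title_ macros are illustrative names not used elsewhere in this thread, and the sketch assumes each pid has at most one row per course.

Code:

* Sketch only: "course" and the title_ macros are hypothetical names
egen course = group(coursetitle), label
levelsof course, local(levels)
foreach l of local levels {
    local title_`l' : label (course) `l'
}
* coursetitle varies within pid, so it cannot stay in the data for reshape wide
drop coursetitle
reshape wide completion engagement minutesspent, i(pid) j(course)
* attach the original titles as variable labels
foreach l of local levels {
    foreach stub in completion engagement minutesspent {
        label variable `stub'`l' "`stub': `title_`l''"
    }
}

This should give one row per pid with variables such as completion1 and minutesspent11, numbered in the sort order of coursetitle, with the full course title kept in each variable label.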


George Kariuki

Join Date: Jul 2015

Posts: 93
#3

24 Feb 2020, 09:03

Thanks Andrew. Honestly, I was so stuck on getting coursetitle broken into dummy variables for each individual course that I did not think about the structure of the other variables, which show how long each subject spent on a given course and whether they completed it. This looks neat for now! Thanks. I would like to generate a set of indicator variables for whether one has done a course (now grade*) or not, to see whether the completion rates differ. I can do the variable generation, but since the course variables now have a systematic stub name (i.e. all start with grade*), is there a simple way (a loop) to generate the new set of variables in one go? Thanks a lot.

Andrew Musau

Join Date: Oct 2014

Posts: 10062
#4

24 Feb 2020, 09:29

Do you simply want to calculate the completion rate? Here is a direct way from the output of #2, without creating dummies:

Code:

egen wanted = rownonmiss(CV_Prep_an10-Why_Work1)
replace wanted = (wanted/11)*100

If you have to create completion dummies (11 in total):

Code:

foreach var of varlist CV_Prep_an10-Why_Work1 {
    gen comp_`var' = !missing(`var')
}
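
The range CV_Prep_an10-Why_Work1 relies on the course variables sitting next to each other in that order. A sketch of a variant that instead collects every variable other than pid and which might look like the following. It assumes the data contain nothing else, in particular that wanted and the comp_ variables have not been created yet.

Code:

* Sketch: assumes only pid, which and the 11 course variables exist
ds pid which, not
foreach var in `r(varlist)' {
    gen byte comp_`var' = !missing(`var')
}

This is meant as a replacement for the loop above, not an addition to it; running both would fail because the comp_ variables would already exist.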

George Kariuki

Join Date: Jul 2015

Posts: 93
#5

24 Feb 2020, 12:38

Thanks Andrew. This is great. I was not looking to get the completion rates but rather dummy variables to correlate with other psychometrics (not in this dataset), or even to use independent course dummies in a regression. Thank you altogether, this works very well!

George Kariuki

Join Date: Jul 2015

Posts: 93
#6

24 Feb 2020, 14:19

Thank you very much!

Last edited by George Kariuki; 24 Feb 2020, 14:30.
