Hi Statalist users,
I am trying to create one bar chart that illustrates placement in a calculus class, with the results shown for each racial group and also for the entire sample. Below is my data. Let me explain my sample dataset. The variable race is categorical, with value labels for the racial groups in my sample. The variable recent_hs_cohort records which cohort a student is in, while num_stud is the number of students of that race in that cohort. The variable placed_math is the number of students enrolled in college calculus.
The code below produces the bar chart that I want. Let me briefly break down my code. In Part 1, I estimate placement in the math class for the entire sample and by race. In Part 2, I manually create rows for a group called Total. In Part 3, I reshape my data. In Part 4, I plot my data and produce my desired graph. I was wondering if there is a more streamlined way to code Part 2. I don't like having to manipulate rows manually as if I were using a spreadsheet. I have seen other solutions that take this spreadsheet-style approach, and I can see its flaws. There is a link below to a similar thread.
https://www.statalist.org/forums/for...sting-variable
For my circumstances, I need to create a group called Total so that I can graph the results for the full sample. Does anyone have a better way than what I have done? I ask to improve my coding skills. Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte race float recent_hs_cohort double(num_stud placed_math)
1 1  9412  7917
2 1  4430  2991
3 1  3404  2694
4 1 53179 34428
5 1   223   148
6 1   430   314
7 1  4523  3489
8 1 20182 15921
9 1  1080   806
1 2  9044  8706
2 2  4007  3657
3 2  3314  3145
4 2 53057 48197
5 2   291   267
6 2   495   468
7 2  4028  3785
8 2 17858 16880
9 2  4469  4165
1 3  8052  7784
2 3  3030  2746
3 3  2876  2758
4 3 42379 38962
5 3   170   151
6 3   354   335
7 3  4270  4056
8 3 17880 17001
9 3  2151  1996
1 4  8061  7807
2 4  2628  2423
3 4  2339  2228
4 4 39137 36248
5 4   166   154
6 4   314   295
7 4  3790  3626
8 4 15932 15206
9 4  1199  1132
end
label values race race
label def race 1 "Asian", modify
label def race 2 "Black", modify
label def race 3 "Filipino", modify
label def race 4 "Latino", modify
label def race 5 "Indigenous", modify
label def race 6 "Pacific Islander", modify
label def race 7 "Mixed", modify
label def race 8 "White", modify
label def race 9 "Missing", modify
label values recent_hs_cohort recent_hs_cohort
label def recent_hs_cohort 1 "Cohort 2018", modify
label def recent_hs_cohort 2 "Cohort 2019", modify
label def recent_hs_cohort 3 "Cohort 2020", modify
label def recent_hs_cohort 4 "Cohort 2021", modify
Code:
* Part 1: Estimate total students and total students placed in the transfer class.
bys recent_hs_cohort: egen tot_stud=total(num_stud)
bys recent_hs_cohort: egen tot_placed=total(placed)

*Estimate percentage total placed
gen pct_tot_place=tot_placed/tot_stud

*Disaggregate percent placed by race
gen pct_place=placed/num_stud

*For presentation, multiply by 100
replace pct_place= pct_place*100
replace pct_tot_place= pct_tot_place*100

*Keep necessary variables
keep race recent_hs_cohort pct_place pct_tot_place

*Keep necessary years
gen year = 2018 if recent_hs_cohort==1
replace year=2021 if recent_hs_cohort==4

*Part 2: Create rows for Total
replace race=0 if race==1 & year==.

*Label variable
la def race 0 "Total" 1 "Asian" 2 "Black" 3 "Filipino" 4 "Latino" 5 "Indigenous" 6 "Pacific Islander" 7 "Mixed" 8 "White" 9 "Missing", replace
la val race race

*Assign years to the Total row
replace year=2018 if recent_hs_cohort==2 & race==0
replace year=2021 if recent_hs_cohort==3 & race==0
drop if year==.

*Find min and max for pct_tot_place
egen total_2018=min(pct_tot_place)
egen total_2021=max(pct_tot_place)

*Assign values
replace pct_place= total_2018 if race==0 & year==2018
replace pct_place= total_2021 if race==0 & year==2021

*Part 3: Reshape wide because graph needs wide data
drop recent_hs_cohort
reshape wide pct_place pct_tot_place, i(race) j(year)

*Part 4: Create graph
graph bar pct_place2018 pct_place2021, over(race, lab(angle(45))) ///
    graphregion(col(white)) ylab(,angle(0)) ///
    bar(1, fcolor("32 42 68") lw(none)) ///
    bar(2, fcolor("162 178 200") lw(none)) ///
    legend(label(1 "Fall 2018 Cohort") label(2 "Fall 2021 Cohort")) ///
    ytitle("Percentage") ///
    b1title(Race) ///
    title("Placed in math Class in 2018 and 2021")