Hi Everyone,
I have been in touch with the Stata technical support team, and I'm a little worried that I'm bumping up against the limits of Stata's graphing capabilities. Before giving up, I thought I should see whether I could get any guidance here. I'll mention what they said where it is relevant.
I have a dataset of 1000 students across 15 classrooms that I created to roughly match my real data (the data I'm working with is confidential). The data is not cross-classified.
This is how I created it:
clear
set obs 1000
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1
-- I know it's a cumbersome way to do it, but I wanted to make sure I had a dataset where the amount of missing data varied across items and respondents (like my real dataset).
So, I have a series of survey items pertaining to students' experience, and for each item I would like to create a stacked horizontal bar graph showing the percentage of students within each classroom that responded to each of the three options of "No," "Maybe," and "Yes."
It is important that I am able to arrange the bars by the percentage of students who responded a certain way in each classroom. For example, let's say the item is "Do you think you'll go to a ski resort this weekend?" I would want to sort the bars (each representing one classroom) by the percentage of students who responded "Yes" to this item.
Essentially, I want to replicate the Excel-generated graph below in Stata:

Here is the code I used to create the graph below:
label define anslbl 1 "Yes" 2 "No" 3 "Maybe"
label values item1 anslbl
graph hbar, over(item1) over(classid) asyvars stack percentages ///
blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
bar(2, color(red)) bar(3, color(green))
Here is the graph it generates:

I want the classrooms to appear in descending order of the percentage who responded "Yes" to the item.
The Stata support team said the following when I asked about producing a graph that matches the Excel graph above:
"There isn't an option that will allow you to order the bars based on one value of the stack. However, you may be able to sort the values first in your dataset and then create a new sort variable."
I asked a follow-up question, but have yet to hear back. I also find the support to be hit-or-miss.
In the meantime, I decided to try to create a variable equal to the percentage of students within each class who responded a certain way to an item.
This is the code I tried:
bysort classid: gen class_count = _N
foreach x of varlist item1 item2 item3 item4 {
    tab `x', missing gen(`x'resp)
}
foreach i of numlist 4 1 2 3 {
    foreach x of varlist item1 item2 item3 item4 {
        bysort classid: gen `x'count_resp`i' = sum(`x'resp`i')
        bysort classid: replace `x'count_resp`i' = `x'count_resp`i'[_N]
        bysort classid: gen `x'count_resp`i'pct = ((`x'count_resp`i')/(class_count - `x'count_resp4))
    }
}
I ran tab `x', missing gen(`x'resp) so that I could subtract the missing values for a given variable from the total class count when computing the percentage so that it matches the percentages that are computed in the Stata graph. I checked the above code and it seems to give me what I want. I am now having trouble getting my graph to sort by one of these variables.
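For comparison, the same classroom-level share can probably be computed more directly with egen. This is only a sketch for item1 and the "Yes" response (coded 1 by the value label); n_valid, n_yes, and pct_yes are hypothetical variable names, not ones from the code above.

```stata
* Number of non-missing item1 responses in each classroom
bysort classid: egen n_valid = count(item1)

* Number of "Yes" (item1 == 1) responses in each classroom;
* a missing item1 evaluates to 0 in the comparison, so it is not counted
bysort classid: egen n_yes = total(item1 == 1)

* Share of valid responses that were "Yes" (a proportion, as in the loop above)
gen pct_yes = n_yes / n_valid
```

If this is right, pct_yes should agree with item1count_resp1pct from the loop, which would be a quick way to check both.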
When I try this:
graph hbar, over(item1) over(classid, sort(item1count_resp3pct)) asyvars stack percentages ///
blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
bar(2, color(red)) bar(3, color(green))
I get the following error:

I was trying to adapt the syntax from the section of "bar graph" help labeled "Putting the bars in a prespecified order."
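A possible workaround, sketched here under some assumptions rather than as a confirmed solution: collapse the data to one row per classroom holding the three percentages, then plot those as separate variables so that sort(1) can order the classrooms by the first one. The variable names yes, no, and maybe are hypothetical, and the sketch assumes item1 uses the anslbl coding above (1 = Yes, 2 = No, 3 = Maybe).

```stata
preserve

* Drop missing responses so percentages are based on valid answers only
keep if !missing(item1)

* 0/100 indicators, so their classroom means are percentages
gen yes   = 100 * (item1 == 1)
gen no    = 100 * (item1 == 2)
gen maybe = 100 * (item1 == 3)

* One row per classroom with the percentage giving each response
collapse (mean) yes no maybe, by(classid)

* sort(1) orders classrooms by the first plotted variable (yes)
graph hbar (asis) yes no maybe, over(classid, sort(1) descending) stack ///
    blabel(bar, position(inside) format(%4.2f)) ///
    bar(1, color(blue)) bar(2, color(red)) bar(3, color(green)) ///
    legend(order(1 "Yes" 2 "No" 3 "Maybe"))

restore
```

The idea is that sort() appears to work on the variables being plotted, so turning each response category into its own variable gives it something to sort on.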
Any guidance would be much appreciated. Apologies if I'm missing something obvious.
Thanks,
Jake