reordering bars in hbar graph in descending order by values of a variable

Jacob Rowley

Join Date: Jan 2017

Posts: 12
#1

10 Jan 2017, 18:51

Hi Everyone,

I have been in touch with the Stata technical support team and I'm a little worried I'm bumping up against the limits of Stata's graphing capabilities, but I thought before giving up I should see if I could get any guidance on here. I'll explain a little bit about what they said where it is relevant.

I have a dataset of 1000 students across 15 classrooms that I created to roughly match my real data (the data I'm working with is confidential). The data is not cross-classified.

This is how I created it:

clear
set obs 1000
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1

-- I know it's a cumbersome way to do it, but I wanted to make sure I had a dataset where the amount of missing data varied across items and respondents (like my real dataset).

So, I have a series of survey items pertaining to students' experience, and for each item I would like to create a stacked horizontal bar graph showing the percentage of students within each classroom that responded to each of the three options of "No," "Maybe," and "Yes."

It is important that I am able to arrange the bars by the percentage of students who responded a certain way in each classroom. For example, let's say the item is "Do you think you'll go to a ski resort this weekend?" I would want to sort each bar (representing one classroom) by the percentage of students who responded "Yes" to this item.

Essentially, I want to replicate the Excel-generated graph below in Stata:

Here is the code I used to create the graph below:

label define anslbl 1 "Yes" 2 "No" 3 "Maybe"

label values item1 anslbl

graph hbar, over(item1) over(classid) asyvars stack percentages ///
blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
bar(2, color(red)) bar(3, color(green))

Here is the graph it generates:

I want to have the classrooms descend based on the percentage who responded "Yes" to the item.

The Stata support team said the following when I asked about accomplishing a graph that matches the above excel:
"There isn't an option that will allow you to order the bars based on one value of the stack. However, you may be able to sort the values first in your dataset and then create a new sort variable."

I asked a follow-up question, but have yet to hear back. I also find the support to be hit-or-miss.

In the meantime, I decided to try and create a variable that is equivalent to the percentage of students within each class who responded a certain way to an item.

This is the code I tried:

bysort classid: gen class_count = _N

foreach x of varlist item1 item2 item3 item4 {
tab `x', missing gen(`x'resp)
}

foreach i of numlist 4 1 2 3 {
foreach x of varlist item1 item2 item3 item4 {
bysort classid: gen `x'count_resp`i' = sum(`x'resp`i')
bysort classid: replace `x'count_resp`i' = `x'count_resp`i'[_N]
bysort classid: gen `x'count_resp`i'pct = ((`x'count_resp`i')/(class_count - `x'count_resp4))
}
}

I ran tab `x', missing gen(`x'resp) so that I could subtract the missing values for a given variable from the total class count when computing the percentage so that it matches the percentages that are computed in the Stata graph. I checked the above code and it seems to give me what I want. I am now having trouble getting my graph to sort by one of these variables.

When I try this:

graph hbar, over(item1) over(classid, sort(item1count_resp3pct)) asyvars stack percentages ///
blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
bar(2, color(red)) bar(3, color(green))

I get the following error:

I was trying to adapt the syntax from the section of "bar graph" help labeled "Putting the bars in a prespecified order."

Any guidance would be much appreciated. Apologies if I'm missing something obvious.

Thanks,
Jake


Maarten Buis

Join Date: Mar 2014
Posts: 3456

11 Jan 2017, 01:41

You can install the egenmore package from SSC (in Stata, type ssc install egenmore). It contains the egen function axis(), which is designed for exactly this purpose. Here is an adaptation of your example (thanks for the example, that makes it much easier to answer):

Code:

clear
set obs 1000
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1

gen item1yes = (item1 == 1) if item1 < .
bys classid : egen item1propyes = mean(item1yes)
egen Classid = axis(item1propyes classid), label(classid)

graph hbar, over(item1) over(Classid) asyvars stack percentages ///
blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
bar(2, color(red)) bar(3, color(green))

[Attached image: Graph.png]

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Nick Cox

Join Date: Mar 2014

Posts: 35696
#3

11 Jan 2017, 02:40

Maarten kindly drew attention to the axis() function in egenmore (SSC). I tend to emphasise an equivalent solution, creating a variable containing desired category order and then labelling it using labmask (SJ). That's written up within http://www.stata-journal.com/sjpdf.h...iclenum=gr0034

I have some extra comments beyond the question.

1. Mixing red and green is a standard graphical no-no, as so many people have difficulty distinguishing those colours. There is much good advice on colour out there, e.g. http://colorbrewer2.org/#type=qualit...heme=Dark2&n=3 yields suggestions of a 3-colour scheme for qualitatively different categories.

2. Putting the numerical values on the bars when the bars have strong colours makes them hard to read. This is one of several weaknesses of a stacked design. Putting the bars side-by-side is one alternative. See http://www.statalist.org/forums/foru...updated-on-ssc and http://www.stata-journal.com/article...article=gr0066 for the write-up.

Here's a combined example. My random numbers will differ from those in the original, but note the technique of setting the seed.

Code:

clear
set seed 2803
set obs 1000
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1

gen item1yes = (item1 == 1) if item1 < .
bys classid : egen item1propyes = mean(item1yes)
egen Classid = group(item1propyes classid)
labmask Classid, values(classid)

tabplot Classid item1, percent(Classid) showval(format(%2.0f) offset(0.5)) ///
separate(item1) bcolor("27 151 119" "217 95 2" "117 112 179") horiz ///
ytitle(group) subtitle(percent)
Nick Cox

Join Date: Mar 2014

Posts: 35696
#4

11 Jan 2017, 05:00

I just want to flag the most general point: getting any order you like for bars at worst means that you have to create your own variable defining that order.
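
As a minimal sketch of that general point, the block below builds an ordering variable with official Stata commands only, so neither egenmore nor labmask is needed. It assumes the coding from #1 (1 = "Yes"), uses runiform() in place of the older uniform(), and introduces the hypothetical names negprop, order, and orderlbl purely for illustration. It also assumes every class has at least one non-missing response; otherwise group() would leave order missing for that class.

```stata
* Simulated data in the spirit of the thread
clear
set seed 2803
set obs 1000
gen item1 = int(runiform() * 4) + 1
replace item1 = . if item1 == 4
gen classid = int(runiform() * 15) + 1
label define anslbl 1 "Yes" 2 "No" 3 "Maybe"
label values item1 anslbl

* Proportion answering "Yes" among non-missing responses in each class
gen byte item1yes = (item1 == 1) if item1 < .
bysort classid: egen item1propyes = mean(item1yes)

* Negate the proportion so the class with the highest "Yes" share gets order 1;
* classid breaks ties so that every class keeps its own bar
gen negprop = -item1propyes
egen order = group(negprop classid)

* Show the original class numbers as value labels of the order variable
levelsof classid, local(ids)
foreach c of local ids {
    summarize order if classid == `c', meanonly
    label define orderlbl `r(min)' "`c'", add
}
label values order orderlbl

graph hbar, over(item1) over(order) asyvars stack percentages
```

Because graph hbar draws the first category at the top, the class with the largest "Yes" share should appear first and the shares should descend from there. Dropping the negation gives the ascending order instead.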
Jacob Rowley

Join Date: Jan 2017

Posts: 12
#5

11 Jan 2017, 11:35

Thank you, Maarten & Nick for your helpful input!

Nick, would it be correct that in my initial run I had created a variable from which I could create the bar order I wanted (`x'count_resp`i'pct), but I fell one step short by trying to sort by this variable instead of creating a sort variable?

Thanks for the color tip. I was essentially just trying to mimic the example graph, but that is a really good point. I've changed the colors and have gotten rid of the bar labels. I agree that they are a bit tricky to read and it looks better without them.

Is there a way to have each graph titled with a variable label? I have been working from the title portion of the Stata help graph page, but have yet to figure it out. You'll see what I'm trying to do below.

Also, is there a way to stack the bars in a different order such that "Yes" appears farthest to the left without changing the values of the variables themselves (I was thinking that maybe I could reverse code things so that 1 is "Yes" and so on, but I would prefer not to)? I think in order to be comfortable with no bar labels it would be helpful to put the response category ("Yes") sorted by on the left so that the percentages on the axis match the bars.

Also, I adjusted the numeric axis label to go by 10 percentage points. Is there a way in that statement, currently (0(10)100), to have the numbers display as percentages? If not I guess the axis label will have to suffice.

This is where I am now. Maarten, I've adapted your very helpful code:

Code:

clear
set obs 1000
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1

label var item1 "Did you plan on going to a ski-resort this weekend?"
label var item2 "Do you want to be an astronaut when you grow up?"

foreach x of varlist item1 item2 {
    label define anslbl`x' 3 "Yes" 2 "Maybe" 1 "No"
    label values `x' anslbl`x'
    gen `x'yes = (`x' == 3) if `x' < .
    bys classid: egen `x'propyes = mean(`x'yes)
    egen Classid`x' = axis(`x'propyes classid), label(classid)

    graph hbar, over(`x') over(Classid`x', label(nolabels)) asyvars stack percentages ///
        bar(1, color(green)) bar(2, color(blue)) bar(3, color(yellow)) ylabel(0(10)100) ///
        ytitle("Student Response Percentage") title(`x') saving(`x'graph, replace)
}

These are the graphs I get:

I feel that I'm super close to getting what I want thanks to your help, Maarten and Nick!

Best,
Jake

Maarten Buis

Join Date: Mar 2014
Posts: 3456

12 Jan 2017, 01:50

You can access the variable labels with extended macro functions, see help extended_fcn. I never remember the name of that help-file, so I always type help macro and click on the link for extended macro functions.

Code:

clear
set obs 1000
set seed 123456
gen item1 = int(uniform() * 4) + 1
replace item1 = . if item1 == 4
gen item2 = int(uniform() * 5) + 1
replace item2 = . if item2 == 4 | item2 == 5
gen item3 = int(uniform() * 6) + 1
replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
gen item4 = int(uniform() * 7) + 1
replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
gen classid = int(uniform() * 15) + 1

label var item1 "Did you plan on going to a ski-resort this weekend?"
label var item2 "Do you want to be an astronaut when you grow up?"

foreach x of varlist item1 item2 {
    label define anslbl`x'  3 "Yes" 2 "Maybe" 1 "No"
    label values `x' anslbl`x'
    gen `x'yes = (`x' == 3) if `x' < .
    bys classid: egen `x'propyes = mean(`x'yes)
    egen Classid`x' = axis(`x'propyes classid), label(classid)

    graph hbar, over(`x') over(Classid`x')             ///
                asyvars stack percentages              ///
                bar(1, color(green ))                  ///
                bar(2, color(blue  ))                  ///
                bar(3, color(yellow))                  ///
                ylabel(0(10)100)                       ///
                ytitle("Student Response Percentage")  ///
                l1title("Class")                        ///
                title(`"`: variable label `x''"')      ///
                name(`x'graph, replace)
}
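
Two questions from #5 are not covered by the code above: stacking "Yes" at the left without recoding the item, and showing the axis values as percentages. The following is only a sketch of one possible approach, not a definitive answer. It assumes the coding from #5 (3 = "Yes", 2 = "Maybe", 1 = "No"), uses runiform() in place of the older uniform(), and introduces the hypothetical names item1rev, revlbl, and ylab for illustration. The item itself is left untouched; the graph uses a reversed copy, and the axis labels are written out explicitly as value-and-text pairs.

```stata
clear
set seed 123456
set obs 1000
gen item1 = int(runiform() * 4) + 1
replace item1 = . if item1 == 4
gen classid = int(runiform() * 15) + 1
label define anslblitem1 3 "Yes" 2 "Maybe" 1 "No"
label values item1 anslblitem1

* Reversed copy: 1 = Yes, 2 = Maybe, 3 = No; item1 keeps its original values
gen byte item1rev = 4 - item1
label define revlbl 1 "Yes" 2 "Maybe" 3 "No"
label values item1rev revlbl

* Build the list 0 "0%" 10 "10%" ... 100 "100%" for ylabel()
local ylab
forvalues p = 0(10)100 {
    local ylab `ylab' `p' "`p'%"
}

graph hbar, over(item1rev) over(classid) asyvars stack percentages ///
    ylabel(`ylab') ytitle("Student Response Percentage")
```

Since the stack is drawn in the order of the first over() variable, the reversed copy should put "Yes" next to the axis origin, so the end of the "Yes" segment can be read directly against the percentage labels.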


Nick Cox

Join Date: Mar 2014

Posts: 35696
#7

12 Jan 2017, 02:03

I'd add that variable labels tend to get used by default as axis labels in tabplot.
Jacob Rowley

Join Date: Jan 2017

Posts: 12
#8

12 Jan 2017, 15:55

Thanks, Maarten!

Nick, I didn't previously have tabplot so I installed it and will play with it. Thanks!

Best,
Jake
