Thanks as ever to Kit Baum, a new package framed_bar for framed bar charts is now downloadable from SSC. Stata 12 is needed.
What is a framed bar chart? That term may be unfamiliar, but I will guess that the idea is something you've seen before or at least can absorb quickly.
Let me back up and show the idea without using the new command. I will use twoway for a reason that will become clear shortly.
Given a (0, 1) indicator variable, its mean is just the proportion of successes, in one jargon, or the proportion of observations in the state coded 1. So the mean of 0 0 0 1 1 1 1 1 1 1 is just 7/10 = 0.7, the proportion of successes, and 1 minus that, namely 0.3, is the proportion of failures (in the same jargon), the state coded 0.
Wanting to see percents instead is a common and easy variation.
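A minimal sketch of that arithmetic, using the ten values just mentioned. The variable names y and pc are made up for illustration and are not part of any command discussed here.

```stata
* toy data: the ten values 0 0 0 1 1 1 1 1 1 1
clear
input byte y
0
0
0
1
1
1
1
1
1
1
end

* the mean of a (0, 1) variable is the proportion coded 1
summarize y, meanonly
display "proportion coded 1: " r(mean)
display "proportion coded 0: " 1 - r(mean)

* percents: multiply by 100 before averaging
generate pc = 100 * y
summarize pc, meanonly
display "percent coded 1: " r(mean)
```

The same idea carries over to any indicator variable, with the mean taken within groups when there is a grouping variable.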
So, for the auto data, we can get the percents foreign as means and then fire up a bar chart.
[Figure: FB_S1.png]
That kind of graph raises a simple thought. Zero is a natural reference and is naturally the base of each bar (including bars of zero height, which exist but are hard to see). The other natural reference is 1 or 100%, which is the frame we're talking about. It's just another bar laid down first, here with no fill colour. (If you prefer another colour to none, I recommend a very pale colour.) Now we have space and perhaps inclination to add annotation.
[Figure: the same bar chart with a frame at 100% and annotation]
Although frames as a graphic device are at least a century old, my understanding owes most to the work of William S. Cleveland. Encouragement to use frames in tabplot from the Stata Journal arose from comments from William Huber and Jeff Laux. The initial stimulus for this command was in a thread here by Jonathan Afilalo. Encouraging comments came from Tim Morris.
What I guess many people would do here is a stacked bar chart, which is widely familiar, but (I suggest) not obviously better. This stacked version could be vertical if you prefer.
[Figure: FB_S3.png]
Here is some code for those three graphs. NB: catplot is from SSC and was updated very recently.
Why did I just use twoway and not, say, graph bar (mean) or graph hbar (mean)? The reason is that I don't know a trick to get frames with either of the last two commands. (Someone may be able to fill in that gap.)
Code:
    sysuse auto, clear
    set scheme stcolor

    * percent foreign within each category of rep78, with one observation tagged per category
    egen pc_foreign = mean(100 * foreign), by(rep78)
    egen tag = tag(rep78)

    * G1: plain bar chart
    twoway bar pc_foreign rep78 if tag, ytitle(% foreign) barw(0.8) name(G1, replace)

    * G2: frame at 100 laid down first, then the bars, then the values as marker labels
    gen frame = 100
    twoway bar frame rep78 if tag, pstyle(p2) barw(0.8) fcolor(none) ///
        || bar pc_foreign rep78 if tag, ytitle(% foreign) barw(0.8) pstyle(p1) legend(off) ///
        || scatter pc_foreign rep78 if tag, ms(none) mla(pc_foreign) mlabpos(12) ///
           mlabc(black) mlabformat(%1.0f) mlabsize(large) name(G2, replace)

    * G3: stacked bar chart (catplot is from SSC)
    catplot , over(foreign) over(rep78) asyvars stack percent(rep78) ///
        l1title(Repair record 1978) name(G3, replace)
Now if you like the idea and want to use it routinely, framed_bar offers to take over much of the nitty-gritty.
Here's the sales pitch.
framed_bar draws framed bar charts showing summary statistics for one or more numeric variables. Frames comparing bars with limiting values can work to simplify, by making an obvious complement tacit rather than explicit, and to clarify, by making near extremes evident.
Recall that when you are drinking from a glass, your glass can be easily judged as (nearly) full, (nearly) empty, or in between. The glass analogy works better with vertical bars.
It is best explained statistically by a leading application. Suppose we have a binary (Boolean, dichotomous, dummy, indicator, logical, one-hot, quantal, zero-one) variable conventionally coded 0 or 1. A common jargon is that the state coded 0 is dubbed failure and the state coded 1 is dubbed success; sometimes these terms are evocative and otherwise they are just terms of art.
A common graphic for such data is a stacked bar chart showing the proportions of whatever is coded 0 and whatever is coded 1. However, missing values aside, those two proportions necessarily add to 1. Hence an alternative graphic is just a bar chart showing the proportion of successes within a frame of height or length 1. The empty space corresponds to the proportion of failures. Equivalently, such a bar chart plots the mean or means of one or more sets of binary values, as a mean of such a (0, 1) variable is just the proportion of successes.
Variations on this design are offered by default or may be achieved by options. Headline possibilities include:
Any single summary statistic offered by collapse may be chosen rather than means.
Annotation by default shows the statistics (say means) concerned as a numeric text display.
Annotation may optionally show sample sizes as a numeric text display.
Values shown may be replaced on the fly according to a specified calculation.
Frames may be suppressed. A frame may be irrelevant to your data or your purpose, but you may like the design otherwise.
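To see the kind of numbers such a chart displays, here is a rough sketch that calls collapse directly on the nlsw88 data shipped with Stata. It says nothing about how framed_bar works internally; the new variable names n and complement are illustrative choices only.

```stata
sysuse nlsw88, clear

* proportion of college graduates and number of observations, by union status
collapse (mean) collgrad (count) n = collgrad, by(union)

* within a frame of height 1, the empty space above each bar is 1 - collgrad
generate complement = 1 - collgrad
list union collgrad complement n
```

Swapping (mean) for another statistic that collapse supports, such as (median) or (sd), gives the corresponding values, although a frame at 1 then usually stops being meaningful.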
Here are two more examples, first showing the number of missing values in some variables (by subtraction from sample size). Here we switch the frame off.
Code:
    sysuse nlsw88, clear
    framed_bar grade industry occupation union hours tenure, stat(count) ///
        calc(`=_N' - @) allobs sort frame(0) horizontal ///
        barlabel(mlabc(black) mlabs(medlarge) mlabf(%1.0f)) ///
        xsc(off) subtitle(Missing values)
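The subtraction from sample size can be checked by hand. The loop below is only a sketch of that check with official commands, assuming the same six variables; the local macro name nmiss is arbitrary.

```stata
sysuse nlsw88, clear

* number missing = sample size minus number of nonmissing values
foreach v of varlist grade industry occupation union hours tenure {
    quietly count if !missing(`v')
    local nmiss = _N - r(N)
    display "`v'" _col(14) `nmiss'
}
```

The official command misstable summarize reports similar counts for variables that have missing values.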
Now plot the proportion of college graduates by race and union worker:
Code:
    framed_bar collgrad, over(union) by(race, row(1)) barlabel(mlabf(%04.3f)) ///
        ytitle(Proportion of college graduates) ///
        yla(0 "0" 1 "1" 0.2(0.2)0.8, format(%02.1f))
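As a cross-check on the proportions shown as bar labels, the same cell means can be tabulated with official Stata. This is a sketch for comparison only, not part of framed_bar.

```stata
sysuse nlsw88, clear

* mean of collgrad (a 0/1 variable) in each race-by-union cell
tabulate race union, summarize(collgrad) means
```

Each cell mean is the proportion of college graduates in that combination of race and union status.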
There are several references in the help file to use of frames in this sense, but more would be most welcome.