When programmers from the user community post their commands publicly, it is usually with moderate or even intense enthusiasm and endorsement.
Thanks as ever to Kit Baum, who as usual is blameless in this respect, side_histogram is now downloadable from SSC. Yet I am not sure how much of a service that is. It could be that I end up with some measure of regret for posting this, as I do whenever winsor is used to mangle datasets arbitrarily.
The topic is side-by-side histograms, or at least that seems to be the most common name I've seen. In R circles, dodged seems to be a term of art.
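For anyone who wants to try the command, installation should follow the usual SSC route. This is a minimal sketch that assumes the SSC package is named side_histogram, like the command itself.

```stata
* install (or update) from SSC; package name assumed to match the command name
ssc install side_histogram, replace
* afterwards, -help side_histogram- lists the options actually available
```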
Code:
sysuse auto, clear
side_histogram mpg, over(foreign) start(10) width(1) freq legend(row(1) pos(12)) name(mpg, replace)
If it is a good idea, you should be able to see easily what is being done. The bin width is 2 mpg, and bars for domestic and foreign cars are placed side by side.
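To see the counts behind such a graph, you can tabulate the bins directly. This is a sketch under stated assumptions. The bins have width 2 mpg and start at 10, as just described, so edit the two locals to match whatever start() and width() are actually used. Here a value lying exactly on a bin boundary goes into the bin above, and side_histogram's own convention may differ.

```stata
sysuse auto, clear
* assumed bin start and width; change to match the options used
local start 10
local width 2
* lower limit of the bin containing each value of mpg
gen mpg_bin = `start' + `width' * floor((mpg - `start') / `width')
* rows are bins, columns are groups: these are the bar heights with freq
tabulate mpg_bin foreign
```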
Is that a good idea?
In a way, this is similar to what is done with bar charts given categorical predictors, and many readers will be able to work out without pain how to get something similar to this next graph with graph bar.
Code:
side_histogram rep78, over(foreign) discrete width(1) squeeze(0.8) freq legend(row(1) pos(12)) name(rep78, replace)
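For comparison, here is one way to get something broadly similar with graph bar. It is a sketch only. The asyvars option turns the first over() variable into differently coloured bars. Observations with missing rep78 are left out, which is what over() does by default.

```stata
sysuse auto, clear
* first over() gives the bar colours (via asyvars); second over() runs along the axis
graph bar (count), over(foreign) over(rep78) asyvars ///
    ytitle(Frequency) legend(row(1) pos(12)) name(rep78_bar, replace)
```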
An immediate stimulus to writing these was seeing several examples, produced with R, in Rohan Alexander's book Telling Stories with Data (CRC Press, 2023).
The first is just two samples of size 500 from N(5, 1) and N(6, 1):
Code:
* this example stimulated by Alexander (2023, pp.246-247)
clear
set obs 1000
set seed 314159
* observations 501 to 1000 form group 1, so each group has 500
gen which = _n > 500
label def which 1 No 0 Yes
label val which which
gen Outcome = rnormal(cond(which == 1, 5, 6), 1)
side_histogram Outcome, width(0.2) over(which) freq xla(2/9) name(side, replace)
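A small point worth checking in any split like this is whether the boundary is inclusive. With 1000 observations, the conditions _n >= 500 and _n > 500 put different numbers of observations into group 1, as this self-contained check shows. The variable names are just for illustration.

```stata
clear
set obs 1000
gen byte rule_ge = _n >= 500   // 1 for observations 500 to 1000: 501 of them
gen byte rule_gt = _n > 500    // 1 for observations 501 to 1000: 500 of them
count if rule_ge == 1
count if rule_gt == 1
```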
One comparison is with two histograms superimposed, where we use transparency to make the overlap clear:
Code:
twoway histogram Outcome if which == 0, freq ///
    fcolor(stc1%25) lcolor(stc1*2) start(1.8) width(0.2) xla(2/9) ///
    || histogram Outcome if which == 1, freq ///
    fcolor(stc2%25) lcolor(stc2*2) start(1.8) width(0.2) ///
    legend(order(1 "Yes" 2 "No")) name(super, replace)
Then again, what could be more appropriate than a normal quantile plot? I used qplot from the Stata Journal.
Code:
qplot Outcome, over(which) legend(off) trscale(invnormal(@)) ///
    addplot(scatteri 8.8 2 "Yes", ms(none) mlabsize(large) mlabc(stc1) ///
    || scatteri 6.6 2 "No", ms(none) mlabsize(large) mlabc(stc2)) yla(2/9) xla(-3/3) ///
    xtitle(Standard normal deviate) name(qplot, replace)
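For anyone without qplot installed, you can put together a normal quantile plot by groups from official commands. This sketch uses the auto data so that it stays self-contained. It assumes plotting positions (i - 0.5)/n within each group. That is a common choice, but it may not match qplot's own default exactly.

```stata
sysuse auto, clear
* plotting positions (i - 0.5) / n within each group, after sorting on mpg
bysort foreign (mpg): gen double pp = (_n - 0.5) / _N
* standard normal deviates matching those positions
gen double z = invnormal(pp)
twoway scatter mpg z if foreign == 0, ms(Oh) mc(stc1) ///
    || scatter mpg z if foreign == 1, ms(Oh) mc(stc2) ///
    legend(order(1 "Domestic" 2 "Foreign")) ///
    xtitle(Standard normal deviate) name(qplot_manual, replace)
```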