side_histogram available from SSC

Nick Cox

Join Date: Mar 2014

Posts: 34754
#1

03 Sep 2024, 15:08

When programmers from the user community post their commands publicly, it is usually with moderate or even intense enthusiasm and endorsement.

Thanks as ever to Kit Baum, who as usual is blameless in this respect, side_histogram is now downloadable from SSC. Yet I am not sure how much of a service that is. It could be that I end up with some measure of regret for posting this, as I do whenever winsor is used to mangle datasets arbitrarily.

The topic is side-by-side histograms, or at least that seems to be the most common name I've seen. In R circles, dodged seems to be a term of art.

Code:

sysuse auto, clear
side_histogram mpg, over(foreign) start(10) width(2) freq legend(row(1) pos(12)) name(mpg, replace)

If it is a good idea, you should be able to see easily what is being done. The bin width is 2 mpg, and bars for domestic and foreign cars are placed side by side.

Is that a good idea?

In a way that is similar to what is done with bar charts given categorical predictors, and many readers will be able to work out without pain how to get something similar to this next graph with graph bar.

Code:

side_histogram rep78, over(foreign) discrete width(1) squeeze(0.8) freq legend(row(1) pos(12)) name(rep78, replace)
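
For the graph bar route just mentioned, one possible sketch is the following. It assumes Stata 13 or later, so that (count) can be used without a variable list, and the graph name rep78bar is arbitrary.

Code:

sysuse auto, clear
* count cars in each rep78 category, domestic and foreign bars side by side
* asyvars treats the categories of foreign as separate y variables,
* so each gets its own colour and legend entry
graph bar (count), over(foreign) over(rep78) asyvars legend(row(1) pos(12)) name(rep78bar, replace)

Observations with missing rep78 are left out by default, so the counts cover only the cars with a recorded repair record.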

An immediate stimulus to writing these was seeing several examples -- produced with R -- in Rohan Alexander's book Telling Stories with Data (CRC Press, 2023).

The first is just two samples of size 500 from N(5, 1) and N(6, 1):

Code:

* this example stimulated by Alexander (2023, pp.246-247)
clear
set obs 1000
set seed 314159
* first 500 observations get 0 (Yes), last 500 get 1 (No)
gen which = _n > 500
label def which 1 No 0 Yes
label val which which
gen Outcome = rnormal(cond(which == 1, 5, 6), 1)
side_histogram Outcome, width(0.2) over(which) freq xla(2/9) name(side, replace)

One comparison is with two histograms superimposed, where we use transparency to make the overlap clear:

Code:

twoway histogram Outcome if which == 0, freq ///
    fcolor(stc1%25) lcolor(stc1*2) start(1.8) width(0.2) xla(2/9) ///
    || histogram Outcome if which == 1, freq ///
    fcolor(stc2%25) lcolor(stc2*2) start(1.8) width(0.2) ///
    legend(order(1 "Yes" 2 "No")) name(super, replace)

Then again what could be more appropriate than a normal quantile plot? I used qplot from the Stata Journal.

Code:

qplot Outcome, over(which) legend(off) trscale(invnormal(@)) ///
    addplot(scatteri 8.8 2 "Yes", ms(none) mlabsize(large) mlabc(stc1) ///
    || scatteri 6.6 2 "No", ms(none) mlabsize(large) mlabc(stc2)) yla(2/9) xla(-3/3) ///
    xtitle(Standard normal deviate) name(qplot, replace)

Nick Cox

Join Date: Mar 2014

Posts: 34754
#2

26 Sep 2024, 11:24

Thanks again to Kit Baum, this command has now been updated to cover what -- on reflection -- is a more obvious application, to two variables measured on the same scale.

The new worked example in the help compares binomial and Poisson samples for distributions whose means should be identical.
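
A minimal sketch of the kind of data involved follows. It is not the help file's own example: the sample size, seed, and parameters are assumptions, chosen so that both distributions have mean 4.

Code:

clear
set obs 1000
set seed 2024
* binomial with n = 20 and p = 0.2 has mean 20 * 0.2 = 4
gen binomial = rbinomial(20, 0.2)
* Poisson with mean 4
gen poisson = rpoisson(4)
* both sample means should be close to 4; the theoretical variances differ (3.2 versus 4)
summarize binomial poisson

The help for side_histogram gives the exact syntax for plotting two such variables side by side.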
ericmelse

Join Date: May 2014

Posts: 418
#3

27 Sep 2024, 08:03

Dear Nick, using:

Code:

which side_histogram

will not result in a version statement for side_histogram in the Stata results window.
Although possibly superficial, I think users appreciate your version notes in your ado file(s). Certainly I do!

http://publicationslist.org/eric.melse
Nick Cox

Join Date: Mar 2014

Posts: 34754
#4

27 Sep 2024, 08:18

ericmelse Thanks for spotting that. The first line of code should read

Code:

*! 1.1.0 NJC 21 September 2024

so that it is echoed by which.

Chen Samulsion

Join Date: Jan 2018
Posts: 659
#5

30 Oct 2024, 21:44

Austin Nichols wrote a command -byhist- which can do the same work as -side_histogram-.

Code:

package byhist from http://fmwww.bc.edu/RePEc/bocode/b
--------------------------------------------------------------------------------------------------------------------------

TITLE
      'BYHIST': module to produce interlaced histograms

DESCRIPTION/AUTHOR(S)
      
      byhist makes "interlaced" histograms, with frequencies
      (optionally fraction/density) of one variable optionally shown by
      categories of another variable. fweights, aweights, and pweights
      are allowed (at user's own risk).
      
      KW: histogram
      KW: graphics
      KW: frequencies
      KW: fraction
      KW: density
      
      Requires: Stata version 8.2
      
      Distribution-Date: 20100420
      
      Author: Austin Nichols
      Support: email [email protected]
      

INSTALLATION FILES                                  (click here to install)
      byhist.ado
      byhist.hlp


Nick Cox

Join Date: Mar 2014

Posts: 34754
#6

31 Oct 2024, 02:00

Chen Samulsion Correct. I should have remembered that. Austin's command seems to tackle the problem in #1 but not that in #2.
Chen Samulsion

Join Date: Jan 2018

Posts: 659
#7

31 Oct 2024, 03:11

Dear Nick Cox, thank you for your explanation. I have one question. In your example code in #1, you used stc1 and stc2 in the fcolor() option. I searched the help file for colorstyle and found no mention of these two colors. Could you tell me more about them? Thank you very much.

Code:

twoway histogram Outcome if which == 0, freq ///
    fcolor(stc1%25) lcolor(stc1*2) start(1.8) width(0.2) xla(2/9) ///
    || histogram Outcome if which == 1, freq ///
    fcolor(stc2%25) lcolor(stc2*2) start(1.8) width(0.2) ///
    legend(order(1 "Yes" 2 "No")) name(super, replace)
Chen Samulsion

Join Date: Jan 2018

Posts: 659
#8

31 Oct 2024, 03:23

Well, I searched the web and found that it is documented in the Stata manuals (g-4schemest), which I had never noticed:

the histogram fill color is stc1 with a 90% intensity while the outline color is stc1 with a 70% intensity.
Nick Cox

Join Date: Mar 2014

Posts: 34754
#9

31 Oct 2024, 04:49

I am using Stata 18 following a longstanding convention here that people may -- indeed should -- assume that a writer is using the current version of Stata unless they specify otherwise.

The scheme stcolor and the colours stc1, stc2, and so forth were introduced in Stata 18.

* and % to modify default colours have different effects and both predate Stata 18.
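
A small sketch of the difference, assuming Stata 18 so that stc1 is defined; the graph names are arbitrary:

Code:

sysuse auto, clear
* stc1*0.5 adjusts intensity: a paler but still opaque version of the colour
twoway histogram mpg, freq fcolor(stc1*0.5) name(intensity, replace)
* stc1%50 adjusts opacity: the same colour, drawn 50% transparent
twoway histogram mpg, freq fcolor(stc1%50) name(opacity, replace)

The distinction matters most when histograms overlap, because only % lets whatever lies underneath show through.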

I hope that helps to explain.
Chen Samulsion

Join Date: Jan 2018

Posts: 659
#10

31 Oct 2024, 07:45

Thank you Nick. I see that Stata 18 introduced some new colorstyles: https://www.stata.com/manuals/g-4colorstyle.pdf. I had thought that you were using an unfamiliar colorstyle from a user-written scheme. This reminds me to follow What's New in Stata.
George Ford

Join Date: Aug 2014

Posts: 2782
#11

01 Nov 2024, 14:00

This is useful. I normally use kdensity because overlapping histograms are so ugly (absent a lot of manipulation). This does a much cleaner job.