  • side_histogram available from SSC

    When programmers from the user community post their commands publicly, it is usually with moderate or even intense enthusiasm and endorsement.

    Thanks as ever to Kit Baum, who as usual is blameless in this respect, side_histogram is now downloadable from SSC. Yet I am not sure how much of a service that is. It could be that I end up with some measure of regret for posting this, as I do whenever winsor is used to mangle datasets arbitrarily.

    The topic is side-by-side histograms, or at least that seems to be the most common name I've seen. In R circles, dodged seems to be a term of art.

    Code:
    sysuse auto, clear
    
    side_histogram mpg, over(foreign) start(10) width(2) freq legend(row(1) pos(12)) name(mpg, replace)
    [Image: sh_mpg.png]



    If it is a good idea, you should be able to see easily what is being done. The bin width is 2 mpg, and bars for domestic and foreign cars are placed side by side.
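
    For readers who want to check what the bars mean, here is a by-hand sketch of the same idea using only official commands. It is an illustration and not how side_histogram is implemented; the variable names bin, freq and xpos and the graph name byhand are arbitrary choices.

    Code:
    sysuse auto, clear
    
    * lower limit of each 2 mpg bin, with bins starting at 10
    gen bin = 10 + 2 * floor((mpg - 10) / 2)
    
    * reduce to one observation per bin and group, holding its frequency
    contract bin foreign, freq(freq)
    
    * give each group its own half of the bin
    gen xpos = bin + cond(foreign == 0, 0.5, 1.5)
    
    twoway bar freq xpos if foreign == 0, barwidth(1) ///
    || bar freq xpos if foreign == 1, barwidth(1) ///
    legend(order(1 "Domestic" 2 "Foreign") row(1) pos(12)) ///
    xtitle("Mileage (mpg)") ytitle("Frequency") name(byhand, replace)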

    Is that a good idea?

    In a way that is similar to what is done with bar charts given categorical predictors, and many readers will be able to work out without pain how to get something similar to this next graph with graph bar.

    Code:
    side_histogram rep78, over(foreign) discrete width(1) squeeze(0.8) freq legend(row(1) pos(12)) name(rep78, replace)
    [Image: sh_rep78.png]
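
    One way to get something similar with graph bar is sketched here, assuming a Stata version in which (count) is allowed without a variable list. Note that graph bar treats rep78 as categorical rather than as a numeric scale. The graph name rep78_bar is arbitrary.

    Code:
    sysuse auto, clear
    
    * asyvars gives each level of foreign its own colour and legend entry
    graph bar (count), over(foreign) over(rep78) asyvars ///
    legend(row(1) pos(12)) ytitle("Frequency") name(rep78_bar, replace)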



    An immediate stimulus to writing this command was seeing several examples -- produced with R -- in Rohan Alexander's book Telling Stories With Data (CRC Press, 2023).

    The first is just two samples of size 500 from N(5, 1) and N(6, 1):


    Code:
    * this example stimulated by Alexander (2023, pp.246-247)
    clear
    set obs 1000
    set seed 314159
    
    gen which = _n > 500
    label def which 1 No 0 Yes
    label val which which
    
    gen Outcome = rnormal(cond(which == 1, 5, 6), 1)
    
    side_histogram Outcome, width(0.2) over(which) freq xla(2/9) name(side, replace)
    [Image: sh_side.png]



    One comparison is with two histograms superimposed, where we use transparency to make the overlap clear:

    Code:
    twoway histogram Outcome if which == 0, freq ///
    fcolor(stc1%25) lcolor(stc1*2) start(1.8) width(0.2) xla(2/9) ///
    || histogram Outcome if which == 1, freq ///
    fcolor(stc2%25) lcolor(stc2*2) start(1.8) width(0.2) ///
    legend(order(1 "Yes" 2 "No")) name(super, replace)
    [Image: sh_super.png]



    Then again what could be more appropriate than a normal quantile plot? I used qplot from the Stata Journal.

    Code:
    qplot Outcome, over(which) legend(off) trscale(invnormal(@)) ///
    addplot(scatteri 8.8 2 "Yes", ms(none) mlabsize(large) mlabc(stc1) ///
    || scatteri 6.6 2 "No", ms(none) mlabsize(large) mlabc(stc2)) yla(2/9) xla(-3/3) ///
    xtitle(Standard normal deviate) name(qplot, replace)
    [Image: sh_qplot.png]
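
    For anyone without qplot installed, what the plot shows can be sketched with official commands alone. This assumes plotting positions (i - 0.5)/n within each group, which may differ from the choice made in the graph above; the variable names pp and z and the graph name qbyhand are arbitrary.

    Code:
    clear
    set obs 1000
    set seed 314159
    gen which = _n > 500
    gen Outcome = rnormal(cond(which == 1, 5, 6), 1)
    
    * rank within group, then convert plotting positions to normal deviates
    sort which Outcome
    by which: gen pp = (_n - 0.5) / _N
    gen z = invnormal(pp)
    
    twoway scatter Outcome z if which == 0, ms(oh) ///
    || scatter Outcome z if which == 1, ms(oh) ///
    legend(order(1 "Yes" 2 "No")) yla(2/9) xla(-3/3) ///
    xtitle(Standard normal deviate) ytitle(Outcome) name(qbyhand, replace)

    Samples from a normal distribution should plot as roughly straight lines, with intercept near the mean and slope near the standard deviation.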


  • #2
    Thanks again to Kit Baum, this command has now been updated to cover what -- on reflection -- is a more obvious application, to two variables measured on the same scale.

    The new worked example in the help compares binomial and Poisson samples for distributions whose means should be identical.
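
    A hedged sketch of such a comparison follows. The parameters are assumptions chosen for illustration (binomial with n = 10 and p = 0.2, Poisson with mean 2, so both means should be near 2) and are not necessarily those used in the help. Rather than guess the new two-variable syntax, which is documented in the help, this stacks the two variables and uses over() as in #1. The variable names ybin, ypois, count and poisson are arbitrary.

    Code:
    clear
    set obs 1000
    set seed 314159
    gen ybin = rbinomial(10, 0.2)
    gen ypois = rpoisson(2)
    summarize ybin ypois
    
    * long layout: count holds the values, _stack is 1 for binomial and 2 for Poisson
    stack ybin ypois, into(count) clear
    gen byte poisson = _stack == 2
    label def poisson 0 "Binomial" 1 "Poisson"
    label val poisson poisson
    
    side_histogram count, over(poisson) discrete width(1) freq legend(row(1) pos(12)) name(binpois, replace)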



    • #3
      Dear Nick, using:
      Code:
      which side_histogram
      will not result in a version statement for side_histogram in the Stata results window.
      Although possibly superficial, I think users appreciate your version notes in your ado file(s). Certainly I do!
      http://publicationslist.org/eric.melse



      • #4
        ericmelse Thanks for spotting that. The first line of code should read

        Code:
        *! 1.1.0 NJC 21 September 2024
        so that it is echoed by which.
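
        For anyone new to this convention: which echoes any lines in an ado-file that begin with *!, so that is where version notes usually go. A minimal sketch, using a hypothetical command hello_demo invented purely for illustration:

        Code:
        *! 1.0.0 hypothetical author 1 January 2025
        program hello_demo
            version 18
            display "hello"
        end

        Saved as hello_demo.ado somewhere on the adopath, that file should lead which hello_demo to show the path to the file followed by the *! line.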



        • #5
          Austin Nichols wrote a command -byhist- which can do the same work as -side_histogram-.
          Code:
          package byhist from http://fmwww.bc.edu/RePEc/bocode/b
          --------------------------------------------------------------------------------------------------------------------------
          
          TITLE
                'BYHIST': module to produce interlaced histograms
          
          DESCRIPTION/AUTHOR(S)
                
                byhist makes "interlaced" histograms, with frequencies
                (optionally fraction/density) of one variable optionally shown by
                categories of another variable. fweights, aweights, and pweights
                are allowed (at user's own risk).
                
                KW: histogram
                KW: graphics
                KW: frequencies
                KW: fraction
                KW: density
                
                Requires: Stata version 8.2
                
                Distribution-Date: 20100420
                
                Author: Austin Nichols
                Support: email [email protected]
                
          
          INSTALLATION FILES                                  (click here to install)
                byhist.ado
                byhist.hlp



          • #6
            Chen Samulsion Correct. I should have remembered that. Austin's command seems to tackle the problem in #1 but not that in #2.



            • #7
              Dear Nick Cox, thank you for your explanation. I have one question. In your example code in #1, you used stc1 and stc2 in the fcolor() and lcolor() options. I searched the help file for colorstyle and found no mention of these two colors. Could you tell me more about them? Thank you very much.
              Code:
              twoway histogram Outcome if which == 0, freq ///
              fcolor(stc1%25) lcolor(stc1*2) start(1.8) width(0.2) xla(2/9) ///
              || histogram Outcome if which == 1, freq ///
              fcolor(stc2%25) lcolor(stc2*2) start(1.8) width(0.2) ///
              legend(order(1 "Yes" 2 "No")) name(super, replace)



              • #8
                Well, I searched the web and found that they are documented in the Stata manual entry [G-4] Scheme st (manuals/g-4schemest), which I had never noticed:
                the histogram fill color is stc1 with a 90% intensity while the outline color is stc1 with a 70% intensity.



                • #9
                  I am using Stata 18 following a longstanding convention here that people may -- indeed should -- assume that a writer is using the current version of Stata unless they specify otherwise.

                  The scheme stcolor and colours stc1 stc2 and so forth were introduced in Stata 18.

                  The modifiers * (intensity) and % (opacity) for colours have different effects, and both predate Stata 18.
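
                  A small sketch of the difference, using the long-standing colour navy so that it does not depend on stcolor (although % needs Stata 15 or later); the graph names are arbitrary:

                  Code:
                  sysuse auto, clear
                  
                  * the % modifier sets opacity: at 25% whatever lies underneath shows through
                  twoway histogram mpg, freq fcolor(navy%25) lcolor(navy) name(opacity, replace)
                  
                  * the * modifier sets intensity: below 1 is paler, above 1 is darker, and nothing shows through
                  twoway histogram mpg, freq fcolor(navy*0.25) lcolor(navy*2) name(intensity, replace)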

                  I hope that helps to explain.



                  • #10
                    Thank you Nick. I see that Stata 18 introduced some new colorstyles: https://www.stata.com/manuals/g-4colorstyle.pdf. I had thought that you were using an unfamiliar colorstyle from a user-written scheme. This reminds me to follow What's New in Stata.
                    [Image: _20241031203933.png]



                    • #11
                      This is useful. I normally use kdensity because overlapping histograms are so ugly (absent a lot of manipulation). This does a much cleaner job.
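
                      For comparison, the kdensity approach on the auto data from #1 might look like this sketch, which uses the default kernel and bandwidth and an arbitrary graph name:

                      Code:
                      sysuse auto, clear
                      
                      twoway kdensity mpg if foreign == 0 ///
                      || kdensity mpg if foreign == 1, ///
                      legend(order(1 "Domestic" 2 "Foreign") row(1) pos(12)) ///
                      xtitle("Mileage (mpg)") ytitle("Density") name(kdens, replace)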
