
  • Labeling edges of box (and whisker) plots: 25%, median, 75%. How?

    I am analyzing an ordinal set of text-level reading scores. I am using the median and the interquartile range (IQR) for three administrations of the assessment: fall, winter, and spring. I want to accompany the data with box-and-whisker plots (graph box) and to indicate which scores mark the 25th percentile, the median, and the 75th percentile. The analysis (for 76 elementary schools) is for administrators and teachers. I am flummoxed by how to label the left and right edges of the box and the median. I've tried following the options and using the Graph Editor. No luck. I am using Stata 12. Any pointers or specific directions?

  • #2
    You don't show any code that you tried (see FAQ Advice #12).

    On the whole, the more you want to depart from graph box, the more you need a different approach.

    stripplot (SSC) offers one possibility.

    Code:
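    * stripplot must be installed first: ssc install stripplot
    sysuse auto, clear
    set scheme s1color
    * median and quartiles of mpg within each group, used below as marker labels
    egen median = median(mpg), by(foreign)
    egen loq = pctile(mpg), by(foreign) p(25)
    egen upq = pctile(mpg), by(foreign) p(75)
    * x position for the labels, just to the right of each box
    gen foreign2 = foreign + 0.1
    stripplot mpg, over(foreign) box(barw(0.2)) vertical ///
    addplot(scatter median loq upq foreign2, ms(none ..) ///
    mla(median loq upq) mlabcolor(blue ..)) xsc(r(. 1.2)) xla(, noticks)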
    [Figure: anotherboxplot.png]


    See also

    http://www.stata-journal.com/article...article=gr0039
    http://www.stata-journal.com/article...ticle=gr0039_1
    Last edited by Nick Cox; 28 Sep 2016, 17:55.



    • #3
      Here is another variant. I like the quantile-box plots used by Emanuel Parzen. I also made the marker labels a little larger and shifted them away from the boxes.

      Code:
      sysuse auto, clear
      set scheme s1color
      egen median = median(mpg), by(foreign)
      egen loq = pctile(mpg), by(foreign) p(25)
      egen upq = pctile(mpg) , by(foreign) p(75)
      gen foreign2 = foreign + 0.15
      
      stripplot mpg, over(foreign) box(barw(0.2)) centre cumul ///
      cumprob vertical height(0.4) ///
      addplot(scatter median loq upq foreign2, ms(none ..) ///
      mla(median loq upq) mlabcolor(blue ..) mlabsize(*1.2 ..)) xsc(r(. 1.2)) xla(, noticks)
      [Figure: anotherbox2.png]



      • #4
        Thanks, Nick Cox. I had gotten called away to do some different number crunching and have just returned to this project. I will post my attempts and try your suggestions hopefully later today.



        • #5
          Yet more annotation. Perhaps over the top, but some small tricks are shown. The means are shown in orange to the left of each box, and the sample sizes are shown in black below the data.

          Code:
          sysuse auto, clear
          set scheme s1color
          
          egen median = median(mpg), by(foreign)
          egen loq = pctile(mpg), by(foreign) p(25)
          egen upq = pctile(mpg) , by(foreign) p(75)
          egen mean = mean(mpg), by(foreign) 
          egen min = min(mpg)
          egen n = count(mpg), by(foreign) 
          
          gen shown = "{it:n} = " + string(n) 
          gen foreign2 = foreign + 0.15
          gen foreign3 = foreign - 0.15 
          
          gen showmean = string(mean, "%2.1f") 
          
          stripplot mpg, over(foreign) box(barw(0.2)) centre cumul ///
          cumprob vertical height(0.4) ///
          addplot(scatter median loq upq foreign2, ms(none ..) ///
          mla(median loq upq) mlabcolor(blue ..) mlabsize(*1.2 ..) || ///
          scatter mean foreign3, ms(none) mla(showmean) mlabcolor(orange) mlabsize(*1.2) mlabpos(9) || ///
          scatter min foreign, ms(none) mla(shown) mlabcolor(black) mlabsize(*1.2) mlabpos(6)) ///
          xsc(r(. 1.2)) xla(, noticks)
          [Figure: anotherboxplotwithmorestuff.png]



          • #6
            Hi Nick,

            First of all, thank you for the amazing stripplot. I replicated the following:
            Code:
            sysuse bplong, clear
            egen group = group(age sex), label
            stripplot bp*, bar over(when) by(group, compact col(1) note("")) ysc(reverse) ///
            subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) ytitle("") ///
            xtitle(Blood pressure (mm Hg)) name(ST30, replace)

            Next I added jitter:

            Code:
            stripplot bp*, bar over(when) jitter(2) by(group, compact col(1) note("")) ysc(reverse) ///
            subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) ytitle("") ///
            xtitle(Blood pressure (mm Hg)) name(ST30, replace)
            [Figure: ST30.png]



            I have two questions:
            1. Is it a good idea to use jitter? Can one somehow set the size of the bubbles using frequency weights? If that is possible, how can the size of each bubble be displayed in the legend?
            2. How can I add labels to the whiskers here?



            • #7
              1. Jitter divides people. I would rather stack. My rationale is that jitter obliges some kind of mental calculation of "How many points are near here?" and I doubt that many of us are good at that. Although it's easy to plot the mpg variable as in the plots in #2, #3 and #5, I think a cumulative or quantile display is more direct and more informative. It's no surprise that the variable is reported as an integer, but in many other datasets granularity of the data -- repeated values showing up as stripes on the plot -- is a surprise and is worth knowing about (often as a signal of data quality). It would be obscured by jittering.

              2. You have not got much room for labels on the whiskers, I suggest.
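
              A minimal sketch of the comparison in point 1, assuming stripplot is installed from SSC and using the auto data from earlier in the thread. The graph names stacked and jittered are arbitrary, and jitter(2) is only an illustrative amount:

              ```stata
              * if needed: ssc install stripplot
              sysuse auto, clear
              * stacking: tied values pile up, so frequencies can be read off directly
              stripplot mpg, over(foreign) stack height(0.4) name(stacked, replace)
              * jittering: tied values are scattered at random around their position
              stripplot mpg, over(foreign) jitter(2) name(jittered, replace)
              ```

              In the stacked version the integer granularity of mpg should stay visible as aligned piles of markers; in the jittered version it tends to be blurred.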



              • #8
                Hi Nick,

                Thank you for your insights. I was wondering how one should visualize the following:

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte qe1bet float(group time)
                40 3 1
                30 3 2
                10 4 1
                30 3 2
                20 2 1
                10 6 2
                30 6 1
                10 6 2
                40 1 1
                50 4 2
                20 3 1
                40 3 2
                50 4 1
                50 4 2
                30 1 1
                10 5 2
                50 3 1
                30 3 2
                30 3 1
                10 3 2
                10 1 1
                50 1 2
                20 6 1
                30 3 2
                50 6 1
                20 3 2
                40 3 1
                40 1 2
                30 4 1
                 0 4 2
                end
                label values group group
                label def group 1 "18-30 Female", modify
                label def group 2 "18-30 Male", modify
                label def group 3 "31-45 Female", modify
                label def group 4 "31-45 Male", modify
                label def group 5 "45+ Female", modify
                label def group 6 "45+ Male", modify
                label values time wave
                label def wave 1 "Wave 1", modify
                label def wave 2 "Wave 2", modify

                qe1bet can take the values 0, 10, 20, 30, 40, or 50.



                • #9
                  With that data example

                  Code:
                  stripplot qe1bet , by(group, note("")) over(time) vertical stack ms(Sh) height(0.2) yla(0(10)50) yla(, ang(h))
                  is one stab, but I don't know what I am expected to see. If the real dataset is much bigger, something quite different might make more sense.



                  • #10
                    Hi Nick,

                    The data has 332 observations from 2 waves. qe1bet is the bet amount by an individual in a risk experiment. I am wondering how to best visualize the average changes in bet over time by gender and age group to start with.

                    If anything is unclear, please let me know.



                    • #11
                      I am in favour of showing distributions, but if the aim is to show means, then show those means.
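
                      A minimal sketch of plotting the means directly, assuming the example data from #8 are in memory. The variable name mean_bet is introduced here purely for illustration and is not from the original posts:

                      ```stata
                      preserve
                      * one observation per group and wave, holding the mean bet
                      collapse (mean) mean_bet = qe1bet, by(group time)
                      * one panel per age-sex group, with means connected across waves
                      twoway connected mean_bet time, by(group, note("")) ///
                          xla(1 2, valuelabel) xsc(r(0.5 2.5)) ytitle("Mean bet")
                      restore
                      ```

                      With only two waves, each panel reduces to a single line segment, so a plain table of means may serve just as well.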



                      • #12
                        I am not sure I follow you, Nick. Do you mean histogram/density plots?



                        • #13
                          No; I am just thinking aloud. I played a bit with your sample data and concluded that it's hard to know what will work best without the full dataset.



                          • #14
                            I've used the code Nick posted several times and found it incredibly useful, so I thought I'd update the thread with a tweak I made. I have a large dataset and found that the labels for n, median, and quartiles got distorted by being plotted once for each observation. I added code to select only one observation per group for plotting the labels. In the code below, the first version shows the distortion; the second corrects it.
                            (I dropped several years of data to speed up the plotting. You can see the distortion with this version, but it gets worse the more observations you keep.)

                            Code:
                            webuse regsmpl.dta, clear
                            set scheme s1color
                            keep if year<71
                            egen median = median(ln_wage), by(south)
                            egen loq = pctile(ln_wage), by(south) p(25)
                            egen upq = pctile(ln_wage) , by(south) p(75)
                            egen min = min(ln_wage)
                            egen n = count(ln_wage), by(south) 
                            
                            
                            gen shown = "{it:n} = " + string(n) 
                            gen south2 = south + 0.15
                            gen south3 = south - 0.15 
                            
                            stripplot ln_wage, msize(small)  over(south) box(barw(0.2)) vertical ///
                            addplot(scatter median loq upq south2, ms(none ..) ///
                            mla(median loq upq) mlabcolor(blue ..) mlabsize(*1.2 ..) || ///
                            scatter min south, ms(none) mla(shown) mlabcolor(black) mlabsize(*1.2) mlabpos(6)) ///
                            xsc(r(. 1.4)) xla(, noticks) ysc(r(-.5 3)) name(v1, replace)
                            
                            
                            bysort south (ln_wage): gen use=_n
                            
                            stripplot ln_wage, msize(small) over(south) box(barw(0.2)) vertical ///
                            addplot(scatter median loq upq south2 if use==1, ms(none ..) ///
                            mla(median loq upq) mlabcolor(blue ..) mlabsize(*1.2 ..) || ///
                            scatter min south if use==1, ms(none) mla(shown) mlabcolor(black) mlabsize(*1.2) mlabpos(6)) ///
                            xsc(r(. 1.4)) ysc(r(-.5 3)) xla(, noticks) name(v2, replace)



                            • #15
                              #14 Good you're finding this useful. In turn, that is a nice tip.

                              Your tagged variable


                              Code:
                              bysort south (ln_wage): gen use=_n  
                              followed by selecting
                              Code:
                              .... if use == 1
                              is equivalent to
                              Code:
                              egen tag = tag(south)
                              followed by selections
                              Code:
                              ... if tag
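
                              A quick check of that equivalence, assuming the dataset from #14 and the variable names use and tag as above. One caveat: by default egen's tag() ignores observations with missing values of south (they get tag equal to 0), whereas the bysort version also numbers any group in which south is missing.

                              ```stata
                              webuse regsmpl, clear
                              bysort south (ln_wage): gen use = _n
                              egen tag = tag(south)
                              * each nonmissing value of south should show exactly one selected observation
                              tabulate south if use == 1
                              tabulate south if tag
                              ```

                              For the label plots this difference should not matter, because scatter cannot place a point whose x value is missing.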

