  • Labelling box plot elements

    Hi,

    I wanted to ask how difficult it would be to label the elements of a box plot along the lines of the picture below (taken from an article on the Flowing Data blog).

    [Image: box-plot-explained.gif]

    Kind regards,
    Konrad
    Version: Stata/IC 13.1

  • #2
    If you only need to do this once: use the Graph Editor.

    Note that the definitions of outliers are wrong. An outlier is not defined by being < 1.5 LOQ or > 1.5 UPQ.


    • #3
      Nick,

      as always, thanks for getting back to me. The chart is just a picture I took from the Internet; it wasn't my intention to start a discussion on how to define outliers (my preference would be to treat a point falling outside the outer fence as an outlier). That said, I have a piece of work where I need to label the elements of a box plot in a similar manner. As I have a number of box plots of different shapes, it's not practical to do this via the Graph Editor and play(). Ideally I would use text(), so that I could code the different labels automatically. It crossed my mind that I could run su on a given variable and then pass the values to the graph, but it's not clear to me how I could draw the lines automatically. Ideally, I would wrap all of this in a loop that generates box plots with all the labels for a number of variables.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1


      • #4
        I just noted that the original post you cited is wildly wrong on what is an outlier for box plot purposes, regardless of any definition of outlier. That's not trying to start a discussion any more than pointing out that someone added 2 and 2 and got 5.

        Whether you are using graph box or graph hbox or twoway, you need to add text elements via text() so far as I can see. *title() options in principle offer alternatives, but not helpfully so far as I can see.


        • #5
          Originally posted by Nick Cox
          Whether you are using graph box or graph hbox or twoway, you need to add text elements via text() so far as I can see. *title() options in principle offer alternatives, but not helpfully so far as I can see.

          Using text() was on my mind. Presumably, I would be able to get most of the relevant values via:
          Code:
          sysuse auto
          su mpg, detail
          return list
          and then pass the relevant r() results to text().
          Kind regards,
          Konrad
          Version: Stata/IC 13.1


          • #6
            If you want to follow the rules

            highest value within [upq, upq + 1.5 iqr]
            lowest value within [loq - 1.5 iqr, loq]

            you have to do more work than that. More at http://www.stata-journal.com/article...ticle=gr0039_1 or http://www.stata.com/statalist/archi.../msg00917.html
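
            A minimal sketch of that extra work for a single variable, following the two rules above. It assumes the quartiles reported by summarize, detail are the ones graph box draws (not verified here), and the local macro names are purely illustrative:

            Code:
            sysuse auto, clear

            // Quartiles and interquartile range
            quietly summarize mpg, detail
            local loq = r(p25)
            local upq = r(p75)
            local iqr = `upq' - `loq'

            // Upper whisker: highest value within [upq, upq + 1.5 iqr]
            quietly summarize mpg if inrange(mpg, `upq', `upq' + 1.5 * `iqr'), meanonly
            local upper = r(max)

            // Lower whisker: lowest value within [loq - 1.5 iqr, loq]
            quietly summarize mpg if inrange(mpg, `loq' - 1.5 * `iqr', `loq'), meanonly
            local lower = r(min)

            display "whiskers end at `lower' and `upper'"

            The locals lower and upper could then be passed to text() in the same way as any other y coordinate.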


            • #7
              Thanks, I'm familiar with your seminal paper on box plots. What is unclear to me is why text() needs the value 95 to place the label on the right-hand side. I'm guessing that the x axis runs from 0 to 100. It would be so much easier if graph box left behind returned results (as shown by return list) for all the relevant elements.

              Code:
              /* == Box Plot With Nice Labels == */
              
              // Data
              sysuse auto, clear
              
              // Get values
              su mpg, detail
              return list
              
              // Graph box plot
              graph box mpg, ///
                  text(`r(p50)' 95 "Label one")
              [Image: Graph.png]

              Kind regards,
              Konrad
              Version: Stata/IC 13.1
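
              The same idea could be wrapped in a loop over several variables. This is only a sketch: the variable list, label wording and graph names are illustrative assumptions, and it assumes that text() in graph box accepts the usual placement() suboption. It places text only and does not draw any connecting lines.

              Code:
              sysuse auto, clear

              foreach v of varlist mpg price weight {
                  // Store the quartiles in locals before graph box is called
                  quietly summarize `v', detail
                  local loq = r(p25)
                  local med = r(p50)
                  local upq = r(p75)

                  // y is in the units of the variable; x = 95 is near the right edge
                  // placement(w) puts each label to the left of its (y, x) point
                  graph box `v', ///
                      text(`loq' 95 "Lower quartile", placement(w)) ///
                      text(`med' 95 "Median", placement(w)) ///
                      text(`upq' 95 "Upper quartile", placement(w)) ///
                      name(box_`v', replace)
              }

              Run from a do-file, this leaves one named graph per variable in memory (box_mpg, box_price, box_weight).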


              • #8
                That's for StataCorp. The crunch is that graph box supports categorical axes, but I still don't know why the x axis is regarded as scaled to [0, 100].

                My personal take, as already reported here, is that the 1.5 IQR rule, although manifestly very carefully thought out by its proponent John Tukey, is too tricky to explain to most audiences. I prefer whiskers to selected quantiles. The side-effect is that I have to program my own box plots, but that's OK. I've done that and stripplot (SSC) now supports box plots the way I like them.
