
  • What is the default percentile in Stata for boxplot command?

    Hi, I generated a box plot in Stata, and another researcher generated the same plot from the same data set, but the outliers differ between the two plots. He told me he used 5-95% percentile whiskers. I cannot find any such option in the "graph boxplot" command in Stata. I have read the article by Nick Cox (https://journals.sagepub.com/doi/pdf...867X0900900309), specifically 3.5 (percentile-based whiskers), but it did not help for graph boxplot. I don't want to generate the plot using "twoway rbar".
    [Attached image: Boxplot mismatch.png]

    Could you provide any help in this regard?
    Last edited by Rayhan Islam; 25 Jul 2022, 22:19. Reason: graphpad

  • #2
    The formulas for the upper and lower adjacent values are given in the Methods and formulas section of the user manual's entry for graph box. I don't see any option for changing the formula.

    It looks as if the datasets that your friend used with GraphPad are not the same as what you used in Stata.


    • #3
      To amplify the point made by Joseph Coveney: graph box and graph hbox extend each whisker to the furthest data point within 1.5 IQR of the nearer quartile. This is not only documented in the manual but also explained on p.480 of the paper cited in #1, contrary to the report there. I don't know what 3.5 is alluding to. There is no command graph boxplot.
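
      As a rough illustration of the 1.5 IQR rule, here is a minimal sketch that computes the whisker ends (adjacent values) by hand for mpg in the auto data. It is not taken from the manual: it assumes that the quartiles returned by summarize, detail are the relevant ones, and the scalar names are made up for the example.

      ```stata
      * Sketch only: compute whisker ends (adjacent values) by hand.
      * Assumption: quartiles from -summarize, detail- are the ones to use.
      sysuse auto, clear
      quietly summarize mpg, detail
      scalar q1  = r(p25)
      scalar q3  = r(p75)
      scalar iqr = q3 - q1

      * Upper adjacent value: largest observation not above q3 + 1.5*IQR
      quietly summarize mpg if mpg <= q3 + 1.5*iqr
      scalar upper_adj = r(max)

      * Lower adjacent value: smallest observation not below q1 - 1.5*IQR
      quietly summarize mpg if mpg >= q1 - 1.5*iqr
      scalar lower_adj = r(min)

      display "whiskers run from " lower_adj " to " upper_adj

      * Observations beyond the whiskers would be plotted individually
      count if mpg < lower_adj | mpg > upper_adj
      ```

      Comparing these numbers with the whisker ends drawn by graph box mpg is a quick way to check which rule a given plot follows.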

      To get box plots using the 5% and 95% percentiles to define whiskers, the easiest way I know is to use stripplot from SSC. This command allows hybrid box and strip or dot plots, such as

      Code:
      * stripplot is user-written; if needed, install it first with: ssc install stripplot
      sysuse auto, clear
      stripplot mpg, box(barw(0.05)) pctile(5) vertical stack height(0.2) boffset(-0.1) ms(Sh) scheme(s1color) over(foreign)
      Otherwise use the recipe in that 2009 paper, noting a 2013 correction.

      Code:
      SJ-13-2 gr0039_1  . Speaking Stata: Creating and varying box plots: Correction
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q2/13   SJ 13(2):398--400                                (no commands)
              corrects error in code given
      
      SJ-9-3  gr0039  . . . . . . . . Speaking Stata: Creating and varying box plots
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q3/09   SJ 9(3):478--496                                 (no commands)
              explains how to use egen to calculate the statistical
              ingredients needed for box plots and variations of box
              plots; shows the use of twoway to then create the plots
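
      The egen plus twoway approach described in that entry can be sketched roughly as follows for 5% and 95% whiskers. This is not the code from the paper or its correction, only an illustration of the idea under stated assumptions: the variable names p5 to p95 are made up, and the graph options are one choice among many. Check it against the paper and the 2013 correction before relying on it.

      ```stata
      * Sketch only: box plot ingredients via egen, then drawn with twoway.
      sysuse auto, clear

      * Percentiles computed separately for each group
      bysort foreign: egen p5  = pctile(mpg), p(5)
      bysort foreign: egen p25 = pctile(mpg), p(25)
      bysort foreign: egen p50 = pctile(mpg), p(50)
      bysort foreign: egen p75 = pctile(mpg), p(75)
      bysort foreign: egen p95 = pctile(mpg), p(95)

      * Two bars meeting at the median form the box; spikes are the
      * whiskers; points outside the 5% and 95% percentiles are shown
      twoway (rbar p25 p50 foreign, barwidth(0.4) fcolor(none) lcolor(black)) ///
             (rbar p50 p75 foreign, barwidth(0.4) fcolor(none) lcolor(black)) ///
             (rspike p5 p25 foreign, lcolor(black))                           ///
             (rspike p75 p95 foreign, lcolor(black))                          ///
             (scatter mpg foreign if mpg < p5 | mpg > p95, msymbol(Oh)),      ///
             legend(off) xlabel(0 1, valuelabel) xscale(range(-0.5 1.5))      ///
             ytitle("Mileage (mpg)")
      ```

      Changing the arguments of p() is all it takes to try other whisker percentiles, which is the main attraction of building the plot from its ingredients.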

      EDIT It's a different question, but no box plot rule works well for the graphs shown in #1. The group sizes are in several instances so small that the 5% and 95% percentiles are reported as the sample minimum and maximum, or the boxes are otherwise degenerate.

      There seems to be some kind of censoring or truncation. In these circumstances a direct dot or strip plot is more informative, and as above that doesn't rule out showing medians, quartiles and other percentiles as well to the extent that they are defined.
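
      The small-sample point about the 5% and 95% percentiles can be checked directly. The sketch below uses five made-up values; with summarize, detail and its default percentile definition, the reported 5% and 95% percentiles coincide with the minimum and maximum. As far as I understand that definition, the same holds for any group of fewer than 20 distinct observations, but treat that as an assumption to verify.

      ```stata
      * Sketch only: five made-up values (1, 4, 9, 16, 25)
      clear
      set obs 5
      generate y = _n^2

      * Compare the reported 5% and 95% percentiles with the extremes
      quietly summarize y, detail
      display "p5  = " r(p5)  "   min = " r(min)
      display "p95 = " r(p95) "   max = " r(max)
      ```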
      Last edited by Nick Cox; 26 Jul 2022, 02:41.


      • #4
        Many thanks, Nick Cox and Joseph Coveney, for your replies. I get the point now. The actual command is "graph box".
