

  • Box plot editing

    I have created box plots for three distinct groups and I want to distinguish them from each other. I do not wish to differentiate the groups by box colour, but rather by diagonal or dotted lines within the boxes. I would like a black-and-white box plot. Can you assist me with this, please?

  • #2
    [Image: Graphh.jpg]

    This is the image


    • #3
      It's hard to do this by editing the graph without making some minor mess.

      I greatly prefer a direct code solution.

      Here is an example with two groups. The code arises from a mix of principle and experiment.

      sysuse auto, clear
      * store quartiles and median for each group, as numbers and as display strings
      foreach f in 0 1 {
          su mpg if foreign == `f', detail
          local upq`f' = r(p75)
          local UPQ`f' : di %2.0f r(p75)
          local med`f' = r(p50)
          local MED`f' : di %2.0f r(p50)
          local loq`f' = r(p25)
          local LOQ`f' : di %2.0f r(p25)
      }
      * text(y x "label"): the x positions 1 and 61 were found by experiment
      graph box mpg, over(foreign) vertical aspect(1) ///
      text(`upq0' 1 "`UPQ0'") text(`med0' 1 "`MED0'") text(`loq0' 1 "`LOQ0'") ///
      text(`upq1' 61 "`UPQ1'") text(`med1' 61 "`MED1'") text(`loq1' 61 "`LOQ1'")

      FWIW, the standard abbreviation for kilograms is kg, not Kg. It's a matter of taste, but once the units have been given on the vertical axis, I wouldn't myself repeat them for every value.

      Diagonal or dotted lines are sometimes asked for here, but I am not playing on that one. You'd get a much clearer result by direct labelling.

      Post a proper data example using dataex and I will make suggestions for code.


      • #4
        This code applies a strategy discussed at

        Clearly in your case you want everything black, but that is an easy fix.

        The code assumes groups coded by integers 1 up; it is easy to get there with egen, group().
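For example, if the groups are held in a string variable, egen's group() function maps its distinct values to integers 1, 2, 3, ... in sort order, and the label option keeps the original text as value labels. A minimal sketch; the data and the variable name species are hypothetical:

```stata
* hypothetical data: a string grouping variable and an outcome
clear
input str5 species y
"toad" 2.1
"frog" 1.3
"newt" 3.4
"frog" 0.8
"toad" 2.5
"newt" 2.9
end
* map the distinct values of species to integers 1 up (in sorted order),
* keeping the original text as value labels
egen group = group(species), label
tab group
```

Here frog, newt and toad become 1, 2 and 3, so code written for groups 1 up can then be used unchanged.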

        There is enormous scope for different takes here.

        Direct labelling beats legends here, as it usually does.

        With short group names and no long labels, a vertical layout would work just as well. With longer text you often really need to go horizontal.

        As a token demonstration that you can go further in this framework I show how you can add means too.

        But also -- a different story -- with 3 groups you have space for much more detail, such as quantile-box plots.

        * fake data 
        clear
        set obs 300
        egen group = seq(), block(100)
        label def group 1 frog 2 toad 3 newt 
        label val group group 
        set seed 2803 
        gen y = rnormal(group, 1)
        * moderately general code follows 
        * what you might most obviously need to change: 
        *    outcome variable name -- here y 
        *    group variable name -- here group 
        *    axis title -- here xtitle("whatever"), as the outcome is on the x axis
        *    some numbers if categories not coded by successive integers or # groups is not 3 
        *    display format for median and quartiles 
        statsby mean=r(mean) min=r(min) loq=r(p25) med=r(p50) upq=r(p75) max=r(max), by(group) clear: summarize y, detail 
        gen LOQ = strofreal(loq, "%3.2f")
        gen MED = strofreal(med, "%3.2f")
        gen UPQ = strofreal(upq, "%3.2f")
        gen group2 = group + 0.12
        twoway  rspike min loq group, horizontal lc(stc1) ///
            ||  rspike max upq group, horizontal lc(stc1) ///
            ||  rbar med loq group, barw(0.2) horizontal lcolor(stc1) fcolor(none) ///
            ||  rbar med upq group, barw(0.2) horizontal lcolor(stc1) fcolor(none) ///
            ||  scatter group2 loq, ms(none) mlab(LOQ) mlabc(stc2) mlabpos(12) ///
            ||  scatter group2 upq, ms(none) mlab(UPQ) mlabc(stc2) mlabpos(12) ///
            ||  scatter group2 med, ms(none) mlab(MED) mlabc(stc2) mlabpos(12) ///
            ||  scatter group mean, ms(Dh) msize(large) mlabc(black) /// 
                yla(1/3, valuelabel noticks) xtitle("whatever") legend(off)

        [Image: 3boxes.png]
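As a sketch of the vertical layout and the all-black version mentioned above, here is the same approach with the boxes upright and every element in black. It reuses the fake data from the code above; the offset 0.12 and the axis range are choices that may need adjusting for other data:

```stata
* fake data as above (hypothetical)
clear
set obs 300
egen group = seq(), block(100)
label def group 1 frog 2 toad 3 newt
label val group group
set seed 2803
gen y = rnormal(group, 1)
* one observation per group holding the summary statistics
statsby mean=r(mean) min=r(min) loq=r(p25) med=r(p50) upq=r(p75) max=r(max), ///
    by(group) clear: summarize y, detail
gen LOQ = strofreal(loq, "%3.2f")
gen MED = strofreal(med, "%3.2f")
gen UPQ = strofreal(upq, "%3.2f")
* text goes just to the right of each box (box half-width is 0.1)
gen group2 = group + 0.12
twoway  rspike min loq group, lc(black) ///
    ||  rspike max upq group, lc(black) ///
    ||  rbar med loq group, barw(0.2) lcolor(black) fcolor(none) ///
    ||  rbar med upq group, barw(0.2) lcolor(black) fcolor(none) ///
    ||  scatter loq group2, ms(none) mlab(LOQ) mlabc(black) mlabpos(3) ///
    ||  scatter upq group2, ms(none) mlab(UPQ) mlabc(black) mlabpos(3) ///
    ||  scatter med group2, ms(none) mlab(MED) mlabc(black) mlabpos(3) ///
    ||  scatter mean group, ms(Dh) msize(large) mc(black) ///
        xla(1/3, valuelabel noticks) ytitle("whatever") xtitle("") ///
        legend(off) xsc(r(0.8 3.5))
```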


        • #5
          [Image: BOX Plot.jpeg]

          Sir, I want to put dots or diagonal lines inside the box plot. How can I do this? This is my query.


          • #6
            Sir, I want to put dots or diagonal lines inside the box plot. How can I do this? This is my query. I am not able to edit the graph as I want.


            • #7
              As said in #3, I don’t support such patterns, either in principle or in practice by suggesting code.


              • #8
                No problem. Thank you.


                • #9
                  I need to change the y-axis range from 0 to 100, but when creating the box plot, it requires the upper limit to be greater than or equal to 1024. Please assist.


                  • #10
                    Please see the FAQ Advice, especially

                    Working backwards, the forum software isn't written by StataCorp and it has no understanding of .gph formats. Graph attachments should be .png.
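For example, a graph saved as .gph can be reloaded and exported as .png with graph use and graph export. A minimal sketch; the file names mybox.gph and mybox.png are hypothetical:

```stata
* make a graph and save it in Stata's own .gph format
sysuse auto, clear
graph box mpg, over(foreign)
graph save mybox.gph, replace
* later: reload the saved graph and export a .png copy suitable for posting
graph use mybox.gph
graph export mybox.png, replace width(1200)
```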

                    Here is your graph.
                    [Image: boxplot.png]

                    Now, what is the problem? It looks as if you want to show box plots on both original and logarithmic scales. But if values range from, say, 1 to 1000, then their logarithms to base 10 range from 0 to 3, so on a shared axis your logarithmic box plots are squeezed accordingly. Any other base would cause the same problem.

                    Box plots work badly here for various reasons. Note how often the median coincides with the upper quartile. That suggests to me some very small group sizes; you'd be better off showing the data directly.

                    Your data need a logarithmic scale, which doesn't rule out showing boxes too, but note the warnings within



                    That leads to the next point: a data example here would help us mightily in suggesting code that would produce a better graph.
                    Last edited by Nick Cox; 05 Jul 2024, 03:26.
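As a sketch of plotting original values on a logarithmic scale, rather than plotting their logarithms on a linear scale, graph box can, as I understand it, take yscale(log). The data here are hypothetical, simulated in the same way as in #12, and the explicit labels assume powers of 2:

```stata
* hypothetical data: 4 groups of 25 values, each a power of 2 from 1 to 1024
clear
set obs 100
set seed 314159
gen y = 2^runiformint(0, 10)
gen group = ceil(_n/25)
* original values plotted against a logarithmic axis (all values must be positive)
graph box y, over(group) yscale(log) ///
    ylabel(1 2 4 8 16 32 64 128 256 512 1024, angle(horizontal))
```

The warnings mentioned above still apply: the rule for whiskers and outside points is not designed with a logarithmic scale in mind.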


                    • #11
                      There is a hint here that the data are powers of 2, such as 1024, 512, 256, ...

                      The medians and quartiles don't contradict that, as they need not be data values but may lie between them.

                      If so, that has implications for axis labels.

                      niceloglabels from the Stata Journal can help here.

                      . niceloglabels 1 1024, style(2) local(yla)
                      1 2 4 8 16 32 64 128 256 512 1024
                      . niceloglabels 1 1024, style(2) powers local(yla)
                      1 "2{sup:0}"  2 "2{sup:1}"  4 "2{sup:2}"  8 "2{sup:3}"  16 "2{sup:4}"  32 "2{sup:5}"  64 "2
                      > {sup:6}"  128 "2{sup:7}"  256 "2{sup:8}"  512 "2{sup:9}"  1024 "2{sup:10}"
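If niceloglabels is not to hand, the plain list of powers of 2 can also be built with a loop. A minimal sketch, assuming labels are wanted at 2^0 to 2^10:

```stata
* build the label list 1 2 4 ... 1024 in local macro yla
local yla
forvalues k = 0/10 {
    local yla `yla' `=2^`k''
}
display "`yla'"
```

The result can then be used in the same way, as in yla(`yla') together with ysc(log).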


                      • #12
                        Lacking a real or even realistic data example, I simulated 4 groups that are skewed samples from a distribution with powers of 2 from 1 to 1024.

                        Then stripplot from SSC allows side-by-side quantile plots on a logarithmic scale, AND such a display is compatible, to a good approximation, with a box plot showing the median, the quartiles and selected quantiles that define the whiskers.

                        It is not compatible with the Tukey rule for determining which data points should be shown separately.

                        clear
                        set obs 100
                        set seed 314159
                        set scheme stcolor
                        * values are powers of 2 from 2^0 = 1 to 2^10 = 1024
                        gen y = 2^runiformint(0, 10)
                        * 4 groups of 25 observations each
                        gen group = ceil(_n/25)
                        niceloglabels 1 1024, style(2) local(yla)
                        stripplot y, xli(1.5(1)3.5) over(group) yla(`yla') ysc(log) height(0.6) ///
                            vertical cumul cumprob cente box pctile(10) ///
                            note("whiskers extend to 10 and 90% points")
                        [Image: qbplot.png]
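For reference, the Tukey rule mentioned in #12 flags as outside any value more than 1.5 IQR below the lower quartile or above the upper quartile. A minimal sketch, using the auto data only for illustration; quartile conventions can differ slightly between commands, so the points flagged here need not match those shown by graph box in every case:

```stata
* quartiles and IQR of mpg, separately for domestic and foreign cars
sysuse auto, clear
egen loq = pctile(mpg), p(25) by(foreign)
egen upq = pctile(mpg), p(75) by(foreign)
gen iqr = upq - loq
* 1 if more than 1.5 IQR beyond the quartiles, 0 otherwise
gen byte outside = (mpg < loq - 1.5 * iqr) | (mpg > upq + 1.5 * iqr)
tab foreign outside
```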


                        • #13
                          Dear Statisticians,
                          I’ve created a box plot for my data set, and I would like to display the median and IQR values directly on the graph. Instead of inserting a text box and manually writing the values for each plot, I want to know if there’s a command that includes that information automatically. I’ve been trying to do this in the Graph Editor, looking for a setting that shows these values on the graph. I would appreciate your help with this.


                          • #14
                            I am not aware of any automated way to do this -- except through a script as in #4 with the IQR calculated in an additional or alternative step. As earlier in this thread, and elsewhere, I suggest that using twoway rather than graph box or graph hbox is ultimately more flexible.


                            • #15
                              Here's some token code.

                              * fake data 
                              clear
                              set obs 300
                              egen group = seq(), block(100)
                              label def group 1 frog 2 toad 3 newt 
                              label val group group 
                              set seed 2803 
                              gen y = rnormal(group, 1)
                              * moderately general code follows 
                              * what you might most obviously need to change: 
                              *    outcome variable name -- here y 
                              *    group variable name -- here group 
                              *    axis title -- here xtitle("whatever"), as the outcome is on the x axis
                              *    some numbers if categories not coded by successive integers or # groups is not 3 
                              *    display format for median and quartiles 
                              statsby mean=r(mean) min=r(min) loq=r(p25) med=r(p50) upq=r(p75) max=r(max), by(group) clear: summarize y, detail 
                              gen LOQ = strofreal(loq, "%3.2f")
                              gen MED = strofreal(med, "%3.2f")
                              gen UPQ = strofreal(upq, "%3.2f")
                              gen IQR = strofreal(upq - loq, "%3.2f")
                              gen group2 = group + 0.12
                              gen group3 = group - 0.12 
                              twoway  rspike min loq group, horizontal lc(stc1) ///
                                  ||  rspike max upq group, horizontal lc(stc1) ///
                                  ||  rbar med loq group, barw(0.2) horizontal lcolor(stc1) fcolor(none) ///
                                  ||  rbar med upq group, barw(0.2) horizontal lcolor(stc1) fcolor(none) ///
                                  ||  scatter group2 loq, ms(none) mlab(LOQ) mlabc(stc2) mlabpos(12) ///
                                  ||  scatter group2 upq, ms(none) mlab(UPQ) mlabc(stc2) mlabpos(12) ///
                                  ||  scatter group2 med, ms(none) mlab(MED) mlabc(stc2) mlabpos(12) ///
                                  ||  scatter group3 med, ms(none) mlab(IQR) mlabc(black) mlabpos(6) ///
                                  ||  scatter group mean, ms(Dh) msize(large) mlabc(black) /// 
                                      yla(1/3, valuelabel noticks) xtitle("whatever") ytitle("") legend(off) ysc(r(0.8 3.2))

