  • Inserting a Y-axis scale break to better visualize dataset

    Hi there,

    Thank you in advance for the help. I am very new to Stata and looking for some practical advice.

    I am currently trying to plot a dataset as a box plot. However, the boxes are "squeezed" because a few data points lie far from the rest. Therefore, I would like to include a Y-axis break to visualize the bulk of the data better without losing sight of the variability.

    Does anyone know how to do so?

    Thank you

    Giuseppe

  • #2
    The short objective answer is that Stata does not support axis breaks as such. There is an FAQ on what you can do.

    An opinionated answer is that you won’t produce a better box plot this way. You would in my view be better off working on a transformed scale and/or using a different plot.

    The details would depend on your data. Can you give a data example or, at the very least, the minimum, quartiles, median, and maximum?
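
    A minimal sketch of how those summary statistics could be produced for each group. It uses Stata's bundled auto dataset as a stand-in, so price and foreign are only placeholders for your own measurement and grouping variables:

```stata
* Stand-in data: auto.dta ships with Stata; swap in your own variables.
sysuse auto, clear

* Count, minimum, quartiles, median, and maximum of price for each group
tabstat price, by(foreign) statistics(n min p25 p50 p75 max)
```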


    • #3
      Hi Nick,

      Thank you for your reply. What plot or scale would you suggest?

      Here is an example of the problem of the "squeezed" box.

      Giuseppe
      Attached Files


      • #4
        Please show the results of

        Code:
        dataex WT N370S E326K L444P
        https://www.statalist.org/forums/help#stata explains.



        • #5
          Hi Nick,

          Sorry, the code won't work because WT, N370S, and so on are categories of a single variable (GENOTYPE). I read further in the link, but it is not clear to me how this would apply to the GENOTYPE variable.


          • #6
            Code:
            dataex GENOTYPE whatever
            then, where whatever stands for the variable with the numbers.


            • #7
              Is this the information you requested?

              input byte GENOTYPE double PT_GluCer16
              2 .
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              1 .
              3 .
              0 .
              2 .
              3 .
              0 .
              0 .
              0 .
              0 .
              2 .
              0 .
              0 .
              1 .
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              1 .
              0 .
              0 .
              0 .
              0 .
              1 .
              0 .
              2 .
              0 .
              0 .
              0 .
              0 .
              0 .
              0 2.805811768029029
              0 .
              1 3.69101673968684
              2 35.51065702524104
              0 .
              0 .
              1 6.609789913225027
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              3 3.568295121213317
              1 2.104049251885561
              0 3.474838380598717
              0 4.282402542426962
              0 .
              0 2.829011116888829
              0 .
              0 3.183335901149354
              0 .
              0 4.804121982576199
              0 .
              0 5.191540309783429
              0 .
              0 .
              0 .
              0 .
              0 .
              3 2.323830617962953
              1 2.060165130974084
              0 .
              0 .
              0 2.36940525969882
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              0 .
              2 .
              1 4.21425472120051
              0 6.753274222961578
              0 .
              0 3.252036248850126
              0 4.194324700546706
              0 1.891702019220545
              0 .
              0 4.288290558446413
              1 .
              3 3.739420409455521
              0 5.469139970244989
              0 4.829025506161408
              0 3.460074488334679
              0 3.253103028329073
              3 3.275316552064364
              0 3.062728504995679
              3 4.281828108047959
              end
              label values GENOTYPE Genotypes
              label def Genotypes 0 "WT", modify
              label def Genotypes 1 "N370S", modify
              label def Genotypes 2 "E326K", modify
              label def Genotypes 3 "L444P", modify

              Listed 100 out of 241 observations
              Use the count() option to list more


              • #8
                That worked well enough. Thanks! You cut out the clear line at the beginning, and adding the count(241) option would have listed all the observations.
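
                For reference, a sketch of a dataex call that would list every observation with a non-missing measurement. It assumes the poster's full dataset (241 observations, variables GENOTYPE and PT_GluCer16) is already in memory; the if qualifier is an addition for illustration, not something requested earlier in the thread:

```stata
* Assumes the full 241-observation dataset is already in memory.
* count() raises dataex's default limit of 100 listed observations;
* the if qualifier skips rows where the measurement is missing.
dataex GENOTYPE PT_GluCer16 if !missing(PT_GluCer16), count(241)
```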

                The missing values can't be plotted. Your dataset is small enough that showing all the data points is practical, and indeed it's a very good idea.

                Log scale is indicated to me, but graph box, ysc(log) is not at all a good idea as explained tediously in https://www.stata.com/support/faqs/g...ithmic-scales/

                I fired up stripplot from SSC. As I show all the data, the silly stuff about whether points are at least 1.5 IQR from the nearer quartile can just be avoided. My whiskers go to the extremes.
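
                Because stripplot is community-contributed rather than part of official Stata, it has to be installed once before use. Assuming an internet connection and permission to install packages, that would typically look like this:

```stata
* One-time installation from the SSC archive, then open the help file
ssc install stripplot
help stripplot
```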

                Clearly your full dataset includes more data points. It's a serious limitation of #1 in my view that subset size isn't clear.

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte GENOTYPE double PT_GluCer16
                0 2.805811768029029
                1  3.69101673968684
                2 35.51065702524104
                1 6.609789913225027
                3 3.568295121213317
                1 2.104049251885561
                0 3.474838380598717
                0 4.282402542426962
                0 2.829011116888829
                0 3.183335901149354
                0 4.804121982576199
                0 5.191540309783429
                3 2.323830617962953
                1 2.060165130974084
                0  2.36940525969882
                1  4.21425472120051
                0 6.753274222961578
                0 3.252036248850126
                0 4.194324700546706
                0 1.891702019220545
                0 4.288290558446413
                3 3.739420409455521
                0 5.469139970244989
                0 4.829025506161408
                0 3.460074488334679
                0 3.253103028329073
                3 3.275316552064364
                0 3.062728504995679
                3 4.281828108047959
                end
                label values GENOTYPE Genotypes
                label def Genotypes 0 "WT", modify
                label def Genotypes 1 "N370S", modify
                label def Genotypes 2 "E326K", modify
                label def Genotypes 3 "L444P", modify
                
                stripplot PT_GluCer16, over(GENOTYPE) box(barw(0.1)) pctile(0) boffset(-0.1) ysc(log) cumul height(0.7) vertical yla(2 5 10 20 50, ang(h))
                [Figure: glucose.png]
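
                If a conventional box plot is still wanted, one possible route is to transform first and then label the axis in the original units. This is only a sketch, assuming the example data from the stripplot code in this post are in memory; log10_glucer is a made-up variable name, and the whiskers here follow graph box's default rule applied on the log scale rather than running to the extremes:

```stata
* Assumes GENOTYPE and PT_GluCer16 from the example data are in memory.
* Compute the box plot on the log10 scale ...
generate double log10_glucer = log10(PT_GluCer16)

* ... and label the ticks with the original (untransformed) values.
graph box log10_glucer, over(GENOTYPE) ///
    ylabel(`=log10(2)' "2" `=log10(5)' "5" 1 "10" `=log10(20)' "20" `=log10(50)' "50", angle(horizontal)) ///
    ytitle("PT_GluCer16 (log scale)")
```

                Unlike the stripplot call, this version does not show the individual data points.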


                • #9
                  Hi Nick,

                  Thank you very much for the help.

                  I have used the code as you suggested.

                  Code:
                  stripplot PT_GluCer16, over(GENOTYPE) box(barw(0.1)) pctile(0) boffset(-0.1) ysc(log) cumul height(0.7) vertical yla(2 5 10 20 50, ang(h))
                  I tried to upload the graph, but the upload failed with an error several times. In any case, the graph is perfect for visualizing the data.

                  Thanks again


                  • #10
                    Good. Thanks for closure. I can't tell you what the upload problem is.


                    • #11
                      Nick Cox and I have (I believe cordially) agreed to disagree about the value of axis breaks.

                      Here's a little draft note that offers one way to introduce axis breaks in Stata. https://uwmadison.box.com/s/myjlzg67...fcqpjz6hsdi2br


                      • #12
                        I assent cordially to cordially.

                        A little personal history is a little relevant. At secondary school in the 1960s we drew all graphs by hand. It was easy and we were sometimes encouraged and sometimes instructed to convey axis breaks with jagged lines.

                        At the same time many of the books we were using had illustrations with logarithmic scales and accepting that they were often not just helpful but even essential was, as far as I can recall, utterly painless for me. Graph paper with single or double logarithmic scales was routinely available.

                        Roll on almost 60 years, and official Stata is utterly unobliging on axis breaks. I can't speak for the company, but I believe their rationale is some combination of (1) people rarely want them, or should rarely want them, and (2) they are immensely more difficult and awkward to implement than anyone wants to know. (Consider, for example, what should happen if you were to combine graphs with or without axis breaks, or with different axis breaks.)

                        It would be interesting to know if other comparable software were different.

                        In contrast, in Stata now using logarithmic scale is usually easy, and any other nonlinear scale (without breaks) is easy too with some small tricks.
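
                        One possible reading of the kind of small trick meant here, offered as a sketch rather than the author's exact approach: plot a transformed variable and label the axis ticks with values on the original scale. The example uses Stata's bundled auto dataset and a square-root scale; root_price and the chosen tick values are made up for illustration:

```stata
sysuse auto, clear

* Work on the square-root scale ...
generate double root_price = sqrt(price)

* ... but build tick labels that show the original price values
local labels
foreach p of numlist 4000 6000 9000 12000 16000 {
    local labels `labels' `=sqrt(`p')' "`p'"
}

scatter root_price weight, ylabel(`labels', angle(horizontal)) ///
    ytitle("Price (square-root scale)")
```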

                        This is related to, but not identical to, a question of whether you should show zero on any axis, which is itself simple: do that if it helps -- or it's essential to avoid a misleading graph.

                        That's what it is. John Mullahy is, I believe, enormously younger than I am, but either way he stands as someone who wants axis breaks. Beyond that, requests for axis breaks and uses of axis breaks in the forums and literature I see are both very rare.
