  • violin plot with shading of selected regions?

    Dear list members,

    a quick question. I see there are a number of user-written commands for so-called violin plots. Before venturing into one of them, I'd like to know whether any of them supports shading selected parts of the plotted distributions, ideally according to specified parameters that can vary across groups when the plots are grouped.

    Many thanks in advance
    I'm using StataNow/MP 18.5

  • #2
    Here's a quick answer. I've not heard of any such, either in Stata implementations or even elsewhere. (References to published examples would be helpful.)

    Any such variation would almost certainly be documented in the help files. It's just possible that an author implemented such a variation as an undocumented option, but in that case it would be essential to include an option name in the syntax statement typically near the top of the code.

    Violin plots are already quite complicated and (contrary to what is usually intended) they often omit or obscure detail in the data that can be interesting or even important. It seems rare that anyone documents or discusses how the smoothed density traces were produced and why the choices made (kernel type and width) were good choices. It seems immensely more common that users just accept software defaults as showing "the" distribution. Mind you, you could say the same about histograms, box plots, and the like.
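The point about kernel type and width can be seen with official Stata alone. The following is a minimal sketch, assuming the auto dataset shipped with Stata as stand-in data; the bandwidths 1 and 4 are arbitrary illustrative values, not recommendations.

```stata
* Stand-in data; mpg plays the role of the variable of interest
sysuse auto, clear

* Same variable, three different smoothing choices
* (epanechnikov is the default kernel of kdensity)
twoway (kdensity mpg, bwidth(1))                      ///
       (kdensity mpg, bwidth(4))                      ///
       (kdensity mpg, kernel(gaussian)),              ///
       legend(order(1 "Epanechnikov, width 1"         ///
                    2 "Epanechnikov, width 4"         ///
                    3 "Gaussian, default width"))     ///
       xtitle("Mileage (mpg)") ytitle("Density")
```

The three curves describe the same data, so any reading of "the" distribution from a violin plot depends on choices like these.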


    • #3
      Okay, I realized that violinplot (SSC describe violinplot) can probably do what I need, through the box() option and type(fill) and statistics() sub-options, although somehow statistics(d(lo) d(hi)) doesn't seem to work despite the help file saying that all statistics from dstat (SSC describe dstat) are supported, and having taken care to specify pdfopts(k()).

      Ben Jann, apologies in advance for this likely banal issue, do you have a quick suggestion of what I might be getting wrong here?
      Last edited by Matteo Pinna Pintor; 23 Mar 2025, 06:25. Reason: corrected detail and tagged command author
      I'm using StataNow/MP 18.5


      • #4
        Hello Nick. I don't have a published example, but I think I have a good reason. The issue is: plotting BMI distributions across groups, while highlighting the densities within certain intervals - which mean the same thing across groups but could have group-specific extremes.

        I agree that novelty, in numerical or graphical methods, exacerbates reliance upon default settings in software commands, and consequent carelessness in reporting. And some novelties appear almost ideally suited to impress and to distract from what is not reported. But as you say, the problem has deeper roots.
        I'm using StataNow/MP 18.5


        • #5
          BMI distributions would be interesting data. In passing I note that BMI is suspect in many ways, as mass can be health-promoting or not, and as BMI is suspect dimensionally, as Nick Trefethen pointed out (https://people.maths.ox.ac.uk/trefethen/bmi.html and references).

          As usual (for me) I would favour some variation on quantile plots, which can be given as much or as little box plot ornamentation as one wishes.


          • #6
            Yes, it's a well known problem in epidemiology and physiology - BMI is essentially always a proxy, never the variable you actually need. Many times it's fat-free mass - hard to measure in large samples. If waist and hip circumferences are known, there are now pretty accurate predictive equations that can be applied. For some purposes and in some settings, however, BMI remains sufficiently informative.

            Yes I could use your stripplot perhaps. But the issue is, I want to highlight that, among some groups, in one of them, the density within a certain interval is smaller. The alternative is to just plot these distributions and then do a bar graph with the frequency of observations within this interval for the groups. I'll see.
            I'm using StataNow/MP 18.5
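The bar graph alternative mentioned in #6 could be sketched as follows. This assumes the auto dataset as stand-in data, with mpg in place of BMI, foreign as the grouping variable, and a hypothetical interval of 20 to 25; in_interval is an illustrative variable name.

```stata
sysuse auto, clear

* Indicator for falling inside the interval (missing if mpg is missing)
generate byte in_interval = inrange(mpg, 20, 25) if !missing(mpg)

* The mean of a 0/1 indicator is the share of observations in the interval
graph bar (mean) in_interval, over(foreign) ///
    ytitle("Share of observations with 20 <= mpg <= 25")
```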


            • #7
              I was thinking more of qplot (Stata Journal) than stripplot (SSC), for various reasons.
              Last edited by Nick Cox; 23 Mar 2025, 10:18.
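A minimal sketch of the quantile plot route, assuming qplot has been installed from the Stata Journal files (search qplot should locate the current version) and using the auto dataset as stand-in data. The reference lines at 20 and 25 are hypothetical cutoffs.

```stata
sysuse auto, clear

* Quantile plots by group; horizontal lines mark the interval of interest,
* so the share of each group inside it can be read off the horizontal axis
qplot mpg, over(foreign) yline(20 25)
```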


              • #8
                I see. At any rate, I realize I was misunderstanding the statistics() option, which accepts quantiles as inputs but not fixed values of the x variable. I should then probably retrieve the quantiles that approximately correspond to my cutoffs for each group, and then combine them.
                I'm using StataNow/MP 18.5
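One way to retrieve the quantile that approximately corresponds to a fixed cutoff in each group is to compute the percentage of observations at or below that cutoff. The sketch below uses only official commands and assumes the auto dataset as stand-in data; the cutoffs 20 and 25 and the local macro names are hypothetical.

```stata
sysuse auto, clear

levelsof foreign, local(groups)
foreach g of local groups {
    * Group size, ignoring missing values
    quietly count if foreign == `g' & !missing(mpg)
    local n = r(N)

    * Percentile rank of each cutoff within the group
    quietly count if foreign == `g' & mpg <= 20
    local plo = 100 * r(N) / `n'
    quietly count if foreign == `g' & mpg <= 25
    local phi = 100 * r(N) / `n'

    display "group `g': lower cutoff ~ p" %4.1f `plo' ///
        ", upper cutoff ~ p" %4.1f `phi'
}
```

The resulting percentile ranks differ by group, which is why a single pair of quantiles cannot represent the same x interval everywhere.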


                • #9
                  #8 is referring back to #3.


                  • #10
                    Yes sorry, indeed it does. I would still find it valuable if someone versed in the command could suggest an easy way to shade regions of the PDFs based on x values, and not on quantiles.
                    I'm using StataNow/MP 18.5
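Independently of violinplot, shading by x values can be approximated with official commands: save the density estimate with kdensity, then overlay an area plot restricted to the interval. This is a sketch under stated assumptions, not the violinplot approach: it uses the auto dataset as stand-in data, hypothetical group-specific intervals (20 to 25 and 22 to 28), and illustrative variable names x0, d0, x1, d1. Colour opacity via % needs Stata 15 or later.

```stata
sysuse auto, clear

* Store grid points and density estimates separately for each group
kdensity mpg if foreign == 0, generate(x0 d0) nograph
kdensity mpg if foreign == 1, generate(x1 d1) nograph

* Shade each density only over its own interval, then draw the full curve
twoway (area d0 x0 if inrange(x0, 20, 25), color(navy%40))   ///
       (line d0 x0, lcolor(navy))                            ///
       (area d1 x1 if inrange(x1, 22, 28), color(maroon%40)) ///
       (line d1 x1, lcolor(maroon)),                         ///
       legend(order(2 "Domestic" 4 "Foreign"))               ///
       xtitle("Mileage (mpg)") ytitle("Density")
```

The shaded edges fall on the nearest grid points of the density estimate, so they only approximate the stated cutoffs.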


                    • #11
                      The author of the command, Ben Jann, kindly accommodated my request and introduced the box() sub-option limits(# #), which overrides statistics(). It takes two values of the plotted variable(s) as the limits of the interval over which the area under the density curve is shaded.

                      For the moment, the updated version of the command can be downloaded from Ben's github:

                      Code:
                      net from https://raw.githubusercontent.com/benjann/violinplot/main/
                      net install violinplot, replace

                      Thanks Ben! This might well become useful to others too.
                      I'm using StataNow/MP 18.5
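A call using the new sub-option might look like the sketch below. It is an assumption pieced together from #3 and #11 (box() with type(fill) and limits(# #)), not a syntax checked against the help file, so help violinplot should be consulted for the exact form. It uses the auto dataset as stand-in data with hypothetical limits of 20 and 25, and presumes violinplot and the dependencies listed in its help file are installed.

```stata
sysuse auto, clear

* Assumed syntax: shade the area under each density between mpg = 20 and 25
violinplot mpg, over(foreign) box(type(fill) limits(20 25))
```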
