
  • Violinplot labels and legend

    Dear All

    I am using violinplot by Ben Jann available from SSC to draw a violinplot.
    There are two things I cannot manage to add to my graph:
    1. A legend that explains the drawn elements, e.g., the mean or the IQR, using the same colours and markers as in the plot.
    2. The number of observations (_N) in the labels of the over() categories, i.e., how many observations were used to draw each violin, including the (total).
    As an example, I use
    Code:
    sysuse nlsw88
    violinplot wage , pdf(ll(0)) over(union)
    Here, I would like to see:
    1. a legend that explains the IQR and the marker of the mean (see the dummy text box)
    2. the number of observations per category of union printed below its label (added here in the Graph Editor)

    [Attached image: Graph.png]

    Any help would be highly appreciated.

  • #2
    You can turn the legend on or off. You need to select which elements of the legend you want and label them appropriately. To see all the elements, simply specify the option -legend(on)-. Below, I choose the 2nd and 4th elements for illustration. The number of observations can be added by modifying the value labels of the -over()- variable. For this, I use labmask from the Stata Journal.

    Code:
    search labmask

    Code:
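    sysuse nlsw88, clear
    * copy the value labels of union into a string variable
    decode union, gen(unionlab)
    * number of observations in each category of union
    bys union: gen nobs=_N
    * append the count to each label
    replace unionlab = unionlab + " ({it:n} = " + string(nobs) + ")"
    * attach the new strings to union as value labels
    labmask union, values(unionlab)
    * legend keys 2 and 4 are picked purely for illustration
    violinplot wage if !missing(union), pdf(ll(0)) over(unionlab) leg(on order(2 "Whatever A" 4 "Whatever B"))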
    [Attached image: Graph.png]



    • #3
      Dear Andrew

      ah, very helpful, thank you!
      Is there any way to split the labels on two lines, so the (n=..) goes into the second line?



      • #4
        It's my understanding that violin plots usually show medians, not means.

        What violin plots never show, in my reading, is how the smooth density was calculated and how that calculation was decided on. That's down to the user, not the programmer or the people who invented the method.

        Proverbially, there's no accounting for taste.

        To my taste:

        1. The ordering of quantile plots imparts as much smoothing -- without artifacts or arbitrariness -- as I usually want, while letting me see whatever gaps and outliers and other awkward details there are directly, without the filter of a kernel.

        2. In this example and many others, working on a logarithmic scale is natural (pun intended). Geometric means, medians and quartiles are all compatible with that.

        3. If you show all the data, boxplot whiskers do no harm, but they aren't needed. (Sometimes I show whiskers to matched percentiles, such as 5% and 95%.)

        4. Naturally you could use a normal (here, lognormal) distribution as a reference.

        Code:
        sysuse nlsw88, clear
        
        * from SSC
        pctilesets wage, over(union) saving(pctile, replace) min max p(25 50 75)
        
        clonevar origgvar=union
        merge m:1 origgvar using pctile
        gen where = 1.05
        
        * there's a -gmean()- function in -egenmore- on SSC, but two steps work too
        egen gmean = mean(ln(wage)), by(union)
        replace gmean = exp(gmean)
        
        set scheme stcolor
        bysort union (wage) : gen where2 = cond(_n == 1, 0, cond(_n == _N, 1, .))
        gen where3 = 0.5
        gen where4 = 1.1
        gen show = "{it:n} = " + strofreal(n)
        
        * from Stata Journal
        qplot wage, by(union, legend(off) note("boxes show medians and quartiles" "solid horizontals show geometric means")) ///
        addplot(rbar p25 p75 where, barw(0.05) fcolor(none) lc(stc2) ///
        || scatter p50 where, ms(D) mc(stc2) || line gmean where2, lc(black) || ///
        scatter where4 where3, ms(none) mlabel(show) mlabsize(large) mlabpos(0))  ///
        ysc(log) yla(1 2 5 10(10)40) xl(0 0.25 "0.25" 0.5 "0.5" 0.75 "0.75" 1)

        [Attached image: notsviolinplot.png]
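
For the matched percentiles mentioned in point 3 (such as 5% and 95%), here is a minimal sketch. It assumes that official -egen- is an acceptable stand-in for -pctilesets-, which is not what the code above does, and the variable names p5 and p95 are made up for illustration.

```stata
sysuse nlsw88, clear

* 5th and 95th percentiles of wage within each category of union,
* using official -egen- rather than -pctilesets- (an assumption made here)
egen p5  = pctile(wage), p(5)  by(union)
egen p95 = pctile(wage), p(95) by(union)

* one row per category, to check the values before plotting
tabstat p5 p95, by(union) statistics(min)
```

The two new variables could then be drawn as whiskers with a twoway plot type such as rspike, in the same way that the quartiles are passed to rbar.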

        Last edited by Nick Cox; 15 Jan 2025, 10:08.



        • #5
          Originally posted by Daniel Schnyder
          Dear Andrew

          ah, very helpful, thank you!
          Is there any way to split the labels on two lines, so the (n=..) goes into the second line?

          You need compound double quotes to get line breaks. See

          Code:
          help quotes
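
As a minimal sketch of the idea, using official commands only: a label made of two quoted strings, wrapped in compound double quotes, is drawn on two lines. The macro names n0, n1, lab0 and lab1 are made up for illustration, and -scatter- stands in for any graph command that takes axis labels; this is not the violinplot solution itself.

```stata
sysuse nlsw88, clear

* count the observations in each category of union
count if union == 0
local n0 = r(N)
count if union == 1
local n1 = r(N)

* each label is two quoted strings inside compound double quotes
local lab0 `""nonunion" "({it:n} = `n0')""'
local lab1 `""union" "({it:n} = `n1')""'

* show what one of the macros contains
display `"`lab0'"'

* each quoted piece of a tick label is drawn on its own line
scatter wage union, xlabel(0 `"`lab0'"' 1 `"`lab1'"') xscale(range(-0.5 1.5))
```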

          Code:
          sysuse nlsw88, clear
          decode union, gen(unionlab)
          bys union: gen nobs=_N
          * wrap the label and the count in separate pairs of double quotes,
          * so that each quoted piece is drawn on its own line
          replace unionlab = `"""' + unionlab + `"""' + `"  "({it:n} = "' + string(nobs) + `")""'
          labmask union, values(unionlab)
          violinplot wage if !missing(union), pdf(ll(0)) over(unionlab) leg(on order(2 "Whatever A" 4 "Whatever B"))
          [Attached image: Graph.png]

