  • New violinplot package available from SSC

    Thanks to Kit Baum, a new violinplot package is available from SSC. Type

    Code:
    . ssc install violinplot
    to install the package. Stata 15 or newer and the latest versions of dstat, moremata, palettes, and colrspace are required. Type

    Code:
    . ssc install dstat, replace
    . ssc install moremata, replace
    . ssc install palettes, replace
    . ssc install colrspace, replace
    to install or update these packages.

    For a few examples see https://github.com/benjann/violinplot/.

    violinplot provides an alternative to the existing vioplot package, which is also available from SSC. Both packages have their advantages and disadvantages. A main advantage of vioplot is that it features an over() option to draw plots by subpopulations. Such an option does not exist in violinplot (use command separate if you want to display results by subpopulations; see the examples). A main advantage of violinplot is that it is more flexible with respect to how the density curves are estimated and plotted.

    ben
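
    For readers who have not used separate before, here is a minimal sketch of the subpopulation workaround mentioned above. It assumes the nlsw88 example data that ships with Stata; the choice of wage and union is only for illustration and is not taken from the package examples.

    ```stata
    sysuse nlsw88, clear
    * one new variable per value of union: wage0 (nonunion) and wage1 (union);
    * observations with missing union end up in neither variable
    separate wage, by(union) veryshortlabel
    * one violin per generated variable
    violinplot wage0 wage1
    ```

    The veryshortlabel option gives each new variable the value label of its group as variable label, which is handy when those labels are picked up in the plot.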

  • #2
    An update to violinplot is available. Type:

    Code:
    . ssc install violinplot, replace
    The whiskers were not computed correctly; this is fixed.

    • #3
      Dear Ben Jann, I have data on universities' third-party funding over the last ten years. Now I would like to highlight the distribution of the third-party funding and to add the position/development of five individual universities over this period within the distribution. Would this addition to a violin plot be possible to achieve using your command?
      Last edited by Marc Kaulisch; 25 Oct 2022, 01:59.

      • #4
        Hi Marc, not really clear to me what exactly you mean...

        • #5
          This is slightly embarrassing, but I forgot to add the weights to the computations. That is, weights could be specified, but they were not used. An update that fixes this should become available from SSC soon. In the meantime, you can also update from GitHub:

          Code:
          . net from https://raw.githubusercontent.com/benjann/violinplot/main/
          . net install violinplot, replace
          ben

          • #6
            Originally posted by Ben Jann
            Hi Marc, not really clear to me what exactly you mean...
            My example is shown in the attached graph (sline-example.PNG).
            Now I would like to test what happens when I replace the scatter dots with information about the distribution. My idea is to use a violin plot and to add lines representing single cases, in order to see where a case lies within the overall distribution.
            I hope my aim is clearer now.

            I am not a programmer - so not sure if possible but I think about an option like
            Code:
            addline(line time budget if uni == 1, lc(red) lw(thick) || line ....
            and this information is passed through to your twoway statement.

            • #7
              Hi Marc,

              in principle this is possible, but it is a bit of work. Also note that violinplot will always use an even spacing between the plot positions. Here's an example:

              Code:
              sysuse nlsw88
              separate wage if inrange(grade,10,15), by(grade) veryshortlabel
              separate ttl_exp if inrange(grade,10,15), by(grade) veryshortlabel
              separate tenure if inrange(grade,10,15), by(grade) veryshortlabel
              local vlist
              forv i=10/15 {
                  local vlist `vlist' (wage`i' ttl_exp`i' tenure`i')
              }
              violinplot `vlist', overlay noline fill(fc(%50)) ///
                  labels(wage ttl_exp tenure) nomedian mean(recast(line))
              [Figure: Graph.png]

              • #8
                (sorry for the mess with graphs, I still didn't figure out how to upload graphs properly, it seems)

                • #9
                  Originally posted by Ben Jann
                  Hi Marc,

                  in principle this is possible, but it is a bit of work. Also note that violinplot will always use an even spacing between the plot positions. Here's an example:

                  Great. Thank you. Now I give it a try with my data.

                  • #10
                    Dear Ben Jann, I have tried it and I failed.
                    Simple case with one variable over time works.
                    Adding variables with only one data point (representing one university) or no variance (mean of a group) gives me only the lines but not the distribution of the overall group of universities.

                    As an example see:
                    Code:
                    bys grade: egen city_wage = mean(wage) if c_city==1 // only central city
                    separate city_wage if inrange(grade,10,15), by(grade) veryshortlabel
                    
                    local vlist
                    forv i=10/15 {
                        local vlist `vlist' (wage`i' ttl_exp`i' city_wage`i')
                    }
                    violinplot `vlist', overlay noline fill(fc(%50)) ///
                        labels(wage ttl_exp city_wage) nomedian mean(recast(line))

                    • #11
                      This is because city_wage# is constant such that the density goes to infinity; this then causes all other densities to be squeezed to zero (because the max density determines the scaling) and become invisible. In general, it is not a good idea to include variables that are constant in violinplot. I would suggest adding such lines after the fact using command addplot (ssc install addplot). Example:

                      Code:
                      sysuse nlsw88, clear
                      // basic plot without the extra inner city line
                      separate wage if inrange(grade,10,15), by(grade) veryshortlabel
                      separate ttl_exp if inrange(grade,10,15), by(grade) veryshortlabel
                      local vlist
                      forv i=10/15 {
                          local vlist `vlist' (wage`i' ttl_exp`i')
                      }
                      violinplot `vlist', overlay noline fill(fc(%50)) ///
                          labels(wage ttl_exp) nomedian mean(recast(line))
                      // now add line for inner city
                      dstat wage if c_city==1 & inrange(grade,10,15), over(grade)
                      local yx
                      forv i = 1/`e(N_over)' {
                          local yx `yx' `=el(e(b),1,`i')' `i'
                      }
                      addplot: scatteri `yx', recast(line) norescaling ///
                          legend(order(3 "wage" 4 "ttl_exp" 5 "wage inner city"))
                      Of course this is all a bit clumsy...
                      ben
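
                      To spot variables that would trigger the scaling problem described above before calling violinplot, one could check each candidate for a lack of variation. The following is a minimal sketch, assuming the city_wage variables built as in post #10; the loop and the message text are illustrative and not part of violinplot.

                      ```stata
                      sysuse nlsw88, clear
                      bys grade: egen city_wage = mean(wage) if c_city==1
                      separate city_wage if inrange(grade,10,15), by(grade) veryshortlabel
                      * flag variables whose nonmissing values are all identical
                      foreach v of varlist city_wage1? {
                          quietly summarize `v'
                          if r(N) > 0 & r(min) == r(max) {
                              display as text "`v' is constant (value: " r(mean) ")"
                          }
                      }
                      ```

                      Variables flagged this way are better left out of the violinplot call and added afterwards as lines, as in the addplot example above.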

                      • #12
                        Thank you. I haven't worked with -addplot- yet. I tested it and it works as intended with my data.
                        I tried to tweak your violinplot ado by adding an -addlines-option and to pass this through into the twoway-command at the end - but I only get an error message ") required r(100)" ...

                        • #13
                          A major update to violinplot is available from SSC. To install the update, type

                          Code:
                          . ssc install violinplot, replace
                          My original motivation for violinplot was to display results from simulations, where each distribution is stored in a separate variable. However, I realized that in most applied settings one would probably want to display results from subpopulations. With the earlier version of violinplot this was not directly possible; one first had to prepare the data, e.g., using command separate. To make things easier in these situations I have now added options over() and by(), as well as a number of related options. See https://github.com/benjann/violinplot/ for a list of changes and additions.

                          For example, the plot that Marc had in mind can now be generated with a single command, without complicated preparations or the application of addplot:

                          Code:
                          . sysuse nlsw88, clear
                          . bys grade: egen city_wage = mean(wage) if c_city==1 // only central city
                          . violinplot wage ttl_exp city_wage if inlist(grade,10,11,12,14,16), ///
                              over(grade) atover overlay ///
                              noline fill(fc(%50)) nomedian mean(recast(line)) key(mean)
                          [Figure: Graph.png]

                          Note the use of option atover, which causes results to be placed at the values of the over-variable rather than on a categorical axis. Furthermore, violinplot no longer tries to display the density of a variable that is constant; this is why no distribution is plotted for city_wage (only the mean).
                          ben
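
                          The effect of the new options can be seen by running a few variants side by side. This is a minimal sketch that only uses options named in this post (over(), atover, by()) and assumes the updated violinplot is installed; the choice of south as the by-variable is purely illustrative.

                          ```stata
                          sysuse nlsw88, clear
                          * categorical axis: the five selected grades are evenly spaced
                          violinplot wage if inlist(grade,10,11,12,14,16), over(grade)
                          * atover: violins are placed at the grade values themselves
                          violinplot wage if inlist(grade,10,11,12,14,16), over(grade) atover
                          * by(): one subgraph per category of south
                          violinplot wage if inlist(grade,10,11,12,14,16), over(grade) by(south)
                          ```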

                          • #14
                            Dear Ben, these are great enhancements. I am really happy and excited. Now I try my own example and I use the by-option and I wonder how I can control the addplots with the by-options.

                            I tried:
                            Code:
                            violinplot impact_p, over(period) ///
                                        by(field)  ///
                                        name(violin_by, replace) ///
                                        vertical ///
                                        addplot(line impact_p period if university==1 & field==1, lstyle(p7) || ///
                                    line impact_p period if university==2 & field==1, lstyle(p10)  || ///
                                    line impact_p period if university==3 & field==1, lstyle(p3)   || ///
                                    line impact_p period if university==4 & field==1, lstyle(p5)   || ///
                                    line impact_p period if university==5 & field==1, lstyle(p4)   || ///
                                    line impact_p period if university==6 & field==1, lstyle(p1)  ) //
                            The result looks like:
                            [Figure: violinplot_test_by.PNG]

                            I would have expected the lines to appear only for field==1, that is, All sciences. But they appear in all four graphs...

                            • #15
                              I believe the addplot() option really only works well with plots that do not contain subgraphs. This is a limitation I cannot change, I'm afraid. You could, however, apply the addplot command, with which you can address individual subgraphs (see the help file).
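
                              A rough sketch of that route, using the nlsw88 data instead of Marc's. It assumes that the addplot command accepts a subgraph number before the colon and that subgraph 1 is the first by-group; both should be checked against the addplot help file. The coordinates passed to scatteri are made-up (wage, grade) pairs for illustration only, and it is also an assumption that atover, vertical and by() can be combined in this way.

                              ```stata
                              sysuse nlsw88, clear
                              * violins at the grade values, one subgraph per category of south
                              violinplot wage if inlist(grade,10,12,14,16), over(grade) atover ///
                                  vertical by(south)
                              * assumed syntax: the leading 1 restricts the added line to subgraph 1;
                              * scatteri takes y x pairs, here hypothetical wage values at each grade
                              addplot 1: scatteri 5 10 6 12 8 14 10 16, recast(line) norescaling
                              ```

                              With real data, the scatteri coordinates would be replaced by a line plot restricted to the cases of interest, as in the addplot() attempt in post #14.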
