
  • Ideas for graphing ranking data

    Suppose you've surveyed many people and asked them to rank a small number of things. How would you graph the resulting data?

    This seems like a simple task, but I'm not fully satisfied with anything I've seen.

    Here are a few ideas:

    Code:
    * Simulate data
    clear
    set seed 123456
    lab def fruit 1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs"
    set obs 7
    gen byte fruit = _n
    lab val fruit fruit
    expand 1000
    sort fruit
    gen id = 1 + mod(_n,1000)
    gen rand = rnormal(0,2) + fruit
    bysort id (rand) : gen rank = _n
    lab var rank "Rank"
    drop rand
    
    * Graph
    graph box rank, over(fruit) yreverse ylabel(1/7) ///
        name(box, replace)
    
    stripplot rank, over(fruit) tufte boffset(-0.3) iqr jitter(2) mcolor(%5) vertical yscale(reverse) ylabel(1/7) xtitle("") ///
        name(box2, replace)
    
    tabplot rank fruit, xtitle("") subtitle("") note("") title("") ///
        name(tab, replace)
    
    tabplot rank fruit, horizontal barwidth(1) xtitle("") subtitle("") note("") title("") ///
        name(tab2, replace)
    
    graph combine box box2 tab tab2, scale(0.6) col(2) scheme(s2mono) commonscheme
    [Figure: rank graphs.png, a 2×2 panel of the four graphs above]


    The trouble is that most of the graphs that would otherwise be natural fits are optimized for continuous data.

    Box plots (top left) are questionable even for continuous data, but they seem especially crude here.

    You can fiddle with the fundamental ideas behind box plots (top right), but I think it's barking up the wrong tree.

    Something like a tab plot (bottom left) is more promising, but this feels too categorical, like the ordinal nature of the data is a coincidence.

    A rejiggered tab plot (bottom right) is my current best idea. We've got something more like a histogram going on. But even this doesn't seem ideal. For example, it would be nice to represent means, but adding a continuous mean on top of this fundamentally discrete representation doesn't sit right.

    Any better ideas, either your own or found in the wild?

    (Both stripplot and tabplot, used above, are Nick Cox creations.)

  • #2
    I'm not sure, but the first thing your examples brought to mind was violin plots. I'd love to have a better way to graph this kind of ranked info (so I'm now following to see what others suggest).

    Example that might follow your code:
    Code:
    ssc install vioplot, replace
    vioplot rank, over(fruit) yscale(reverse) ylabel(1/7) ///
        vertical title("Violin Plot of Rank by Fruit") name(violin, replace)
    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



    • #3
      Violin plots are one of those representations that really want continuous data, imho. A “discrete violin” is a possibility though.
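
      For what it's worth, here is one way a "discrete violin" might be sketched with official commands only: mirrored horizontal bars whose width is proportional to the share of each rank within each fruit. The variable names n, total, share, lower and upper, the graph name dviolin, and the 0.45 half-width are all my own choices for illustration, not anything taken from vioplot.

      ```stata
      * Simulate data as in #1
      clear
      set seed 123456
      set obs 7
      gen byte fruit = _n
      expand 1000
      sort fruit
      gen id = 1 + mod(_n,1000)
      gen rand = rnormal(0,2) + fruit
      bysort id (rand) : gen rank = _n

      * One observation per observed fruit-rank cell, with its frequency
      contract fruit rank, freq(n)

      * Share of each rank within fruit
      bysort fruit : egen total = total(n)
      gen share = n / total

      * Mirror each share about the fruit's position; the widest bar spans 0.9
      summarize share, meanonly
      gen lower = fruit - 0.45 * share / r(max)
      gen upper = fruit + 0.45 * share / r(max)

      * With horizontal, lower and upper are read on the x axis and rank on the y axis
      twoway rbar lower upper rank, horizontal barwidth(0.9) ///
          yscale(reverse) ylabel(1/7, angle(horizontal)) ytitle("Rank") ///
          xlabel(1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs") ///
          xtitle("") name(dviolin, replace)
      ```

      Scaling by the largest share keeps neighbouring fruits from overlapping, at the cost of making bar widths comparable only in relative terms.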



      • #4
        Good question. I'd say (prejudices blaring and glaring here) that the box plot variants do a poor job here, and the violin plot is no better.

        To follow through on our request that you name the download sources of community-contributed commands, not so much their authors: stripplot is from SSC and tabplot is from the Stata Journal.

        We surely need plots that show the rank frequencies directly. I am very positive about displays that are both graph and table -- the art being to show frequencies or percents clearly without making them too obtrusive. Note that the tabplot examples in #1 could be annotated.
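
        As a rough sketch of the graph-and-table idea using official commands only, the percents can be computed once and then printed in a rank-by-fruit grid. This is no substitute for tabplot; the names n, pct, pctlab and pcttable are made up for the example.

        ```stata
        * Simulate data as in #1
        clear
        set seed 123456
        lab def fruit 1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs"
        set obs 7
        gen byte fruit = _n
        lab val fruit fruit
        expand 1000
        sort fruit
        gen id = 1 + mod(_n,1000)
        gen rand = rnormal(0,2) + fruit
        bysort id (rand) : gen rank = _n

        * Cell frequencies, keeping any empty cells as explicit zeros
        contract fruit rank, zero freq(n)

        * Percent of each fruit's rankings falling at each rank
        bysort fruit : egen pct = pc(n)
        gen pctlab = string(pct, "%2.0f")

        * Print the percents as text at each (fruit, rank) position
        twoway scatter rank fruit, msymbol(none) mlabel(pctlab) mlabposition(0) ///
            yscale(reverse) ylabel(1/7, angle(horizontal)) ytitle("Rank") ///
            xlabel(1/7, valuelabel) xscale(range(0.5 7.5)) xtitle("") ///
            name(pcttable, replace)
        ```

        The same pct variable could be used to annotate bars in any of the other displays.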

        A 2021 presentation surveying several possibilities is accessible via https://www.stata.com/meeting/uk21/. I started writing it up with a view to publication.

        You omitted the stacked bar chart that -- in my reading -- is the most popular choice, but it doesn't get much support from me either.

        Here is a floating or sliding bar plot using floatplot from SSC. With your data the categories are already in a good order, but otherwise see (for example)

        https://journals.sagepub.com/doi/pdf...6867X211045582

        I would say that this variant suffers from the difficulty of showing rare categories clearly. Why am I more positive about this design than about plain stacked bar charts? Letting it float or slide allows the general level of each group to become more obvious.

        Here is the code.

        Code:
        * Simulate data
        clear
        set seed 123456
        lab def fruit 1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs"
        set obs 7
        gen byte fruit = _n
        lab val fruit fruit
        expand 1000
        sort fruit
        gen id = 1 + mod(_n,1000)
        gen rand = rnormal(0,2) + fruit
        bysort id (rand) : gen rank = _n
        lab var rank "Rank"
        drop rand
        
        floatplot rank , over(fruit) centre(4) vertical ///
        fcolors(red red*0.6 red*0.1 gs12*0.5 blue*0.1 blue*0.5 blue) ///
        lcolors(red red red black blue blue blue) ytitle(Percents centred on rank 4)

        [Figure: floatplot.png]


        Here are some extracts from the version of the help at my end, which has been revised since the SSC version of 31 October 2022. Further references are most welcome.

        Remarks

        The history of this plot depends on its definition. For example, bars floating relative to an axis have long been used to indicate (say) the reigns of monarchs, the
        lives of famous people, radio and television schedules, high and low prices, or the durations allocated to or consumed by various tasks (so-called Gantt charts, for
        example).

        A plot does not absolutely need a name if a widely agreed name does not exist, but a Stata command certainly does. Brinton (1939) briefly showed what he called
        bilateral bar charts, and similar designs appear under that and other names in many bar charts showing paired variables (such as pyramids showing age and sex
        breakdown of populations). To give just one reference, Wilkinson et al. (1996) use the terms dual bar chart and mirror plot. The focus here is rather on bar charts
        in which several ordered categories are shown at once. Greater credit must be given to Stouffer et al. (1949a, 1949b) who showed many examples of such charts,
        without ever naming them so far as I can tell.

        Spear (1952) and Schmid (1954) both used the terms sliding bar (for horizontal plots) and floating column (for vertical plots). The terminology may well be older or
        perhaps both authors devised those terms independently. Either way, Spear gave no literature references, while Schmid did give literature references but did not
        cite Spear. The terms are repeated in later works, Spear (1969) on one side and Schmid and Schmid (1979) and Schmid (1983) on another. In her later book Spear did
        give some literature references but not to any work by Schmid, while none of the Schmid sequels cite Spear either. Be that as it may, terms such as sliding, slide,
        or floating have been repeated by others, such as Lockwood (1969), Mueller, Schuessler and Costner (1970, 1977), and Robertson (1988).

        In their papers giving a big push to the idea, Robbins and Heiberger (2011) and Heiberger and Robbins (2014) talk of diverging stacked bar charts, a term that is
        much more informative, but also a little more clunky in my view. See also Heiberger and Holland (2015). Schwabish (2023) uses the term diverging bar charts.

        Smith (2022) uses the term spine chart.

        Contrariwise, the example given by Rahlf (2017, pp.108-110; 2019, pp.108-110) underlines that such graphs can easily be given without using any special name.

        A problem for me as author is that I wrote a Stata command slideplot in 2003 as a wrapper for graph bar or graph hbar. It should remain accessible, so that name is
        taken, and a slightly different name such as slidebar seems likely to be confusing. Hence, with some small pleasure in a whimsical name that should make sense once
        people see the results, I have called this command floatplot.

        Although neither option is required, the most interesting and useful plots show how the distribution of an outcome varies with one or two predictors, for which
        over() and by() options are supplied.

        Note that numvar and overvar if specified are temporarily mapped to numeric variables that are integers 1 up, regardless of their existing values.

        Categories that do not appear in the data may be considered as shown with bars of zero length, which should be regarded as defined but invisible. Text displays of 0
        for percents, proportions, or frequencies are suppressed too. Note that text labels may overlap whenever adjacent categories are infrequent. Users finding this
        puzzling or unclear, for themselves or their readers, may wish to turn to a different design. In particular, tabplot from the Stata Journal makes zeros discernible
        as holes in a display and small frequencies discernible as short bars.

        Here even more than usually, the command is offered as indicative, not definitive. In particular, the design codifies prejudices in favour of a hybrid graph and
        table display.




        References

        Aitkin, M., D. Anderson, B. Francis, and J. Hinde. 1989. Statistical Modelling in GLIM. Oxford: Oxford University Press.

        Aitkin, M., B. Francis and J. Hinde. 2005. Statistical Modelling in GLIM 4. Oxford: Oxford University Press.

        Aitkin, M., B. Francis, J. Hinde and R. Darnell. 2009. Statistical Modelling in R. Oxford: Oxford University Press.

        Bentley, J. L. 1984. Programming Pearls: Graphic output. Communications, Association for Computing Machinery 27: 529-536.

        Bentley, J. L. 1988. More Programming Pearls: Confessions of a Coder. Reading, MA: Addison-Wesley.

        Bergstrom, C. T. and J. D. West. 2020. Calling Bullshit: The Art of Skepticism in a Data-Driven World. New York: Random House. See p.229.

        Box, G. E. P., J. S. Hunter and W. G. Hunter. 2005. Statistics for Experimenters: Design, Innovation, and Discovery. Hoboken, NJ: John Wiley.

        Box, G. E. P., W. G. Hunter, and J. S. Hunter. 1978. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: Wiley.

        Brinton, W. C. 1939. Graphic Presentation. New York: Brinton Associates.

        Cox, N. J. 2004a. Speaking Stata: Graphing distributions. Stata Journal 4: 66-88.

        Cox, N. J. 2004b. Speaking Stata: Graphing categorical and compositional data. Stata Journal 4: 190-215.

        Cox, N. J. 2014. Stata tip 119: Expanding datasets for graphical ends. Stata Journal 14: 230-235.

        Cox, N. J. 2016. Speaking Stata: Multiple bar charts in table form. Stata Journal 16: 491-510.

        Duncan, O. D., H. Schuman, and B. Duncan. 1973. Social Change in a Metropolitan Community. New York: Russell Sage Foundation.

        The Economist. 2024. Getting on, getting out: Official statistics reveal why some places have so few graduates. April 6, 22-23.

        Evergreen, S. D. H. 2017. Effective Data Visualization: The Right Chart for the Right Data. Thousand Oaks, CA: SAGE. [second edition 2020]

        Fienberg, S. E. 1980. The Analysis of Cross-Classified Categorical Data. Cambridge, MA: MIT Press.

        Friendly, M. 2000. Visualizing Categorical Data. Cary, NC: SAS Institute.

        Friendly, M. and D. Meyer. 2016. Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. Boca Raton, FL: CRC Press.

        Heiberger, R. M. and B. Holland. 2015. Statistical Analysis and Data Display: An Intermediate Course with Examples in R. New York: Springer.

        Heiberger, R. M. and N. B. Robbins. 2014. Design of diverging stacked bar charts for Likert scales and other applications. Journal of Statistical Software 57(5):
        1-32. doi:10.18637/jss.v057.i05

        Lockwood, A. 1969. Diagrams: A Visual Survey of Graphs, Maps, Charts and Diagrams for the Graphic Designer. London: Studio Vista.

        Mueller, J. H., K. F. Schuessler and H. L. Costner. 1970. Statistical Reasoning in Sociology. Boston: Houghton Mifflin. See pp.93-94.

        Mueller, J. H., K. F. Schuessler and H. L. Costner. 1977. Statistical Reasoning in Sociology. Boston: Houghton Mifflin. See pp.49-50.

        Rahlf, T. 2017. Data Visualization with R: 100 Examples. Cham: Springer.

        Rahlf, T. 2019. Data Visualization with R: 111 Examples. Cham: Springer.

        Robbins, N. B. and R. M. Heiberger. 2011. Plotting Likert and other rating scales. JSM Proceedings, Section on Survey Research Methods, 1058-1066. Alexandria,
        VA: American Statistical Association. https://www.amstat.org/membersonly/p...0784_64164.pdf Google for copies if this official version is
        inaccessible to you.

        Robertson, B. 1988. Learn to Draw Charts and Diagrams Step by Step. London: Macdonald.

        Schmid, C. F. 1954. Handbook of Graphic Presentation. New York: Ronald Press.

        Schmid, C. F. 1983. Statistical Graphics: Design Principles and Practices. New York: John Wiley.

        Schmid, C. F. and S. E. Schmid. 1979. Handbook of Graphic Presentation. New York: John Wiley.

        Schwabish, J. 2023. Data Visualization in Excel: A Guide for Beginners, Intermediates, and Wonks. Boca Raton, FL: CRC Press.

        Setlur, V. and B. Cogley. 2022. Functional Aesthetics for Data Visualization. Hoboken, NJ: John Wiley. See p.112.

        Smith, A. 2022. How Charts Work: Understand and Explain Data with Confidence. Harlow: Pearson. See pp.163-164.

        Spear, M. E. 1952. Charting Techniques. New York: McGraw-Hill.

        Spear, M. E. 1969. Practical Charting Techniques. New York: McGraw-Hill.

        Stouffer, S. A., E. A. Suchman, L. C. DeVinney, S. A. Star, and R. M. Williams, Jr. 1949a. The American Soldier: Adjustment During Army Life. Princeton, NJ:
        Princeton University Press.

        Stouffer, S. A., A. A. Lumsdaine, M. H. Lumsdaine, R. M. Williams, Jr., M. B. Smith, I. L. Janis, S. A. Star, and L. S. Cottrell. 1949b. The American Soldier:
        Combat and its Aftermath. Princeton, NJ: Princeton University Press.

        Wilkinson, L., G. Blank and C. Gruber. 1996. Desktop Data Analysis using SYSTAT. Upper Saddle River, NJ: Prentice-Hall. See pp. 748-749.



        • #5
          More fool I for forgetting the humble stacked bar chart!

          The floating bar plot is an interesting development. You mention that one drawback is not showing rare categories clearly. To this I would add two more potential drawbacks.

          First, that it (and stacked bar charts) depend more on color than do other representations. This is not a drawback in all circumstances, but it may be challenging in some media, especially if there are a few more "ranks" than listed here.

          Second, that it seems very suitable for divergent scales, or other scales with a natural center or hinge value. It does look marvelous for Likert scales. But it feels somewhat arbitrary to set "rank 4" as the central value.

          While not represented in my sample dataset, it also probably doesn't perform quite as well if the categories are sufficiently disjoint. For example, if everyone ranks Apples first and Bananas second, and the interesting heterogeneity comes after that, then the "Apple" and "Banana" bars will look similar but for color.

          None of these are dealbreakers, so floating bar plots are a good arrow to add to the quiver. Thanks.

          I have added some interpretations of them for comparison. As you show, labels would surely improve some of these plots, but I'm just focusing on the general shapes right now.

          It occurs to me that I wish I had constructed the example dataset with a "divisive" category -- a fruit that most people either loved or hated. That would highlight some differences in these depictions.
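
          For anyone who wants to experiment, a divisive fruit could be simulated along these lines. The choice of Grapes, the shifts of -10 and 20, and the 50/50 split are arbitrary assumptions of mine, chosen only so that nearly everyone ranks that fruit first or last.

          ```stata
          * Simulate data as in #1, with one hypothetical change
          clear
          set seed 123456
          lab def fruit 1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs"
          set obs 7
          gen byte fruit = _n
          lab val fruit fruit
          expand 1000
          sort fruit
          gen id = 1 + mod(_n,1000)
          gen rand = rnormal(0,2) + fruit

          * Assumption: push the latent score for Grapes to one extreme or the other
          replace rand = cond(runiform() < 0.5, -10, 20) + rnormal(0,2) if fruit == 4

          bysort id (rand) : gen rank = _n
          lab var rank "Rank"
          drop rand

          * Check: ranks for Grapes should pile up at 1 and 7
          tabulate rank if fruit == 4
          ```

          Any of the graph commands in this thread can then be rerun on the modified data to see how each display copes with a bimodal distribution of ranks.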


          [Figure: ranking graphs.png]



          • #6
            I agree mostly. But if 4 is not a natural or at least convenient reference level for ranks that must be 1 to 7, what is?

            For your example data, I most like some kind of histogram. And I would be remiss if I didn't underline that your last example in #1 is just a histogram, and as such obtainable with official commands.

            Code:
            * Simulate data
            clear
            set seed 123456
            lab def fruit 1 "Apples" 2 "Bananas" 3 "Oranges" 4 "Grapes" 5 "Pears" 6 "Peaches" 7 "Figs"
            set obs 7
            gen byte fruit = _n
            lab val fruit fruit
            expand 1000
            sort fruit
            gen id = 1 + mod(_n,1000)
            gen rand = rnormal(0,2) + fruit
            bysort id (rand) : gen rank = _n
            lab var rank "Rank"
            drop rand
            
            capture set scheme stcolor 
            
            local this = cond(c(version) >= 18, "stc1", "blue")
            
            histogram rank, discrete frequency yla(1/7) xsc(r(0 600)) xla(0(250)500) bfcolor(`this'*0.5) blcolor(`this'*2) by(fruit, compact row(1) note("")) horizontal

            [Figure: ranks.png]



            • #7
              "But if 4 is not a natural or at least convenient reference level for ranks that must be 1 to 7, what is?"
              Well, I don't think there is a good reference level for ordinal data, at least not in the general case. The middle rank is as good as any without further context, I suppose.

              -histogram, by() horizontal- is indeed more elegant than what I did. Thanks.



              • #8
                By the way, if you try to add the addlabels option to that histogram, it won't turn out right. I've reported the bug and tech support has acknowledged it.

                Code:
                sysuse auto, clear
                histogram mpg, by(foreign) horizontal addlabels
                [Figure: bug.png]
