
  • tabplot updated on SSC

    Thanks as always to Kit Baum, the tabplot package on SSC has been
    updated with new ado and help files. The program goes back to 1999;
    Stata 8 is required. tabplot is billed as supporting one-, two-
    and three-way bar charts for tables, which understates its possibilities
    a little, but the whole story need not be given here.

    "Multiple bar charts" would be a good umbrella term, except for the need
    to explain that it doesn't mean stacked or divided bars and it doesn't mean
    bars side by side on the same axis (and except for the puzzle that a
    single bar would just get lonely, so don't all bar charts have multiple
    bars?). (A single bar does not mean a "singles bar".)

    The code update fixes some awkward, indeed deficient, parsing of
    the by() option, which ruled out adjusting note() together with
    by().

    A bigger deal by comparison is a substantial rewrite of the help file,
    with a restructured explanation of syntax, more numerous and better-explained
    examples, and many more references than at the last update several
    months ago.

    If interested, then use

    Code:
     
    ssc inst tabplot
    to install afresh or

    Code:
     
    ssc inst tabplot, replace
    to update an existing installation; some readers may be using

    Code:
     
    adoupdate
    instead.

    Bar charts are basic and may seem very well supported in Stata, as even
    a little acquaintance with the documentation reveals four commands,
    graph bar, graph hbar, twoway bar and twoway rbar, which might already
    seem three more than anyone needs.

    Another command for bar charts (or more; I have others) thus needs a
    little explanation. This one is itself just a wrapper for twoway rbar,
    but it can do various plots more easily than you could do yourself,
    unless you were willing to do a little programming and a lot of fiddling
    around.
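To make "a wrapper for twoway rbar" concrete, here is a minimal sketch, not tabplot's actual code, that draws one vertical bar per cell of a two-way table with twoway rbar. It assumes a subset of the Greenacre data given below (the first two age groups only). The variable names base and top, and scaling the tallest bar to 0.8 of the spacing between rows, are illustrative choices, not tabplot defaults.

```stata
clear
* subset of the Greenacre data used below: first two age groups only
input byte(agegroup health) long freq
1 1 243
1 2 789
1 3 167
1 4 18
1 5 6
2 1 220
2 2 809
2 3 164
2 4 35
2 5 6
end
* illustrative choice: the tallest bar uses 0.8 of the spacing between rows
summarize freq, meanonly
local max = r(max)
* each bar starts at its row (health category) and rises in proportion to freq
generate base = health
generate top = health + 0.8 * freq / `max'
* one vertical bar per cell: rows are health categories, columns are age groups
twoway rbar base top agegroup, barwidth(0.5) ylabel(1/5) xlabel(1 2) legend(off)
```

The fiddling starts with value labels, percents, shown values and by() layouts, none of which this sketch attempts.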

    The main conceit of tabplot is table-like plots. The name is intended to
    evoke commands like tabulate with their structured output of tables in
    rows and columns.

    Incidentally, I note that there is a tabplot package for R with its main
    command tableplot; an old Stata command of mine called tableplot also
    exists on SSC, but its main capabilities have long since been folded
    into tabplot. I don't doubt that tabplot for R is good, but I've never
    used it or studied its documentation closely. I am pretty sure that I
    used the name first, not that I mind so long as the name remains
    distinct within Stata.

    Clearly the help file is there with the details you are expected to
    want, so the best I can now do for anyone curious is to give a couple of
    self-contained examples, together with a moderate sales pitch.

    Other applications of tabplot can be found at

    http://www.statalist.org/forums/foru...-and-subgraphs

    http://www.statalist.org/forums/foru...something-else

    http://www.statalist.org/forums/foru...d-with-grc1leg

    http://www.statalist.org/forums/foru...lot-or-tabplot

    http://stats.stackexchange.com/quest...inal-variables

    http://stats.stackexchange.com/quest...ical-variables

    Greenacre (2007, p.42; full reference below) gave these data from the
    Encuesta Nacional de la Salud (Spanish National Health Survey), 1997.
    They are interesting in themselves, but for my purposes they are useful
    as an example large enough to be challenging. As with many tables, the
    main handle for understanding is to look at the probability distribution
    of the response health given the predictor age. tabplot offers options
    to calculate percent or proportional/fractional breakdowns on the fly.
    Aesthetic preferences or conventions often encourage presentation in
    terms of percents. ("Percentage" seems to me too long a word, whatever
    dictionaries may say.)
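To check the arithmetic, a percent breakdown by age group divides each frequency by its age-group total. Here is a minimal sketch of that calculation by hand. It assumes this is what percent(agegroup) computes; it is not tabplot's code. It uses only the first two age groups of the data below.

```stata
clear
* subset of the Greenacre data below: first two age groups only
input byte(agegroup health) long freq
1 1 243
1 2 789
1 3 167
1 4 18
1 5 6
2 1 220
2 2 809
2 3 164
2 4 35
2 5 6
end
* total frequency within each age group (the denominator)
bysort agegroup (health): egen total = total(freq)
* percent of age group: these values sum to 100 within each age group
generate pc = 100 * freq / total
format pc %5.1f
list agegroup health freq pc, sepby(agegroup) noobs
```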

    Code:
     
    clear
    input byte(agegroup health) long freq
    1 1 243
    1 2 789
    1 3 167
    1 4 18
    1 5 6
    2 1 220
    2 2 809
    2 3 164
    2 4 35
    2 5 6
    3 1 147
    3 2 658
    3 3 181
    3 4 41
    3 5 8
    4 1 90
    4 2 469
    4 3 236
    4 4 50
    4 5 16
    5 1 53
    5 2 414
    5 3 306
    5 4 106
    5 5 30
    6 1 44
    6 2 267
    6 3 284
    6 4 98
    6 5 20
    7 1 20
    7 2 136
    7 3 157
    7 4 66
    7 5 17
    end
    label values agegroup agegroup
    label def agegroup 1 "16-24", modify
    label def agegroup 2 "25-34", modify
    label def agegroup 3 "35-44", modify
    label def agegroup 4 "45-54", modify
    label def agegroup 5 "55-64", modify
    label def agegroup 6 "65-74", modify
    label def agegroup 7 "75+", modify
    label values health health
    label def health 1 "very good", modify
    label def health 2 "good", modify
    label def health 3 "regular", modify
    label def health 4 "bad", modify
    label def health 5 "very bad", modify
    
    tabplot health agegroup [w=freq] , percent(agegroup) showval subtitle(% of age group) xtitle("") bfcolor(none)
    [Image: tabplot_sl_1.png]


    What particularly bites here are some very small percents, which are
    perfectly credible and not at all unusual for such data. A merit of the
    multiple bar charts design is that small values are discernible as such.
    Note especially the showval option, which insists on showing values too.

    The graph thus deliberately uses table ideas and graph ideas together.
    Sometimes people say to me, "But you shouldn't do that!" and some
    prohibition emerges that graphs are graphs and tables are tables, and
    ne'er the twain shall meet, which seems to me no more than superstition.

    Digression. An intriguing suggestion, which I have borrowed elsewhere,
    is that the conventional distinction between graphs and tables was a
    side-effect of the development of printing. Before printing there were
    manuscripts -- those scripted manually, or written by hand -- to which
    writers could add illustrations, say of knights, or dragons, or of
    sinners being tormented, or something equally entertaining, as they
    liked and where they liked. Printed documents encouraged, or even
    enforced, a division of labour between typesetters and those who
    prepared illustrations. But now that's obsolete.

    A more specific objection to showing numeric values is that they clutter
    up the graph, to which the answers are that it depends on how you do it,
    and that if you strongly object, it's not compulsory. But tabplot gives up
    on labelling axes with bar magnitudes, which reduces clutter too.

    Given this dataset, how else would you represent the patterns
    graphically? Setting aside any temptation to draw multiple pie charts,
    one alternative is a stacked bar chart:

    Code:
     
    * catplot must be installed first: ssc inst catplot
    catplot health agegroup [w=freq], percent(agegroup) asyvars stack subtitle(% of age group)
    In recent Stata versions, graph hbar could also do this directly, but the syntax
    differs.

    [Image: tabplot_sl_2.png]


    I have not tried too hard to optimise this: the colour scheme and legend both need work,
    and so forth. Some would prefer vertical bars here.

    The key point is whether it could be made better (clearer, more effective,
    more attractive) than the previous graph. I note three key issues:

    1. Stacking is a well-understood design, but very small amounts are hard
    to discern.

    2. A legend necessarily springs into being, but a legend obliges mental "back
    and forth" from readers (or else readers give up on looking at the detail).

    3. The program would let you add numeric values on top of the bars, but that would
    be at least a little messy.

    Naturally this is a straw graph that I set up to knock down again, but are there good
    alternatives? I've had better results with unstacked bars for this example, but I
    will move on.

    Let's look at graphs for a three-way table.

    Aitkin et al. (1989, p.242; full reference below) reported data from a
    survey of student opinion on the Vietnam War taken at the University of
    North Carolina in Chapel Hill in May 1967. Students were classified by
    sex, year of study, and the policy they supported, given choices of

    A. The United States should defeat the power of North Vietnam by
    widespread bombing of its industries, ports, and harbors and by land
    invasion.

    B. The United States should follow the present policy in Vietnam.

    C. The United States should de-escalate its military activity, stop
    bombing North Vietnam, and intensify its efforts to begin negotiation.

    D. The United States should withdraw its military forces from Vietnam
    immediately.

    The labels A ... D are fairly dopey, but even at this distance
    suggesting better ones might be thought contentious politically, so I
    will desist.

    Code:
     
    clear
    input str6 sex str8 year str1 policy int freq
    "male" "1" "A" 175
    "male" "1" "B" 116
    "male" "1" "C" 131
    "male" "1" "D" 17
    "male" "2" "A" 160
    "male" "2" "B" 126
    "male" "2" "C" 135
    "male" "2" "D" 21
    "male" "3" "A" 132
    "male" "3" "B" 120
    "male" "3" "C" 154
    "male" "3" "D" 29
    "male" "4" "A" 145
    "male" "4" "B" 95
    "male" "4" "C" 185
    "male" "4" "D" 44
    "male" "Graduate" "A" 118
    "male" "Graduate" "B" 176
    "male" "Graduate" "C" 345
    "male" "Graduate" "D" 141
    "female" "1" "A" 13
    "female" "1" "B" 19
    "female" "1" "C" 40
    "female" "1" "D" 5
    "female" "2" "A" 5
    "female" "2" "B" 9
    "female" "2" "C" 33
    "female" "2" "D" 3
    "female" "3" "A" 22
    "female" "3" "B" 29
    "female" "3" "C" 110
    "female" "3" "D" 6
    "female" "4" "A" 12
    "female" "4" "B" 21
    "female" "4" "C" 58
    "female" "4" "D" 10
    "female" "Graduate" "A" 19
    "female" "Graduate" "B" 27
    "female" "Graduate" "C" 128
    "female" "Graduate" "D" 13
    end
    
    tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval
    [Image: tabplot_sl_3.png]

    Unsurprisingly, the way to plot three-way tables is to use a by() option to repeat two-way tables.
    The syntax for tabplot follows standard conventions: as with regress and scatter, for
    example, it is usually best to name the response or outcome variable first, so that it defines the
    rows of the plot and is shown on the y axis. There can be trade-offs or compromises,
    as no layout is best for all purposes, but big differences can safely be put at a distance (here
    males and females differ markedly in their mix of views), while finer distinctions are
    easier to make if bars are close. On top of all that, any ordinal scales should naturally be
    respected as such.
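Similarly, percent(sex year) should mean that percents are calculated within each combination of sex and year, so that each column of bars in each panel sums to 100. Here is a minimal sketch of that calculation by hand, not tabplot's own code, for first-year students only (a subset of the data above).

```stata
clear
* subset of the Vietnam survey data above: first-year students only
input str6 sex str8 year str1 policy int freq
"male" "1" "A" 175
"male" "1" "B" 116
"male" "1" "C" 131
"male" "1" "D" 17
"female" "1" "A" 13
"female" "1" "B" 19
"female" "1" "C" 40
"female" "1" "D" 5
end
* total within each sex-by-year cell: the denominator for percent(sex year)
bysort sex year (policy): egen total = total(freq)
* percent of each sex-by-year cell: sums to 100 within each cell
generate pc = 100 * freq / total
format pc %5.1f
list sex year policy freq pc, sepby(sex year) noobs
```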

    Aitkin, M., D. Anderson, B. Francis, and J. Hinde. 1989. Statistical
    Modelling in GLIM. Oxford: Oxford University Press.

    Greenacre, M. 2007. Correspondence Analysis in Practice. Boca Raton, FL:
    Chapman & Hall/CRC.



  • #2
    For comparison, here is a spineplot (many people say "mosaic plot") for the first dataset. The program can be downloaded from the Stata Journal site, although I am cheating by using an updated version not yet publicly available.

    Code:
    spineplot health agegroup [w=freq], bar1(color(gs4)) bar2(color(gs8)) bar3(color(blue*0.2)) bar4(color(blue*0.6)) bar5(color(blue)) xla(, labsize(*0.8) axis(2)) percent xla(0(20)100) yla(0(20)100, axis(2))
    [Image: spineplot.png]


    I worked harder on the colour scheme than on the corresponding stacked bar chart.

    This does a better job of showing the differences in age-group frequencies than any other design shown, because the others ignore them.

    The overall pattern of change comes over quite well.
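The column widths in a spineplot are proportional to the age-group totals, which is why this design shows those differences. Here is a minimal sketch of that geometry, based on a reading of the design rather than spineplot's code. The totals are summed from the frequencies in the first post, and the variable names size, width, left and right are illustrative.

```stata
clear
* age-group totals, summed from the frequencies in the first post
input byte agegroup long size
1 1223
2 1234
3 1035
4 861
5 909
6 713
7 396
end
* each age group gets a column whose width is its percent of all respondents
egen grand = total(size)
generate width = 100 * size / grand
* running sum gives each column's right-hand edge on a 0 to 100 scale
generate right = sum(width)
generate left = right - width
format width left right %5.1f
list agegroup size width left right, noobs
```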



    • #3
      I tried a bit harder with the stacked bar chart.

      Code:
      graph bar (count) [fw=freq], over(health, descending) over(agegroup) percent subtitle(% of age group) stack asyvars bar(5, bfcolor(red*0.8)) bar(4, bfcolor(red*0.3) blcolor(red*0.8)) bar(3, bfcolor(blue*0.2) blcolor(blue*1.2)) bar(2, bfcolor(blue*0.7) blcolor(blue*1.2)) bar(1, bcolor(blue*1.2)) legend(pos(3) col(1)) ysc(r(-5 100)) yla(, ang(h))
      This syntax requires Stata 14, or Stata 13 with updates to at least 9 October 2014.

      [Image: stackedbar.png]



      • #4
        Now written up at http://www.stata-journal.com/article...article=gr0066



        • #5
          Dear respected Nick Cox, thank you very much! I have learned a lot from your daily posts. I wish you endless happiness and success in your life!
          Respectfully, Hassen



          • #6
            Hassen: Thanks for those kind words, which I much appreciate.



            • #7
              Nick Cox, I was desperately looking for a feasible way to graph the relationship between an ordinal response and an ordinal predictor variable. Until today I didn't have a satisfactory solution. However, after reading this post and the article you linked, I found something I liked. Thank you so much for your help!



              • #8
                Jonas Jakobi Excellent! Thanks for writing in.



                • #9
                  Thanks to Kit Baum, version 2.8.0 of tabplot has been posted on SSC. For the moment, this is the most up-to-date public version. The most notable change is the addition of a frame() option, illustrated here:

                  [Image: tabplot_frame2.png]



                  Each bar is shown framed. Here's sample data and code to make it reproducible:


                  Code:
                  clear
                  input str6 sex str8 year str1 policy int freq
                  "male" "1" "A" 175
                  "male" "1" "B" 116
                  "male" "1" "C" 131
                  "male" "1" "D" 17
                  "male" "2" "A" 160
                  "male" "2" "B" 126
                  "male" "2" "C" 135
                  "male" "2" "D" 21
                  "male" "3" "A" 132
                  "male" "3" "B" 120
                  "male" "3" "C" 154
                  "male" "3" "D" 29
                  "male" "4" "A" 145
                  "male" "4" "B" 95
                  "male" "4" "C" 185
                  "male" "4" "D" 44
                  "male" "Graduate" "A" 118
                  "male" "Graduate" "B" 176
                  "male" "Graduate" "C" 345
                  "male" "Graduate" "D" 141
                  "female" "1" "A" 13
                  "female" "1" "B" 19
                  "female" "1" "C" 40
                  "female" "1" "D" 5
                  "female" "2" "A" 5
                  "female" "2" "B" 9
                  "female" "2" "C" 33
                  "female" "2" "D" 3
                  "female" "3" "A" 22
                  "female" "3" "B" 29
                  "female" "3" "C" 110
                  "female" "3" "D" 6
                  "female" "4" "A" 12
                  "female" "4" "B" 21
                  "female" "4" "C" 58
                  "female" "4" "D" 10
                  "female" "Graduate" "A" 19
                  "female" "Graduate" "B" 27
                  "female" "Graduate" "C" 128
                  "female" "Graduate" "D" 13
                  end
                  set scheme s1color
                  
                  tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval name(G1)
                  tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval frame(100) name(G2)
                  Notice the frame(100) option on the second version (shown above).

                  Beyond that, the help file continues to grow quietly, with extra references as I find them.



                  • #10
                    Dear Nick, Thanks for this extra interesting feature.
                    Ho-Chuan (River) Huang
                    Stata 17.0, MP(4)



                    • #11
                      Dear Nick, I am puzzled. In which repository do you update and maintain -tabplot-? I somehow had the impression that you prefer the Stata Journal repository for -tabplot-, but it looks as if this update is available only on SSC. Is that right?
                      At least gr0066_1 (http://www.stata-journal.com/software/sj17-3) has not seen an update yet.



                      • #12
                        tabplot was maintained on SSC over most of its history until I wrote it up in the Stata Journal in 2016. Then I updated it in the same place in 2017. I will send another update to the Editors shortly, but updates there are subject to a 3-month cycle.

                        Updates on SSC are subject to a delay of more like 3 hours or 3 days, depending on Kit Baum's travels, how busy he is and the position of the Sun over Boston, MA.

                        I hinted at tabplot 2.8.0 in a recent post here

                        https://www.statalist.org/forums/for...-subtitle-size

                        so I was minded to get an update out quickly on SSC for anybody who cared. Indeed, it was you who expressed interest in that, so there you go: the main reason I put this on SSC quickly is your own comment.
                        Last edited by Nick Cox; 19 Jul 2019, 03:31.



                        • #13
                          Nick, thank you very much. I switched to SSC. I appreciate your care about user feedback.



                          • #14
                            The update alluded to in #12 is forthcoming in Stata Journal 20(3) 2020. As you can tell, I didn't treat the task as urgent once the code was updated on SSC.



                            • #15
                              Following a bug report in https://www.statalist.org/forums/for...rcent-fraction a fixed version of tabplot (2.8.1) is now available at SSC. Thanks as usual to Kit Baum for prompt posting.

                              A formal update will follow in the Stata Journal.

                              The bug might bite you if you use set dp comma, but if you use tabplot at all, you would also benefit from an extended help file with further examples and references.
