designplot now available from SSC (something also for fans of descriptive tables)

Nick Cox

Join Date: Mar 2014

Posts: 35698
#1

16 May 2014, 10:23

Thanks as usual to Kit Baum, a new package designplot is now available from SSC. Stata 8.2 is required.

The name of the program may mean little or nothing to people. What's a design plot? The problem bites backwards more than forwards. Sometimes simple plots don't really need names in your papers and presentations: you just write or say "plotting something versus something else". It's almost an accident if a plot has a standard name (histogram, scatter plot, box plot) and often standard names are less than standard (what's a dot plot, and do you call it something else?). But a programmer writing a program must give it a name, and I chose designplot because "design plot" is a name in the literature. However, the design plots in the literature don't bear very much resemblance to the results of designplot.

But let's curtail that dogged discussion (there's more in the help for those so inclined).

Here's an example straight away:

Code:

sysuse auto
set scheme s1color
designplot mpg foreign rep78

The main idea is

1. You name a response and at least one predictor.

2. The graph shows results from summarize for the response, given the distinct levels of the predictors and their cross-combinations.

3. The default is just the mean, but one or more results can be shown.

4. If you name (say) two predictors, you get the zero-way breakdown (no breakdown at all), both one-way breakdowns for each predictor and the two-way breakdown for both predictors combined. (You are asked to swallow the non-standard term "zero-way" as a modest extension of standard terminology.)

5. You can get less than #4 by restricting, e.g., to just the one-way breakdowns, or at most the one-way breakdowns.

6. graph dot is used by default, but you can invoke graph hbar (which often works well) or graph bar (which less often works well).

7. You can save the results graphed as a new dataset. This may help in tabulation or in preparing a new graph.
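
As a rough cross-check on #4, the same breakdowns can be tabulated with official commands. This is only a sketch using the auto data from the example above: it shows numbers rather than a graph, and it is not a description of how designplot works internally. The restriction to observations complete on both predictors is an assumption made here so that every breakdown uses the same sample; whether designplot applies the same restriction by default is best checked in its help.

```stata
sysuse auto, clear

* assumption for this sketch: use only observations complete on both predictors
keep if !missing(foreign, rep78)

* zero-way breakdown: the overall mean of the response
summarize mpg, meanonly
display r(mean)

* one-way breakdowns: mean of mpg for each level of each predictor
tabulate foreign, summarize(mpg) means
tabulate rep78, summarize(mpg) means

* two-way breakdown: mean of mpg for each cross-combination
tabulate foreign rep78, summarize(mpg) means
```

Four separate displays are needed here; the point of designplot is to put all of them on one graph.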

This works somewhat like the existing (and apparently rather neglected) grmeanby command and also a lot like graph dot used directly. But there are different twists. Otherwise the command would be pointless.

#7 is new relative to both. So is the scope for multiscale breakdowns. grmeanby is restricted to means or medians (although any competent user-programmer could clone it quickly to do otherwise).

Here is another simple example. We will look at means and medians, sort within groups on means, add variable labels and restrict scope to zero- and one-way breakdowns.

Code:

designplot mpg foreign rep78, stat(median mean) variablelabels maxway(1) entryopts(sort(2) descending)

I would want to use the Graph Editor to tweak that, notably to split "Repair Record 1978" over two lines so that it takes up less space, but that's always the sort of detail you want to improve.

Here is a variant on a common problem often tackled with tables. People are often interested in seeing various univariate breakdowns of frequencies for categorical variables. (To get percents, save the results as a dataset, do a simple calculation and call up graph again.)

Code:

designplot mpg foreign rep78 if !missing(foreign,rep78), stat(count) recast(hbar) blabel(total) yla(none) t1title("frequencies") variablelabels ytitle("") ysc(r(0 72))

One more example, assuming you're still reading. Looking at (one version of) the Titanic data, the focus is on variations in the fraction that survived as a response to age, sex, class and their interactions. The code is in the help file.

This kind of graph can be useful for description or exploration and perhaps even give you ideas about whether your models need interaction terms.

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

19 Aug 2014, 16:57

Thanks to Kit Baum as usual, this program has been updated on SSC. The update mostly concerns an amplified help file.
Andrew Lover

Join Date: Apr 2014

Posts: 182
#3

20 Aug 2014, 17:47

Thanks for that, Nick; it looks exceedingly useful. I always feel like I eat up too much time trying to kludge together plots to see what I want to see when doing EDA.

__________________________________________________
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

21 Aug 2014, 03:31

Although it's not mentioned in the Abstract, designplot was mentioned in my talk at the recent Boston meeting. Files are accessible from http://www.stata.com/meeting/boston14/abstracts/
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

04 Jan 2017, 11:33

Written up at http://www.stata-journal.com/article...article=gr0061
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#6

24 Sep 2020, 20:16

Replying to Nick Cox (#1):

This is really an excellent program in concept. It could be even more useful if the variable combinations (which appear under all the main ones) were removable, as together they end up making one huge variable and make the graph unreadable (Age1, Age2, ..., Age10, Sex1, Sex2, Age1Sex1, Age1Sex2, ..., Age10Sex2).
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

25 Sep 2020, 01:34

Thanks for #6. In British English, at least, and perhaps more widely, it's proverbial that you can't fit a quart into a pint pot: https://www.collinsdictionary.com/di...nto-a-pint-pot (for the information of people blessed with purely metric systems of units: one quart = 2 pints). It's tacit in #1 that this will work well with up to about 30 entries; beyond that you will struggle, although changing the graph size can help.

Beyond that I can tell that you are using the oldest version, as the ? in the left margin are a side-effect of Stata's support for Unicode, which broke a trick used in the original version. A search reveals that the latest version should be downloaded from the files for Stata Journal 19(3):

-----------------------------------------------------------------------------------------------------------------------
search for designplot (manual: [R] search)
-----------------------------------------------------------------------------------------------------------------------

Search of official help files, FAQs, Examples, and Stata Journals

SJ-19-3 gr0061_3 . . . . . . . . . . . . . . . Software update for designplot
(help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
Q3/19 SJ 19(3):748--751
any attempt to use the missing option of graph dot,
graph hbar, or graph bar is now ignored and advice on
what to do instead is shown

SJ-17-3 gr0061_2 . . . . . . . . . . . . . . . Software update for designplot
(help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):779
help file updated

SJ-15-2 gr0061_1 . . . . . . . . . . . . . . . Software update for designplot
(help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
Q2/15 SJ 15(2):605--606
bug fixed for Stata 14

SJ-14-4 gr0061 Design plots for graphical summary of a response given factors
(help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
Q4/14 SJ 14(4):975--990
produces a graphical summary of a numeric response variable
given one or more factors

[...]

designplot from http://fmwww.bc.edu/RePEc/bocode/d
'DESIGNPLOT': module to produce a graphical summary of response given one
or more factors / designplot produces a graphical summary of a numeric
response / variable given one or more "factors", "factor" here meaning any
/ numeric or string variable treated in terms of its distinct / levels in

To help more, I need to see minimally the command you issued and ideally an equivalent data example.
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#8

25 Sep 2020, 07:45

Replying to Nick Cox (#7):

Thank you for the updated version.
Here is the designplot command I used, followed by a data example:

Code:

designplot Age - DivorceEver , recast(hbar)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(Age Sex Education MarriedEver EthnicMinority DivorceEver)
2 1 5 2 2 2
2 2 6 2 2 2
6 1 2 2 2 1
3 2 2 1 2 2
8 2 1 2 2 2
7 1 6 1 2 2
6 2 6 1 2 2
6 1 7 1 2 1
6 2 3 1 2 2
5 2 4 1 2 2
1 2 4 2 2 2
7 1 1 1 2 2
5 2 3 1 2 2
5 1 7 2 2 2
5 1 7 1 2 2
5 1 7 1 2 2
4 1 3 1 2 1
2 1 4 2 2 2
3 1 4 1 2 2
5 2 4 1 2 2
2 2 4 1 2 2
3 2 4 1 2 2
3 1 3 1 2 1
3 2 5 2 2 2
3 2 5 2 2 2
4 1 3 2 2 2
3 1 5 1 2 1
2 2 4 2 2 2
4 1 7 1 2 1
3 1 4 2 2 2
3 1 2 1 2 2
6 2 3 1 2 2
4 1 6 2 2 2
6 2 6 1 1 2
5 2 2 1 1 2
5 2 2 1 2 2
3 1 6 1 2 2
5 1 2 1 2 2
7 1 3 1 2 2
1 1 2 2 2 2
2 1 6 2 2 2
7 1 1 1 2 2
1 1 2 2 2 2
6 2 5 2 1 2
6 1 4 1 2 2
3 1 3 2 2 2
3 2 6 1 2 1
8 2 1 1 2 2
4 1 3 1 2 2
2 2 3 2 2 2
8 1 3 1 2 1
5 1 6 1 2 1
4 2 5 1 2 2
2 2 5 2 2 2
3 2 2 1 1 2
7 1 5 2 2 2
3 2 3 2 2 2
7 1 7 1 2 1
2 2 2 2 2 2
3 2 1 2 1 2
3 2 7 1 2 2
7 1 4 1 2 1
5 2 7 1 2 2
6 2 1 1 2 2
4 1 2 1 2 1
2 2 2 2 2 2
7 2 4 1 2 2
3 1 3 2 2 2
3 2 4 2 1 2
8 1 1 1 2 2
8 2 7 2 2 2
3 1 2 1 2 2
4 2 4 1 2 2
6 2 4 1 1 1
3 2 5 1 2 2
1 2 2 2 2 2
4 2 7 1 2 1
3 2 7 1 2 2
5 2 2 1 2 1
7 2 1 1 2 1
5 2 2 1 2 2
3 1 3 1 2 2
4 1 5 1 1 1
6 2 2 1 2 2
5 2 7 1 2 2
4 2 1 1 2 2
6 1 3 1 2 2
7 1 2 1 1 1
3 1 2 1 2 2
7 1 4 1 2 1
4 2 7 2 2 2
4 1 3 2 2 2
6 1 3 2 2 2
2 1 1 2 2 2
5 2 5 1 2 2
2 2 6 2 .c 2
4 2 4 1 2 2
7 2 5 2 2 2
5 1 2 1 2 2
2 2 4 2 2 2
end

Last edited by Sonnen Blume; 25 Sep 2020, 07:49.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#9

25 Sep 2020, 08:28

Your syntax choice makes Age the response variable and the others predictors and plots means for many combinations of predictors. I am not a social scientist, but that doesn't seem to me to be a good idea.

Where you have a mix of variables -- some perhaps outcomes, the others predictors or either -- the help that designplot can give is as a kind of data overview that cuts out the need for multiple little tables or graphs. The trick here is to feed designplot a constant variable as outcome and then to ignore it.

Although the code here is fairly short, you may need a certain amount of fooling around before you get something you really like.

Code:

gen one = 1
set scheme s1color
designplot one Age-DivorceEver , bar(1, blcolor(blue) bfcolor(blue*0.2)) stat(count) min(1) max(1) recast(hbar) variablenames t1title("") blabel(total) ysc(alt) entryopts(label(labsize(small)))

Notes:

You can show variable labels if you want, or otherwise improve the text explaining categories.

Given the bar labels, the axis labels and ticks may seem redundant, or conversely.

The trade-off between better explanations of categories and keeping the display uncluttered is obvious in principle and hard to optimise in practice.

Last edited by Nick Cox; 25 Sep 2020, 08:53.
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#10

25 Sep 2020, 09:33

Replying to Nick Cox (#9):

This is wonderful. I didn't realise that the first variable in the list is treated as the response variable.

So the

Code:

gen one=1

is doing the trick. Could you please explain a bit about how this works? My goal is to show percentages instead of counts (because some bars can look very long relative to others). In a previous thread, you gave a solution for that (https://www.statalist.org/forums/for...-on-designbars). This time the code solves the cluttering issue, but loses the percent display.
Please give a reference for the use of the

Code:

gen one = 1

and

Code:

gen percent = 100/r(N)

commands, if available.

Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#11

25 Sep 2020, 10:58

Creating a response and then ignoring it is exemplified but not trumpeted in the 2014 paper.

Initialising 100 / sample size is documented in the latest (2019) public version of the help in response to a previous question by ... Sonnen Blume.
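
For readers wondering how the two tricks fit together, here is a minimal sketch using the auto data from #1; it is not a quotation from the paper or the help file. It assumes that stat() accepts sum (as graph dot does) and restricts the sample explicitly, so that the denominator matches the observations plotted.

```stata
sysuse auto, clear

* a constant response: its count in any cell is just the cell frequency
gen one = 1

* each observation contributes 100/N, so the sum over a cell is that cell's percent
count if !missing(foreign, rep78)
gen percent = 100 / r(N)

* frequencies, zero- and one-way breakdowns only
designplot one foreign rep78 if !missing(foreign, rep78), stat(count) max(1) recast(hbar) blabel(total)

* percents on the same sample (assumption: sum is accepted in stat())
designplot percent foreign rep78 if !missing(foreign, rep78), stat(sum) max(1) recast(hbar) blabel(total, format(%3.1f))
```

In the second graph the zero-way entry should be 100 and the entries within each predictor should add to 100, which is a quick check that the denominator is right.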