subsetplot available on SSC

Nick Cox

Join Date: Mar 2014

Posts: 35697
#1

29 Sep 2014, 05:30

Thanks to Kit Baum as usual, a new program subsetplot is now available to download from SSC. Stata 8.2 is required.

subsetplot produces an array of scatter or other twoway plots for yvarlist versus xvar according to a further variable byvar. There is one plot for observations for each distinct subset of byvar in which data for that subset are highlighted and the rest of the data shown as backdrop. Graphs are drawn individually and then combined with graph combine.

That's a little abstract, but some examples should help. We all know that if we want to compare relationships graphically between groups of observations, we can superimpose different groups in a single plot or juxtapose different groups in several plots. This is a hybrid approach combining elements of those two strategies. Consider this code:

Code:

set scheme s1color
sysuse auto, clear
subsetplot scatter mpg weight, subset(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78)

Each subset is shown in turn with the rest of the data as backdrop. In the case of ordered categories such as repair record, each value could serve as its own symbol:

Here's one more example. With panel data in particular, the problem of spaghetti plots is pervasive across several fields. In principle, plotting several time series in one plot is showing all the information. In practice, it can be hard to see the trees for the wood, to change the metaphor.

Code:

webuse grunfeld
subsetplot line invest year, by(company) ysc(log) yla(1 10 100 1000)
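
To make the idea concrete, here is a rough sketch of how a similar display could be built by hand with official commands only. It is not subsetplot's own code: the graph names g1, g2, ... and the colours are choices made for this illustration. Run it from a do-file, as it uses /// line continuation.

Code:

webuse grunfeld, clear
sort company year
levelsof company, local(levels)
local graphs
foreach c of local levels {
    * grey backdrop of all companies, then the current company on top
    twoway (line invest year, connect(L) lcolor(gs12))            ///
           (line invest year if company == `c', lcolor(orange)),  ///
           ysc(log) yla(1 10 100 1000) legend(off)                ///
           subtitle("company `c'") name(g`c', replace) nodraw
    local graphs `graphs' g`c'
}
graph combine `graphs'

Here connect(L) joins points only while year is increasing, so the backdrop line does not run from the end of one company to the start of the next.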

This approach was discussed in Cox (2010). See also Schwabish (2014) for an example. Readers who know of interesting or useful examples or discussions, especially early in date or comprehensive in detail, are welcome to email the author. It's hard to believe that this simple idea doesn't go way back, but at present I lack the references.

Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#2

29 Sep 2014, 05:54

Seems a wonderful way of visualizing data. I tried to install it from SSC, but no luck. Am I missing something?

ssc install subsetplot

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#3

29 Sep 2014, 05:57

Great, useful stuff. Many thanks to Nick and Kit Baum. Just to let you know, -ssc describe s- is not showing "subsetplot" in the list of programs yet.

Roman
Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

29 Sep 2014, 05:59

Thanks for the report. There's a small temporary glitch. The .ado and the .sthlp are there, but not yet a package file. You can copy the files across to a directory or folder of your choice by using ssc copy, or wait for the glitch to be fixed. I'll alert Kit Baum.
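
For anyone taking the copying route, a minimal sketch follows. The filenames are assumed to be subsetplot.ado and subsetplot.sthlp, following the usual naming convention. By default ssc copy places each file in the current working directory.

Code:

ssc copy subsetplot.ado
ssc copy subsetplot.sthlp

The current directory is on the adopath, so the command will be found only while you work in that directory. The plus or personal option of ssc copy sends the files to those directories instead.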
Nick Cox

Join Date: Mar 2014

Posts: 35697
#5

29 Sep 2014, 09:22

Glitch now fixed, thanks to Kit. ssc install subsetplot should work (so long as you have sufficient access to the internet, naturally).
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#6

29 Sep 2014, 11:48

That's working perfectly. Thanks, Nick and Kit. Just a query: in the spaghetti plot above, the orange line is the company-specific line for investment over the years. But what do those grey lines refer to? Is there any way to skip them?

Roman
Nick Cox

Join Date: Mar 2014

Posts: 35697
#7

29 Sep 2014, 11:54

The entire rationale of subsetplot is to include the rest of the data as backdrop, in this case as a set of grey lines!!! If you don't want that, just use some appropriate official command, e.g. line with a by() option, as documented.
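
As a minimal sketch of that alternative with the Grunfeld data used earlier in the thread, each company gets its own panel and nothing is drawn as backdrop:

Code:

webuse grunfeld, clear
line invest year, by(company) ysc(log) yla(1 10 100 1000)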
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#8

29 Sep 2014, 12:05

Actually, this is cleverer than I thought. Brilliant !!

Roman
Nick Cox

Join Date: Mar 2014

Posts: 35697
#9

29 Sep 2014, 12:21

Thanks.
Stefan Gawrich

Join Date: Jun 2014

Posts: 6
#10

30 Sep 2014, 01:32

Thanks Nick,

this is a very nice new graph!

Unfortunately I encounter some problems with value labels of the by() var. (Stata 13.1 MP on Win7)

1) A left parenthesis in a value label within the first 32 chars of a value label without a right parenthesis within 32 chars leads to an error.

Example:

sysuse auto, clear
label define foreignlabel3 0 "1 10 20 (manufacturer)" 1 "foreign"
label values foreign foreignlabel3
subsetplot scatter price mpg,by(foreign)

parentheses do not balance
r(198);

2) Also the use of a comma seems to be misinterpreted.

sysuse auto, clear
label define foreignlabel3 0 "Detroit, Michigan" 1 "foreign"
label values foreign foreignlabel3
subsetplot scatter price mpg,by(foreign)

option Michigan not allowed
r(198);

Best wishes

Stefan Gawrich
Dillenburg
Germany
Stefan Gawrich

Join Date: Jun 2014

Posts: 6
#11

30 Sep 2014, 01:45

The forum -itrim-s text so the first example of my last post worked.

Here's an altered example:

sysuse auto, clear
label define foreignlabel3 0 "1________10________20__(manufacturer)" 1 "foreign", replace
label values foreign foreignlabel3
subsetplot scatter price mpg,by(foreign)

parentheses do not balance
r(198);

Best wishes

Stefan Gawrich
Dillenburg
Germany
Nick Cox

Join Date: Mar 2014

Posts: 35697
#12

30 Sep 2014, 02:10

Stefan: Thanks for your interest. I can reproduce problem 2 but not (yet?) problem 1. You have unearthed a small bug. I will flag when a fixed version is posted on SSC.
daniel klein

Join Date: Mar 2014

Posts: 3849
#13

30 Sep 2014, 02:21

Guess both are caused by line 84 of subsetplot.ado which calls the subtitle option as

Code:

... subtitle(`which')

This should be an easy fix, and I would suggest

Code:

... subtitle(`"`macval(which)'"')

because macval() is a trick to also deal with single (unmatched) left quotes in labels. Something that cannot be achieved with compound quotes only.
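
A minimal sketch of the difference, using scatter directly rather than subsetplot itself. The local macro which is set by hand here purely for illustration, and the first graph command should fail in the way reported above for the label containing a comma:

Code:

sysuse auto, clear
local which "Detroit, Michigan"
* bare macro: the comma is read as introducing suboptions of subtitle()
capture noisily scatter price mpg, subtitle(`which')
* compound double quotes with macval(): the label is passed as one string
scatter price mpg, subtitle(`"`macval(which)'"')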

By the way, very nice program, Nick. Always happy to read your code for graphics commands, to pick up and learn from the ideas and techniques behind them.

Best
Daniel
Stefan Gawrich

Join Date: Jun 2014

Posts: 6
#14

30 Sep 2014, 03:00

Thanks Daniel,

it works.
I should have looked into the code before posting.

Thanks again, Nick. I especially like -subsetplot- with line graphs. Very nice.

Stefan Gawrich
Dillenburg
Germany
Nick Cox

Join Date: Mar 2014

Posts: 35697
#15

30 Sep 2014, 03:10

Daniel: I agree with your diagnosis that the subtitle() option needs a fix. I am not necessarily going to fix it in exactly the same way!
