Thanks to Kit Baum as usual, a new program subsetplot is now available to download from SSC. Stata 8.2 is required.
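For anyone who has not installed from SSC before, the usual route is the official ssc command. This is a minimal sketch that assumes a net-aware Stata with write access to the ado directory; the help call simply opens the documentation installed with the package.
Code:
* install subsetplot from SSC (add the replace option to update an existing copy)
ssc install subsetplot
* read the help file that comes with it
help subsetplot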
subsetplot produces an array of scatter or other twoway plots for yvarlist versus xvar according to a further variable byvar. There is one plot for observations for each distinct subset of byvar in which data for that subset are highlighted and the rest of the data shown as backdrop. Graphs are drawn individually and then combined with graph combine.
That's a little abstract, but some examples should help. We all know that to compare relationships graphically between groups of observations, we can superimpose different groups in a single plot, or juxtapose different groups in several plots. This is a hybrid approach combining elements of those two strategies. Consider this code:
Code:
set scheme s1color
sysuse auto, clear
subsetplot scatter mpg weight, subset(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78)
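To make the "drawn individually and then combined" idea concrete, here is a rough by-hand sketch of the same kind of display. It is not the code inside subsetplot, only an illustration under stated assumptions: a Stata version that has levelsof, a do-file (because of the /// line continuations), and graph names g1, g2, ... and colours gs12 and blue that are purely my own choices.
Code:
sysuse auto, clear
* distinct nonmissing values of the grouping variable
levelsof rep78, local(levels)
local graphs
foreach l of local levels {
    * backdrop: all other observations in grey; highlight: this subset in blue
    twoway (scatter mpg weight if rep78 != `l', mcolor(gs12)) ///
           (scatter mpg weight if rep78 == `l', mcolor(blue)), ///
           legend(off) title("rep78 = `l'") name(g`l', replace) nodraw
    local graphs `graphs' g`l'
}
* put the individual graphs into one array
graph combine `graphs'
Observations with missing rep78 satisfy rep78 != `l' and so appear only in the backdrop in this sketch; whether subsetplot treats them the same way is something to check in its help file.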
Each subset is shown in turn with the rest of the data as backdrop. With ordered categories such as repair record, each value can serve as its own symbol, which is what ms(none) and mla(rep78) inside the subset() option above arrange.
Here's one more example. With panel data in particular, the problem of spaghetti plots is pervasive across several fields. In principle, plotting several time series in one plot shows all the information. In practice, it can be hard to see the trees for the wood, to change the metaphor.
Code:
webuse grunfeld
subsetplot line invest year, by(company) ysc(log) yla(1 10 100 1000)
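For contrast, the spaghetti plot of the same data can be drawn along these lines. This is a sketch using official Stata's xtline rather than anything from subsetplot; the xtset call is included only in case the panel settings are not already stored with the dataset, and legend(off) is my own choice to save space.
Code:
webuse grunfeld, clear
* declare the panel structure
xtset company year
* all companies superimposed in one plot, on the same log scale as above
xtline invest, overlay yscale(log) ylabel(1 10 100 1000) legend(off)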
This approach was discussed in Cox (2010). See also Schwabish (2014) for an example. Readers who know of interesting or useful examples or discussions, especially early in date or comprehensive in detail, are welcome to email the author. It's hard to believe that this simple idea doesn't go way back, but at present I lack the references.
Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.
Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.