Scatterplot matrix with r values

Richard Ohrvall

Join Date: Apr 2016

Posts: 1
#1

27 Apr 2016, 06:44

Dear all,

Is there some easy way to display Pearson's r for each correlation in a
scatterplot matrix? E.g. if I run the following code, is there some
easy way to add the r values for each of the scatterplots?

sysuse auto, clear
graph mat mpg price weight length, ms(Oh)

Any help would be greatly appreciated!

All the best,
Richard
Tags: graph
Bruce Weaver

Join Date: May 2014

Posts: 1119
#2

25 May 2023, 13:24

I have the same question Richard Ohrvall asked 7 years ago! Are there any Stata packages that can produce a scatter-plot matrix something like this?

Having 95% CIs for the correlations would be even better.

Thanks,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

26 May 2023, 12:28

Some way from that but perhaps little known is corrtable from SSC. https://www.stata.com/statalist/arch.../msg00978.html
Bruce Weaver

Join Date: May 2014

Posts: 1119
#4

26 May 2023, 15:39

Thanks Nick Cox, I'll take a look at it. Meanwhile, I remembered that JASP can produce a matrix with all of the features I would like. E.g.,

But AFAIK, JASP still has no easy way to generate code, which is a big limitation for me.

Nick Cox

Join Date: Mar 2014

Posts: 35432
#5

27 May 2023, 04:30

I can't comment on JASP.

I did once write a Stata program to combine a set of scatter plots with a diagonal display of (IIRC) histograms, but handling all the details that need to be handled got too time-consuming and I lost enthusiasm for my own small project.

A trope that had some circulation in visualization circles a while back ran that there are two kinds of positive reaction to graphs (or any other kind of visualization).

Aha! means that you can see more clearly what you are looking for -- and even better that you see something interesting or important that you were not looking for.

Wow! means how did you do that? That is amazing, incredible, spectacular -- choose your adjective(s). Speak now if you have no interest in impressing others with how clever you are, or with how clever the tools you know how to handle can be. (Me too.) However, I suggest that Wow! is always second best to Aha!

(I wanted to add Huh? meaning what is being shown here and/or how are we supposed to learn from it? My usual reaction to word clouds, many tree maps, most network visualizations, some heat maps, just about all data art ... add your own favourite dislikes or non-likes.)

The displays shown in #2 and #4 certainly pack a lot of information into the space. I am not so impressed with having to puzzle out where the correlation is displayed once I see a scatter plot that looks interesting. I don't practise starring, as if we were reviewing restaurants or movies or items purchased or service quality. The correlation itself is the message. If someone were to write Stata commands to do that, I am sure that they would interest many users.

Meanwhile, corrtable has much more limited and mundane aims.

Here is a token example.

Code:

sysuse auto, clear
set scheme s1color

foreach v of var price-gear {
    clonevar `v'2 = `v'
    local label : var label `v'2
    if strpos("`label'", "(") local label = substr("`label'", 1, strpos("`label'", "(") - 2)
    label var `v'2 "`label'"
}

corrtable mpg2 gear_ratio2 rep782 price2 headroom2-displacement2, ///
    flag1(r(rho) > 0) howflag1(plotregion(color(blue * 0.1))) ///
    flag2(r(rho) < 0) howflag2(plotregion(color(pink*0.1))) ///
    half rsize(2 + 6 * abs(r(rho)))

So, we are just using graph to show a table, but exploiting some scope to change font sizes, background colours and so forth. As with a correlation matrix, how useful the display is can depend on carefully ordering the variables. As with any display, you can pack far too much in, with the result that most readers will think "I can come back to that later", except that they usually won't.

Last edited by Nick Cox; 27 May 2023, 04:33.
Bruce Weaver

Join Date: May 2014

Posts: 1119
#6

27 May 2023, 16:42

Hi Nick. I certainly agree with you regarding Aha!, Wow! and Huh?. But clearly, I like the graphic in #4 more than you do. ;-)

For context, I am working on some slides for a lecture on correlations of various types. I would like to impress upon students the importance of taking a look at scatter-plots corresponding to the Pearson r values they see in a typical correlation matrix. If they look only at the r values, they could fall prey to apparently strong correlations that are driven almost entirely by one outlier. The 4th of Anscombe's (1973) datasets comes to mind as a relevant example. (In the code below, I've added a little noise to the x-values for Anscombe's 4th dataset so that there is some variance in x when I exclude the outlier.) I do quite like the graphic in #5, but it would not allow one to spot the outlier, whereas the figure in #4 would.

Code:

* Example using a modification of
* Anscombe's 4th dataset
clear *
input x1 y1 x2 y2 x3 y3 x4 y4
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.1 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.1 4 5.39 19 12.5
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
end

* Let x4b = x4 with some random noise added
set seed 230527
generate x4b = x4 + rnormal(0,.5)

* Correlation for x4b and y4
// Type "corrci, sj" in the Command window if you need to install -corrci-
corrci x4b y4
twoway scatter y4 x4b || lfit y4 x4b

* Get the correlation without the outlier
egen x4bmax = max(x4b)
corrci x4b y4 if x4b < x4bmax

PS- Note the use of -corrci- (SJ). You may be familiar with it.

Nick Cox

Join Date: Mar 2014

Posts: 35432
#7

28 May 2023, 01:52

Absolutely. corrtable does precisely nothing to help you spot any outliers, and there is no substitute for looking at the scatter plots.

Your last two lines could be

Code:

su x4b, meanonly
corrci x4b y4 if x4b < r(max)

See also crossplot from SSC. https://www.statalist.org/forums/for...ailable-on-ssc
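
On the original question in #1, here is a minimal sketch of one possible workaround that uses only official commands (correlate, scatter, graph combine). It is not a method proposed in any post above and it is not a true graph matrix: it draws a separate scatter plot for each ordered pair of variables, puts Pearson's r in each title, and combines the panels, skipping the diagonal. The graph names beginning with g_ and the three-column layout are illustrative assumptions.

```stata
* Sketch only: one scatter plot per ordered pair, with r in the title
sysuse auto, clear
local vars mpg price weight length
local graphs

foreach y of local vars {
    foreach x of local vars {
        if "`y'" != "`x'" {
            * correlate with two variables leaves Pearson's r in r(rho)
            quietly correlate `y' `x'
            local r = trim(string(r(rho), "%5.2f"))
            * build each panel without drawing it; the names are arbitrary
            scatter `y' `x', ms(Oh) title("r = `r'") ///
                name(g_`y'_`x', replace) nodraw
            local graphs `graphs' g_`y'_`x'
        }
    }
}

* 12 off-diagonal panels: one row per y variable, 3 columns
graph combine `graphs', cols(3)
```

With many variables the panels become small quickly, so this is likely to be practical only for a handful of variables at a time.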