  • Scatterplot matrix with r values

    Dear all,

    Is there some easy way to display Pearson's r for each correlation in a
    scatterplot matrix? E.g. if I run the following code, is there some
    easy way to add the r values for each of the scatterplots?

    sysuse auto, clear
    graph mat mpg price weight length, ms(Oh)
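
    As far as I know, -graph matrix- has no option that prints r in each cell. One hedged workaround draws each pair of variables separately with -twoway scatter-, puts Pearson's r from -correlate- in each panel title, and then puts the panels together with -graph combine-. The macro names (vars, graphs, k) and graph names (g1, g2, ...) below are illustrative choices only.

```stata
* Sketch: one scatter plot per unique pair of variables, with Pearson's r
* in each title, combined into one graph (this is not -graph matrix- itself)
sysuse auto, clear
local vars mpg price weight length
local nvars : word count `vars'
local graphs
local k = 0
* loop over each unique pair (i, j) with i < j
forvalues i = 1/`=`nvars' - 1' {
    forvalues j = `=`i' + 1'/`nvars' {
        local y : word `i' of `vars'
        local x : word `j' of `vars'
        quietly correlate `y' `x'
        * r to 2 decimal places, with the leading blank trimmed
        local r = strtrim(string(r(rho), "%6.2f"))
        local ++k
        scatter `y' `x', ms(Oh) title("r = `r'") name(g`k', replace) nodraw
        local graphs `graphs' g`k'
    }
}
graph combine `graphs', cols(3)
```

    With four variables this gives six panels, one per pair, instead of a full matrix: the diagonal and the mirror-image upper triangle are left out.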

    Any help would be greatly appreciated!

    All the best,
    Richard

  • #2
    I have the same question Richard Ohrvall asked 7 years ago! Are there any Stata packages that can produce a scatter-plot matrix something like this?





    Having 95% CIs for the correlations would be even better.
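
    The usual large-sample interval for a correlation uses Fisher's z transformation: z = atanh(r) is roughly normal with standard error 1/sqrt(n - 3), so a normal interval for z is built and its limits are back-transformed with tanh. Below is a minimal sketch for one pair in the auto data (mpg and weight, chosen arbitrarily). -corrci- (SJ), which appears later in this thread, automates this kind of calculation.

```stata
* Sketch: 95% confidence interval for Pearson's r via Fisher's z
sysuse auto, clear
quietly correlate mpg weight
local r = r(rho)
local n = r(N)
* transform r to z = atanh(r); its standard error is about 1/sqrt(n - 3)
local z = atanh(`r')
local se = 1 / sqrt(`n' - 3)
local crit = invnormal(0.975)
* back-transform the limits of the interval for z to the r scale
local lb = tanh(`z' - `crit' * `se')
local ub = tanh(`z' + `crit' * `se')
display "r = " %6.3f `r' ", 95% CI [" %6.3f `lb' ", " %6.3f `ub' "], n = `n'"
```

    Because tanh maps back into (-1, 1), the limits cannot fall outside the valid range for r, and the interval is asymmetric around r whenever r is not 0.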

    Thanks,
    Bruce

    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)



    • #3
      Some way from that, but perhaps little known, is corrtable from SSC. https://www.stata.com/statalist/arch.../msg00978.html



      • #4
        Thanks Nick Cox, I'll take a look at it. Meanwhile, I remembered that JASP can produce a matrix with all of the features I would like. E.g.,

        [Image: JASP_scatter_plot_matrix.png]


        But AFAIK, JASP still has no easy way to generate code, which is a big limitation for me.



        • #5
          I can't comment on JASP.

          I did once write a Stata program to combine a set of scatter plots with a diagonal display of (IIRC) histograms, but handling all the details that need to be handled got too time-consuming and I lost enthusiasm for my own small project.

          A trope that had some circulation in visualization circles a while back ran that there are two kinds of positive reaction to graphs (or any other kind of visualization).

          Aha! means that you can see more clearly what you are looking for -- and even better that you see something interesting or important that you were not looking for.

          Wow! means how did you do that? That is amazing, incredible, spectacular -- choose your adjective(s). Speak now if you have no interest in impressing others with how clever you are, or with how clever the tools you know how to handle can be. (Me too.) However, I suggest that Wow! is always second best to Aha!

          (I wanted to add Huh?, meaning what is being shown here and/or how are we supposed to learn from it? That is my usual reaction to word clouds, many tree maps, most network visualizations, some heat maps, just about all data art ... add your own favourite dislikes or non-likes.)

          The displays shown in #2 and #4 certainly pack a lot of information into the space. I am not so impressed with having to puzzle out where the correlation is displayed once I see a scatter plot that looks interesting. I don't practise starring (flagging significance with asterisks), as if we were reviewing restaurants or movies or items purchased or service quality. The correlation itself is the message. If someone were to write Stata commands to produce displays like those, I am sure that they would interest many users.

          Meanwhile, corrtable has much more limited and mundane aims.

          Here is a token example.

          Code:
          sysuse auto, clear
          set scheme s1color

          * Clone each variable so that its label can be shortened for display,
          * e.g. "Mileage (mpg)" becomes "Mileage"
          foreach v of var price-gear_ratio {
              clonevar `v'2 = `v'
              local label : var label `v'2
              if strpos("`label'", "(") local label = substr("`label'", 1, strpos("`label'", "(") - 2)
              label var `v'2 "`label'"
          }

          * corrtable is from SSC: ssc install corrtable
          corrtable mpg2 gear_ratio2 rep782 price2 headroom2-displacement2, flag1(r(rho) > 0) howflag1(plotregion(color(blue*0.1))) flag2(r(rho) < 0) howflag2(plotregion(color(pink*0.1))) half rsize(2 + 6 * abs(r(rho)))
          So, we are just using graph to show a table, but exploiting some scope to change font sizes, background colours and so forth. As with a correlation matrix, how useful the display is can depend on ordering the variables carefully. As with any display, you can pack far too much in, with the result that most readers will think "I can come back to that later", except that they usually won't.
          [Image: corrtable.png]




          Last edited by Nick Cox; 27 May 2023, 04:33.



          • #6
            Hi Nick. I certainly agree with you regarding Aha!, Wow! and Huh?. But clearly, I like the graphic in #4 more than you do. ;-)

            For context, I am working on some slides for a lecture on correlations of various types. I would like to impress upon students the importance of taking a look at scatter-plots corresponding to the Pearson r values they see in a typical correlation matrix. If they look only at the r values, they could fall prey to apparently strong correlations that are driven almost entirely by one outlier. The 4th of Anscombe's (1973) datasets comes to mind as a relevant example. (In the code below, I've added a little noise to the x-values for Anscombe's 4th dataset so that there is some variance in x when I exclude the outlier.) I do quite like the graphic in #5, but it would not allow one to spot the outlier, whereas the figure in #4 would.

            Code:
            * Example using a modification of
            * Anscombe's 4th dataset
            clear *
            input x1 y1 x2 y2 x3 y3 x4 y4
            10   8.04  10  9.14  10   7.46   8   6.58
            8    6.95   8  8.14   8   6.77   8   5.76
            13   7.58  13  8.74  13  12.74   8   7.71
            9    8.81   9  8.77   9   7.11   8   8.84
            11   8.33  11  9.26  11   7.81   8   8.47
            14   9.96  14  8.10  14   8.84   8   7.04
            6    7.24   6  6.13   6   6.08   8   5.25
            4    4.26   4  3.10   4   5.39  19  12.50
            12  10.84  12  9.13  12   8.15   8   5.56
            7    4.82   7  7.26   7   6.42   8   7.91
            5    5.68   5  4.74   5   5.73   8   6.89
            end

            * Let x4b = x4 with some random noise added
            set seed 230527
            generate x4b = x4 + rnormal(0, .5)
            * Correlation for x4b and y4
            // Type "search corrci, sj" in the Command window if you need to install -corrci-
            corrci x4b y4
            twoway scatter y4 x4b || lfit y4 x4b
            * Get the correlation without the outlier
            egen x4bmax = max(x4b)
            corrci x4b y4 if x4b < x4bmax
            PS- Note the use of -corrci- (SJ). You may be familiar with it.
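
            For the lecture slides, a hedged sketch along the same lines (not from this thread) uses the original, unmodified Anscombe quartet, puts r for each dataset in its panel title, and combines the four scatter plots. All four values of r are about 0.816, while the plots look completely different. Note that -clear- replaces the data in memory; the graph names a1-a4 are arbitrary.

```stata
* Sketch: Anscombe's (1973) four datasets, each scatter plot titled with its r
clear
input x1 y1 x2 y2 x3 y3 x4 y4
10   8.04  10  9.14  10   7.46   8   6.58
8    6.95   8  8.14   8   6.77   8   5.76
13   7.58  13  8.74  13  12.74   8   7.71
9    8.81   9  8.77   9   7.11   8   8.84
11   8.33  11  9.26  11   7.81   8   8.47
14   9.96  14  8.10  14   8.84   8   7.04
6    7.24   6  6.13   6   6.08   8   5.25
4    4.26   4  3.10   4   5.39  19  12.50
12  10.84  12  9.13  12   8.15   8   5.56
7    4.82   7  7.26   7   6.42   8   7.91
5    5.68   5  4.74   5   5.73   8   6.89
end

local graphs
forvalues i = 1/4 {
    quietly correlate y`i' x`i'
    local r = strtrim(string(r(rho), "%6.3f"))
    scatter y`i' x`i', ms(Oh) title("Dataset `i': r = `r'") name(a`i', replace) nodraw
    local graphs `graphs' a`i'
}
* (almost) the same r in every panel, very different patterns
graph combine `graphs', cols(2) xcommon ycommon
```

            The xcommon and ycommon options put all four panels on the same axis scales, which makes the contrast between the patterns easier to see.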



            • #7
              Absolutely. corrtable does precisely nothing to help you spot any outliers, and there is no substitute for looking at the scatter plots.

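              Your last two lines could be the following, which use the maximum that -summarize, meanonly- leaves behind in r(max) instead of creating a new variable:
              Code:
              su x4b, meanonly
              corrci x4b y4 if x4b < r(max)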
              See also crossplot from SSC. https://www.statalist.org/forums/for...ailable-on-ssc
