  • Multiple Scatter Plots Overlaid by Group (Efficient Way)

    Hi all,

    I am looking for an efficient way to make scatter plots overlaid by a "group". For example, suppose you have:

    Code:
    set more off
    clear
    
    input y x str2 state
    1 2 "NJ"
    2 2.5 "NJ"
    3 4 "NJ"
    9 1 "NY"
    8 0 "NY"
    7 -1 "NY"
    2 3 "NH"
    3 4 "NH"
    5 6 "NH"
    end
    Essentially, what I want is the graph which results from

    Code:
    graph twoway (scatter y x if state == "NJ") ///
                 (scatter y x if state == "NY") ///
                 (scatter y x if state == "NH"), legend(order(1 "NJ" 2 "NY" 3 "NH"))
    However, my actual data has 50 states, and it seems like there should be a more efficient way than writing the same line of code 50 times to overlay 50 scatter plots on each other.

    Would anyone know of an efficient way to overlay all scatterplots on one graph?

    Thanks!

    Vincent

  • #2
    I think the most efficient way that I know of is to use the separate command:

    Code:
    separate y, by(state) gen(DD)
    labvarch DD*, after(==) // From SSC
    graph twoway scatter DD* x
    drop DD*



    • #3
      sepscatter (SSC) was written for this purpose. See this thread

      http://www.statalist.org/forums/foru...lable-from-ssc

      and search the forum for other mentions.

      Incidentally, the undocumented option veryshortlabel makes Dimitriy's labvarch fix unnecessary with separate. But sepscatter offers the whole shebang.
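
      A minimal sketch of both suggestions, assuming the toy data from #1 are in memory and sepscatter can be installed from SSC. The names y1-y3 are simply what separate creates by default when no generate() stub is given, and the ytitle() text is illustrative.

      Code:
      * route 1: separate with the undocumented veryshortlabel option,
      * so each new variable is labelled with the state value only
      separate y, by(state) veryshortlabel
      scatter y? x, ytitle("y")
      drop y?

      * route 2: sepscatter (SSC) does the separating and plotting in one step
      ssc install sepscatter
      sepscatter y x, separate(state)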

      In the specific case here, it is much, much better just to use marker labels. 50 "different" symbols for 50 states (what about DC?) is a graphic nightmare.
      Last edited by Nick Cox; 15 Apr 2015, 03:28.



      • #4
        More at

        Cox, N.J. 2005. Classifying data points on scatter plots. Stata Journal 5: 604-606.
        http://www.stata-journal.com/sjpdf.h...iclenum=gr0023

        The example of 50 states of the US is clearly a good one. Many people have such data, but there are sufficiently many states that it is a graphic challenge. A separate symbol for each state -- even if possible, as you would need to combine different markers and different marker colours -- would not only not work well; it would require an enormous legend. So, don't do that then.

        Here is an example of technique:

        Code:
         
        sysuse census, clear 
        gen ratio = divorce/marriage
        label var ratio "divorces per marriage"
        format medage %2.0f 
        scatter ratio medage , mla(state2) mlabpos(0) ms(none) name(sep1, replace) 
        scatter ratio medage if state2 != "NV" , mla(state2) mlabpos(0) ms(none) ///
        || scatter ratio medage if state2 == "NV", mla(state2) mlabpos(0) ms(none) mlabcolor(red) mlabsize(*1.5) name(sep2, replace) legend(off)
        [Image: sep1.png]

        [Image: sep2.png]



        • #5
          Sorry for the late response, but thank you all for your help!



          • #6
            Don't mean to bring up an old post, but is there a way to use sepscatter and connect the scatter points by group, i.e. connect points that are from the same state?



            • #7
              See the help. It's documented that you can recast() to other twoway types. If that does not resolve the matter, please give a concrete example with data and code you tried.
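
              A minimal sketch of that suggestion, assuming the toy data from #1 are in memory and sepscatter is installed from SSC. Sorting beforehand is an added precaution, not something the help is quoted as requiring: connected plots join points in the order of the data, so the data are sorted on x within state first.

              Code:
              * put observations in x order within each state
              sort state x
              * one connected line per state, instead of markers only
              sepscatter y x, separate(state) recast(connected)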

