  • twoway connected with markers for a subset of points

    Hello Statalist,

    I am working on a twoway connected graph for which I would like to include markers for only a subset of points.

    There are about 100 observations in my dataset, so allowing markers at all points is too busy, while plotting only a subset of the points reduces the smoothness of the graph lines. I can combine a connected and a scatter plot, but then the legend does not combine the line with the marker.

    Any advice welcome, code and results follow.

    Thanks,
    Raymond

    1. Markers for all points

    Code:
    twoway connected all_d realpce_usd_ppp_grid, ///
      xscale(log) xlab(1.25 2.5 5 10 20)
    Result:


    [Image: all_markers.png]

    2. Plot only every 10th point

    Code:
    twoway connected all_d realpce_usd_ppp_grid ///
      if mod(_n-1,10)==0, ///
      xscale(log) xlab(1.25 2.5 5 10 20)

    Result:
    [Image: tenth_markers.png]

    3. Combine twoway with scatter

    Code:
    twoway (connected all_d realpce_usd_ppp_grid, msymbol(i)) /// 
      || (scatter all_d realpce_usd_ppp_grid if mod(_n-1,10)==0, mcolor(stc1)) /// 
      , xscale(log) xlab(1.25 2.5 5 10 20)
    Result:
    [Image: combined_graphs.png]

  • #2
    Why do you think you need

    1. some points to be shown as points (I agree -- for your data showing them all would be excessive)

    2. a legend at all? (why not just explain in a text caption what you're doing?)


    • #3
      The actual plot has 4 lines and I usually find using both lpattern and msymbol helps me understand what's what.


      [Image: pdf_realpce_usd_ppp_union.png]


      • #4
        That context changes the question, as you do need to make it easy to compare all those curves.

        Thoughts on various levels:

        1. Kernel density estimation is data hungry. Your choice to work on log scale helps. I guess I would always want to see the raw data too (e.g. in a quantile plot). A quantile plot is a good idea anyway, but no official command is quite as versatile as qplot from the Stata Journal.

        2. You could take one distribution as reference and plot the others as deviations from that. Naturally the kernel density estimates would then need to be for the same points (as well as the same kernel type and width).

        3. A front-and-back plot might help. https://www.statalist.org/forums/for...ailable-on-ssc is a long thread but skim and skip.

        Code:
         search fabplot, sj
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        SJ-21-2 gr0087  . . Front-and-back plots to ease spaghetti and paella problems
                (help fabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/21   SJ 21(2):539--554
                explores front-and-back plots, in which each subset of data
                is shown separately with the other subsets as backdrop

        The article found by that search is the main story in fairly coherent form.
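
        A minimal sketch of point 2, using the auto data shipped with Stata rather than the data in this thread. The variable names lnweight, grid, d_all, d_dom, d_for and diff are made up for illustration, and the Epanechnikov kernel with the pooled-sample bandwidth is only one possible common choice. The at() option makes each subset's estimate use the same evaluation points, so the difference from the reference is defined pointwise.

        ```stata
        * Illustration only: auto data, not the poster's data
        sysuse auto, clear
        generate lnweight = ln(weight)

        * Pooled estimate supplies a common grid of evaluation points
        kdensity lnweight, nograph generate(grid d_all)
        local bw = r(bwidth)

        * Same grid, same kernel, same bandwidth for each subset
        kdensity lnweight if foreign == 0, nograph at(grid) generate(d_dom) ///
            kernel(epanechnikov) bwidth(`bw')
        kdensity lnweight if foreign == 1, nograph at(grid) generate(d_for) ///
            kernel(epanechnikov) bwidth(`bw')

        * Domestic cars are the reference; plot the other subset as a deviation
        generate diff = d_for - d_dom
        line diff grid, sort yline(0) ytitle("Density difference from reference")
        ```

        With four distributions, the same pattern would give three deviation curves against one reference, which may be easier to compare than four overlaid densities.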


        • #5
          In case future Listers have interest, I found an extremely silly solution:

          Code:
          twoway ///
            (connected target_d realpce_usd_ppp_grid if _n==1)  /// 
            || ///
            (connected target_d realpce_usd_ppp_grid, lcolor(stc1) msymbol(i)) /// 
            || ///
            (scatter target_d realpce_usd_ppp_grid if mod(_n-1,10)==0, mcolor(stc1)) /// 
            , xscale(log) xlab(1.25 2.5 5 10 20) legend(order(1))
          [Image: hacktackular.png]
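
          A sketch of how the same trick might extend to several lines, as in the four-line plot from #3. The data here are simulated (x, d1 and d2 are hypothetical variables, not the poster's), and the colors, symbols and patterns are arbitrary choices. Each series gets three layers: a connected plot of the first observation only, which exists just to supply a line-plus-marker legend key; the full line; and markers at every 10th point. The stc1 and stc2 colors assume a Stata version that provides them, as the code above does.

          ```stata
          * Simulated data: two smooth curves on a 100-point log-spaced grid
          clear
          set obs 100
          generate x  = exp(ln(1.25) + (_n - 1) * (ln(20) - ln(1.25)) / 99)
          generate d1 = normalden(ln(x), ln(4), 0.6)
          generate d2 = normalden(ln(x), ln(7), 0.5)

          twoway ///
              (connected d1 x if _n == 1, lcolor(stc1) mcolor(stc1) msymbol(O) lpattern(solid)) ///
              (connected d2 x if _n == 1, lcolor(stc2) mcolor(stc2) msymbol(D) lpattern(dash))  ///
              (line d1 x, lcolor(stc1) lpattern(solid)) ///
              (line d2 x, lcolor(stc2) lpattern(dash))  ///
              (scatter d1 x if mod(_n - 1, 10) == 0, mcolor(stc1) msymbol(O)) ///
              (scatter d2 x if mod(_n - 1, 10) == 0, mcolor(stc2) msymbol(D)) ///
              , xscale(log) xlabel(1.25 2.5 5 10 20) ///
                legend(order(1 "Series 1" 2 "Series 2"))
          ```

          Only plots 1 and 2 appear in the legend, so each key shows the line pattern and marker symbol together even though no single layer draws both along the whole curve.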
