twoway connected with markers for a subset of points

Raymond Guiteras

19 Nov 2024, 12:20

Hello Statalist,

I am working on a twoway connected graph for which I would like to include markers for only a subset of points.

There are about 100 observations in my dataset, so allowing markers at all points is too busy, while plotting only a subset of the points reduces the smoothness of the graph lines. I can combine a connected and a scatter plot, but then the legend does not combine the line with the marker.

Any advice is welcome; code and results follow.

Thanks,
Raymond

1. Markers for all points

Code:

twoway connected all_d realpce_usd_ppp_grid, ///
    xscale(log) xlab(1.25 2.5 5 10 20)

Result: (graph not reproduced)

2. Plot only every 10th point

Code:

twoway connected all_d realpce_usd_ppp_grid ///
    if mod(_n-1,10)==0, ///
    xscale(log) xlab(1.25 2.5 5 10 20)

Result: (graph not reproduced)

3. Combine connected with scatter

Code:

twoway (connected all_d realpce_usd_ppp_grid, msymbol(i)) ///
    || (scatter all_d realpce_usd_ppp_grid if mod(_n-1,10)==0, mcolor(stc1)) ///
    , xscale(log) xlab(1.25 2.5 5 10 20)

Result: (graph not reproduced)
Nick Cox

19 Nov 2024, 12:26

Why do you think you need

1. some points to be shown as points (I agree -- for your data showing them all would be excessive)

2. a legend at all? (why not just explain in a text caption what you're doing?)
Raymond Guiteras

19 Nov 2024, 12:56

The actual plot has 4 lines and I usually find using both lpattern and msymbol helps me understand what's what.
Nick Cox

20 Nov 2024, 03:04

That context changes the question, as you do need to make it easy to compare all those curves.

Thoughts on various levels:

1. Kernel density estimation is data hungry. Your choice to work on log scale helps. I guess I would always want to see the raw data too (e.g. in a quantile plot). A quantile plot is a good idea anyway, but no official command is quite as versatile as qplot from the Stata Journal.

2. You could take one distribution as reference and plot the others as deviations from that. Naturally the kernel density estimates would then need to be evaluated at the same points (and use the same kernel type and width).

3. A front-and-back plot might help. https://www.statalist.org/forums/for...ailable-on-ssc is a long thread, but skim and skip.

Code:

search fabplot, sj

Search of official help files, FAQs, Examples, and Stata Journals

SJ-21-2 gr0087  . .  Front-and-back plots to ease spaghetti and paella problems
        (help fabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/21   SJ 21(2):539--554
        explores front-and-back plots, in which each subset of data is shown
        separately with the other subsets as backdrop

The Stata Journal article listed there is the main story in fairly coherent form.
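On point 1, a minimal quantile-plot sketch using only official commands (not qplot, whose options are not reproduced here). The auto dataset's `length` by `foreign` is purely stand-in data, and the plotting positions (rank - 0.5)/n are one common convention, computed within each group. Run it from a do-file, since `///` continuation does not work in the Command window.

```stata
* Stand-in data: -length- by -foreign- in the auto dataset (not the thread's data)
sysuse auto, clear

* Plotting position (rank - 0.5) / n, computed separately within each group
bysort foreign (length): generate double pp = (_n - 0.5) / _N

* One quantile plot per group, on a log y scale as in the thread
twoway (scatter length pp if foreign == 0)    ///
       (scatter length pp if foreign == 1),   ///
       yscale(log) xtitle("Fraction of data") ///
       ytitle("Length (in.)")                 ///
       legend(order(1 "Domestic" 2 "Foreign"))
```

Every observation is shown, so thin tails and gaps that a kernel estimate smooths over stay visible.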
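Point 2 can be sketched with the `at()` option of -kdensity-, which evaluates a second density at the points generated by the first. Again the auto data are only a stand-in; the Epanechnikov kernel and the 10-inch bandwidth are arbitrary choices, fixed here only so that both estimates share kernel type and width. A difference is one choice of "deviation"; a ratio would be another.

```stata
sysuse auto, clear

* Reference density (domestic cars); its evaluation points are stored in -x-
kdensity length if foreign == 0, nograph generate(x f_dom) ///
    kernel(epanechnikov) bwidth(10)

* Foreign-car density evaluated at exactly the same points via at()
kdensity length if foreign == 1, nograph generate(f_for) at(x) ///
    kernel(epanechnikov) bwidth(10)

* Deviation from the reference, plotted against the common grid
generate double diff = f_for - f_dom
line diff x, sort yline(0) xtitle("Length (in.)") ///
    ytitle("Density difference (foreign - domestic)")
```

With four curves, the same pattern applies: estimate the reference once with `generate()`, then pass its grid to `at()` for each of the others.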
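For point 3, a rough do-it-yourself version of the front-and-back idea with official graph commands. This is not fabplot itself, just the same principle: each panel shows every curve in light gray as backdrop, with one curve drawn on top. The four curves and the variable names `x`, `y1`-`y4` are made up for illustration.

```stata
* Hypothetical stand-in data: four made-up curves on a log-spaced grid
clear
set obs 100
generate double x = exp(ln(1.25) + (_n - 1) * (ln(20) - ln(1.25)) / 99)
forvalues g = 1/4 {
    generate double y`g' = normalden(ln(x), ln(2.5 * `g'), 0.6)
}

* One panel per curve: all curves as gray backdrop, featured curve in front
forvalues g = 1/4 {
    twoway (line y1 y2 y3 y4 x, lcolor(gs12 gs12 gs12 gs12)) ///
           (line y`g' x, lcolor(black) lwidth(thick)),       ///
           xscale(log) xlabel(1.25 2.5 5 10 20)              ///
           legend(off) title("Curve `g'") name(fab`g', replace) nodraw
}

* Common y axis so the panels are directly comparable
graph combine fab1 fab2 fab3 fab4, ycommon
```

No legend is needed, because each panel's title identifies the featured curve.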

Raymond Guiteras

20 Nov 2024, 15:44

In case it interests future Listers, I found an extremely silly solution:

Code:

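* markers use mod(_n-1,10), so the data should be sorted on realpce_usd_ppp_grid
twoway ///
  (connected target_d realpce_usd_ppp_grid if _n==1) /// one-point dummy: supplies the legend key (line + marker)
  || ///
  (connected target_d realpce_usd_ppp_grid, lcolor(stc1) msymbol(i)) /// full line, no markers
  || ///
  (scatter target_d realpce_usd_ppp_grid if mod(_n-1,10)==0, mcolor(stc1)) /// markers at every 10th point
  , xscale(log) xlab(1.25 2.5 5 10 20) legend(order(1))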
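For the actual four-curve plot mentioned earlier in the thread, the same trick should extend by building the plot list in a loop. A minimal sketch, assuming made-up curves `y1`-`y4` on a log-spaced grid `x` (hypothetical names, not the thread's data): each curve gets its own pstyle, marker symbol, and line pattern, and the one-point dummy plots, listed first, carry the legend keys. Run it from a do-file.

```stata
* Hypothetical stand-in data: four made-up curves on a log-spaced grid
clear
set obs 100
generate double x = exp(ln(1.25) + (_n - 1) * (ln(20) - ln(1.25)) / 99)
forvalues g = 1/4 {
    generate double y`g' = normalden(ln(x), ln(2.5 * `g'), 0.6)
}

* Markers go on every 10th point, so the data must be sorted on x
sort x
generate byte show = mod(_n - 1, 10) == 0

local symbols  O T S D
local patterns solid dash shortdash dash_dot
local keys
local lines
local marks
local order
forvalues g = 1/4 {
    local s : word `g' of `symbols'
    local p : word `g' of `patterns'
    * Plots 1-4: one-point dummies whose only job is the legend key
    local keys `keys' (connected y`g' x if _n == 1, pstyle(p`g') msymbol(`s') lpattern(`p'))
    * Full-resolution line without markers
    local lines `lines' (line y`g' x, pstyle(p`g') lpattern(`p'))
    * Markers on the thinned subset only
    local marks `marks' (scatter y`g' x if show, pstyle(p`g') msymbol(`s'))
    local order `"`order' `g' "Curve `g'""'
}

twoway `keys' `lines' `marks', xscale(log) xlabel(1.25 2.5 5 10 20) ///
    legend(order(`order'))
```

Because the dummies come first, `legend(order(1 2 3 4))` always picks them up, whatever the number of other plots. Sharing `pstyle()` keeps line, marker, and key in the same color without hard-coding `stc1`, `stc2`, and so on.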
