Stripplot - label points

Julie Xiu

Join Date: May 2023
Posts: 14

11 Jun 2024, 10:16

I have data for 5 observers (x1) making measurements on a VAS scale (x2) using 7 devices and I am using stripplot (from SSC) to visualize the data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double device byte(x1 x2)
 1 1 1  95
 2 1 2 100
 3 1 3  97
 4 1 4  95
 5 1 5 100
 6 2 1  82
 7 2 2  90
 8 2 3  70
 9 2 4  81
10 2 5  15
11 3 1  95
12 3 2  95
13 3 3  95
14 3 4 100
15 3 5  94
16 4 1  60
17 4 2  50
18 4 3  10
19 4 4  48
20 4 5   5
21 5 1  60
22 5 2  80
23 5 3  25
24 5 4  66
25 5 5  62
26 6 1  75
27 6 2  60
28 6 3  35
29 6 4  65
30 6 5   5
31 7 1  41
32 7 2  80
33 7 3  20
34 7 4  70
35 7 5  10
end
label values device device
label def device 1 "N", modify
label def device 2 "C", modify
label def device 3 "S", modify
label def device 4 "i", modify
label def device 5 "X", modify
label def device 6 "O", modify
label def device 7 "G", modify

stripplot x2, over(device) vert cumul cumprob connect(L) box(barw(0.16)) pctile(5) boffset(-0.1)

[Image: x2.png]

Is there any way that I can identify each observer on the plot, or would an alternative plot be preferable?

Julie

Nils Enevoldsen

Join Date: Oct 2014

Posts: 283
#2

11 Jun 2024, 13:45

Try option mlabel(x1).
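
Applied to the command in #1, that might look like the sketch below. To keep it short it uses only the first two devices (N and C) from the data in #1. The mlabposition() and mlabsize() settings are illustrative assumptions, not part of the suggestion, and the sketch relies on stripplot (from SSC) accepting twoway scatter marker label options, which is my understanding of its help.

```stata
* Sketch only: first two devices from #1, to keep the example short
* Requires stripplot: ssc install stripplot
clear
input byte(device x1 x2)
1 1  95
1 2 100
1 3  97
1 4  95
1 5 100
2 1  82
2 2  90
2 3  70
2 4  81
2 5  15
end
label define device 1 "N" 2 "C"
label values device device

* mlabel(x1) writes the observer number next to each point;
* mlabposition() and mlabsize() are illustrative choices, not from the thread
stripplot x2, over(device) vertical cumul cumprob connect(L) ///
    mlabel(x1) mlabposition(3) mlabsize(small)
```

With the full data the same three options can simply be appended to the original stripplot call.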
Julie Xiu

Join Date: May 2023

Posts: 14
#3

11 Jun 2024, 14:16

So simple, thank you.

I must spend more time reading and understanding the help.

Nick Cox

Join Date: Mar 2014
Posts: 35211

12 Jun 2024, 07:41

This is an interesting little data set.

Unless the order of device identifiers has meaning, I'd recommend sorting somehow, e.g. on the medians.

I don't think the box plots add much to the display of data points. Unless readers have a complete understanding of the rules, the box plots might be puzzling. For device S the values are 95 95 95 100 94. Hence the median and quartiles are all 95 and the bars of the box plot collapse to a combined bar of height zero, which is defined in principle but invisible in practice. As a compromise I would show medians as reference levels to guide the eye and brain.
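
The collapse for device S is easy to confirm numerically. Here is a minimal check, assuming only the five values quoted above and Stata's default percentile rule as used by summarize (whether stripplot computes its box in exactly the same way is an assumption on my part):

```stata
* Device S values as quoted above
clear
input byte x2
95
95
95
100
94
end

* summarize, detail leaves the quartiles and median in r()
summarize x2, detail
display "p25 = " r(p25) "   median = " r(p50) "   p75 = " r(p75)
```

Sorted, the values are 94 95 95 95 100, so the second, third and fourth order statistics, which supply the quartiles and median here, are identical.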

Here I use -- beyond stripplot from SSC -- myaxis from the Stata Journal.

Code:

. search myaxis, sj

Search of official help files, FAQs, Examples, and Stata Journals

SJ-21-3 st0654  . . Speaking Stata: Ordering or ranking groups of observations
        (help myaxis if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/21   SJ 21(3):818--837
        discusses procedures for datasets based on aggregate
        frequencies and for datasets based on individuals and
        introduce a new convenience command, myaxis, that handles
        many cases directly

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double device byte(x1 x2)
 1 1 1  95
 2 1 2 100
 3 1 3  97
 4 1 4  95
 5 1 5 100
 6 2 1  82
 7 2 2  90
 8 2 3  70
 9 2 4  81
10 2 5  15
11 3 1  95
12 3 2  95
13 3 3  95
14 3 4 100
15 3 5  94
16 4 1  60
17 4 2  50
18 4 3  10
19 4 4  48
20 4 5   5
21 5 1  60
22 5 2  80
23 5 3  25
24 5 4  66
25 5 5  62
26 6 1  75
27 6 2  60
28 6 3  35
29 6 4  65
30 6 5   5
31 7 1  41
32 7 2  80
33 7 3  20
34 7 4  70
35 7 5  10
end
label values device device
label def device 1 "N", modify
label def device 2 "C", modify
label def device 3 "S", modify
label def device 4 "i", modify
label def device 5 "X", modify
label def device 6 "O", modify
label def device 7 "G", modify

myaxis device2=device, sort(median x2)

stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h))

[Image: device.png]


Julie Xiu

Join Date: May 2023

Posts: 14
#5

12 Jun 2024, 08:02

Thank you so much for the advice. It makes the plot so much more intelligible.
Julie
Nick Cox

Join Date: Mar 2014

Posts: 35211
#6

12 Jun 2024, 09:00

The scale here is bounded with predictable limiting behaviour: as the mean approaches 100 or 0, so also the variance must approach 0. That suggests seeking a scale on which variability is approximately constant. Logit is the natural scale if neither limit is attained, but that isn't applicable here given values of 100. A near neighbour is the folded root transformation, which connoisseurs will note is similar in strength to a once much used but now unfashionable transformation, the angular or arc sine square root. For such an unusual transformation showing axis labels on the original scale is all but essential. For that I use mylabels from the Stata Journal.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double device byte(x1 x2)
 1 1 1  95
 2 1 2 100
 3 1 3  97
 4 1 4  95
 5 1 5 100
 6 2 1  82
 7 2 2  90
 8 2 3  70
 9 2 4  81
10 2 5  15
11 3 1  95
12 3 2  95
13 3 3  95
14 3 4 100
15 3 5  94
16 4 1  60
17 4 2  50
18 4 3  10
19 4 4  48
20 4 5   5
21 5 1  60
22 5 2  80
23 5 3  25
24 5 4  66
25 5 5  62
26 6 1  75
27 6 2  60
28 6 3  35
29 6 4  65
30 6 5   5
31 7 1  41
32 7 2  80
33 7 3  20
34 7 4  70
35 7 5  10
end
label values device device
label def device 1 "N", modify
label def device 2 "C", modify
label def device 3 "S", modify
label def device 4 "i", modify
label def device 5 "X", modify
label def device 6 "O", modify
label def device 7 "G", modify

myaxis device2=device, sort(median x2)

stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h))

gen frootx2 = sqrt(x2) - sqrt(100 - x2)

mylabels 0(10)100, myscale(sqrt(@) - sqrt(100-@)) local(yla)

stripplot frootx2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(`yla', ang(h)) ytitle(x2 (folded root scale))

Whether the simplification of behaviour justifies an unusual scale is hard to judge.
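
For orientation, the folded root used above, sqrt(x) - sqrt(100 - x), maps 0, 50 and 100 to -10, 0 and 10, and is defined at both limits, unlike logit. A small sketch to tabulate it at a few landmark values (the choice of landmarks is illustrative only):

```stata
* Folded root of a 0-100 scale at some landmark values
foreach x of numlist 0 10 25 50 75 90 100 {
    display %3.0f `x' "  ->  " %7.3f (sqrt(`x') - sqrt(100 - `x'))
}

* For comparison, logit needs proportions strictly between 0 and 1,
* so an observed value of 100 has no finite logit
display logit(100/100)
```

The transformation is symmetric about 50 and stretches the scale near 0 and 100 relative to the middle, which is what opens up detail near the limits.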

The standard reference here for folded root is John Tukey's Exploratory Data Analysis 1977. Good examples can be found in Andrew Siegel's Statistics and Data Analysis (first edition 1996 only) and Mary Breckenridge's monograph Age, Time and Fertility 1983. Further references welcome (aside from examples in Tukey's own work, which go back well before 1977). See also the collective volume https://onlinelibrary.wiley.com/doi/.../9780470316832

Last edited by Nick Cox; 12 Jun 2024, 09:04.
Julie Xiu

Join Date: May 2023

Posts: 14
#7

12 Jun 2024, 09:32

Thank you once again. I had not thought of transforming the scale but it certainly shows more detail at the upper and lower limits. I think that it will be useful for the other datasets in this study.

Nick Cox

Join Date: Mar 2014
Posts: 35211

12 Jun 2024, 16:29

A friend pointed out gently that I had overlooked the detail in #1:

Is there any way that I can identify each observer on the plot, or would an alternative plot be preferable?

Here is a different take using fabplot from the Stata Journal. The idea of front-and-back plots (my term, but an older idea) is that each group in turn is shown in front and the other groups are shown as background.

SJ-21-2 gr0087 . . Front-and-back plots to ease spaghetti and paella problems
(help fabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q2/21 SJ 21(2):539--554
explores front-and-back plots, in which each subset of data
is shown separately with the other subsets as backdrop

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double device byte(x1 x2)
 1 1 1  95
 2 1 2 100
 3 1 3  97
 4 1 4  95
 5 1 5 100
 6 2 1  82
 7 2 2  90
 8 2 3  70
 9 2 4  81
10 2 5  15
11 3 1  95
12 3 2  95
13 3 3  95
14 3 4 100
15 3 5  94
16 4 1  60
17 4 2  50
18 4 3  10
19 4 4  48
20 4 5   5
21 5 1  60
22 5 2  80
23 5 3  25
24 5 4  66
25 5 5  62
26 6 1  75
27 6 2  60
28 6 3  35
29 6 4  65
30 6 5   5
31 7 1  41
32 7 2  80
33 7 3  20
34 7 4  70
35 7 5  10
end
label values device device
label def device 1 "N", modify
label def device 2 "C", modify
label def device 3 "S", modify
label def device 4 "i", modify
label def device 5 "X", modify
label def device 6 "O", modify
label def device 7 "G", modify

myaxis device2=device, sort(median x2)

stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h)) name(G1, replace)

gen frootx2 = sqrt(x2) - sqrt(100 - x2)

label var frootx2 "x2 (folded root scale)"

mylabels 0(10)100, myscale(sqrt(@) - sqrt(100-@)) local(yla)

stripplot frootx2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(`yla', ang(h)) ytitle(x2 (folded root scale)) name(G2, replace)

mylabels 0 10 25 50 75 90 100, myscale(sqrt(@) - sqrt(100-@)) local(yla2) 

fabplot connected frootx2 device2, by(x1) yla(`yla2') xla(1/7, valuelabel) frontopts(lw(thick)) name(G3, replace)

[Image: device3.png]


Julie Xiu

Join Date: May 2023

Posts: 14
#9

13 Jun 2024, 04:57

Thank you again for your invaluable advice. I am only now beginning to appreciate the power of using the appropriate graphics - somewhat different to the bar chart with +/- standard deviation that I was originally taught as 'the plot to use'!
Julie
Nick Cox

Join Date: Mar 2014

Posts: 35211
#10

13 Jun 2024, 15:50

Effect bars with error bars are often known (negatively) as dynamite plots, detonator plots, or plunger plots -- and perhaps other names too.

For propaganda against their use, see e.g.

https://biostat.app.vumc.org/wiki/pu...de/Poster3.pdf

https://simplystatistics.org/posts/2...lots-must-die/

https://warwick.ac.uk/fac/sci/wdsi/events/wrug/resources/plunger.pdf