
  • Stripplot - label points

    I have data for 5 observers (x1) making measurements on a VAS scale (x2) using 7 devices and I am using stripplot (from SSC) to visualize the data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id double device byte(x1 x2)
     1 1 1  95
     2 1 2 100
     3 1 3  97
     4 1 4  95
     5 1 5 100
     6 2 1  82
     7 2 2  90
     8 2 3  70
     9 2 4  81
    10 2 5  15
    11 3 1  95
    12 3 2  95
    13 3 3  95
    14 3 4 100
    15 3 5  94
    16 4 1  60
    17 4 2  50
    18 4 3  10
    19 4 4  48
    20 4 5   5
    21 5 1  60
    22 5 2  80
    23 5 3  25
    24 5 4  66
    25 5 5  62
    26 6 1  75
    27 6 2  60
    28 6 3  35
    29 6 4  65
    30 6 5   5
    31 7 1  41
    32 7 2  80
    33 7 3  20
    34 7 4  70
    35 7 5  10
    end
    label values device device
    label def device 1 "N", modify
    label def device 2 "C", modify
    label def device 3 "S", modify
    label def device 4 "i", modify
    label def device 5 "X", modify
    label def device 6 "O", modify
    label def device 7 "G", modify
    
    stripplot x2, over(device) vert cumul cumprob connect(L) box(barw(0.16)) pctile(5) boffset(-0.1)
    [Image: x2.png]


    Is there any way that I can identify each observer on the plot, or would an alternative plot be preferable?

    Julie

  • #2
    Try option mlabel(x1).
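
    A minimal sketch of that suggestion, assuming stripplot is installed from SSC. It uses only the rows for the first two devices from #1 to stay short, and adds mlabel(x1) to the original command unchanged. mlabel() is a standard marker label option of scatter, which stripplot is assumed to pass through.

    ```stata
    * Assumes stripplot is installed: ssc install stripplot
    clear
    input byte id double device byte(x1 x2)
     1 1 1  95
     2 1 2 100
     3 1 3  97
     4 1 4  95
     5 1 5 100
     6 2 1  82
     7 2 2  90
     8 2 3  70
     9 2 4  81
    10 2 5  15
    end
    label def device 1 "N" 2 "C"
    label values device device

    * mlabel(x1) labels each marker with the observer number held in x1
    stripplot x2, over(device) vert cumul cumprob connect(L) box(barw(0.16)) pctile(5) boffset(-0.1) mlabel(x1)
    ```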



    • #3
      So simple, thank you.

      I must spend more time reading and understanding the help.



      • #4
        This is an interesting little data set.

        Unless the order of device identifiers has meaning, I'd recommend sorting somehow, e.g. on the medians.

        I don't think the box plots add much to the display of data points. Unless readers have a complete understanding of the rules, the box plots might be puzzling. For device S the values are 95 95 95 100 94. Hence the median and quartiles are all 95 and the bars of the box plot collapse to a combined bar of height zero, which is defined in principle but invisible in practice. As a compromise I would show medians as reference levels to guide the eye and brain.
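
        The collapse for device S can be checked directly. This is a small sketch that enters only the five values quoted above and asks for the quartiles and median; with Stata's default percentile rule all three statistics should come out as 95.

        ```stata
        clear
        input byte x2
        95
        95
        95
        100
        94
        end

        * Lower quartile, median and upper quartile of the device S values
        tabstat x2, statistics(p25 p50 p75)
        ```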

        Here I use -- beyond stripplot from SSC -- myaxis from the Stata Journal.

        Code:
        . search myaxis, sj
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        SJ-21-3 st0654  . . Speaking Stata: Ordering or ranking groups of observations
                (help myaxis if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/21   SJ 21(3):818--837
                discusses procedures for datasets based on aggregate
                frequencies and for datasets based on individuals and
                introduce a new convenience command, myaxis, that handles
                many cases directly
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte id double device byte(x1 x2)
         1 1 1  95
         2 1 2 100
         3 1 3  97
         4 1 4  95
         5 1 5 100
         6 2 1  82
         7 2 2  90
         8 2 3  70
         9 2 4  81
        10 2 5  15
        11 3 1  95
        12 3 2  95
        13 3 3  95
        14 3 4 100
        15 3 5  94
        16 4 1  60
        17 4 2  50
        18 4 3  10
        19 4 4  48
        20 4 5   5
        21 5 1  60
        22 5 2  80
        23 5 3  25
        24 5 4  66
        25 5 5  62
        26 6 1  75
        27 6 2  60
        28 6 3  35
        29 6 4  65
        30 6 5   5
        31 7 1  41
        32 7 2  80
        33 7 3  20
        34 7 4  70
        35 7 5  10
        end
        label values device device
        label def device 1 "N", modify
        label def device 2 "C", modify
        label def device 3 "S", modify
        label def device 4 "i", modify
        label def device 5 "X", modify
        label def device 6 "O", modify
        label def device 7 "G", modify

        myaxis device2=device, sort(median x2)

        stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h))
        [Image: device.png]



        • #5
          Thank you so much for the advice. It makes the plot so much more intelligible.
          Julie



          • #6
            The scale here is bounded with predictable limiting behaviour: as the mean approaches 100 or 0, so also the variance must approach 0. That suggests seeking a scale on which variability is approximately constant. Logit is the natural scale if neither limit is attained, but that isn't applicable here given values of 100. A near neighbour is the folded root transformation, which connoisseurs will note is similar in strength to a once much used but now unfashionable transformation, the angular or arc sine square root. For such an unusual transformation showing axis labels on the original scale is all but essential. For that I use mylabels from the Stata Journal.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte id double device byte(x1 x2)
             1 1 1  95
             2 1 2 100
             3 1 3  97
             4 1 4  95
             5 1 5 100
             6 2 1  82
             7 2 2  90
             8 2 3  70
             9 2 4  81
            10 2 5  15
            11 3 1  95
            12 3 2  95
            13 3 3  95
            14 3 4 100
            15 3 5  94
            16 4 1  60
            17 4 2  50
            18 4 3  10
            19 4 4  48
            20 4 5   5
            21 5 1  60
            22 5 2  80
            23 5 3  25
            24 5 4  66
            25 5 5  62
            26 6 1  75
            27 6 2  60
            28 6 3  35
            29 6 4  65
            30 6 5   5
            31 7 1  41
            32 7 2  80
            33 7 3  20
            34 7 4  70
            35 7 5  10
            end
            label values device device
            label def device 1 "N", modify
            label def device 2 "C", modify
            label def device 3 "S", modify
            label def device 4 "i", modify
            label def device 5 "X", modify
            label def device 6 "O", modify
            label def device 7 "G", modify
            
            myaxis device2=device, sort(median x2)
            
            stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h))
            
            gen frootx2 = sqrt(x2) - sqrt(100 - x2)
            
            mylabels 0(10)100, myscale(sqrt(@) - sqrt(100-@)) local(yla)
            
            stripplot frootx2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(`yla', ang(h)) ytitle(x2 (folded root scale))
            Whether the simplification of behaviour justifies an unusual scale is hard to judge.
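
            For orientation, here is a small sketch of what the folded root does at the ends and the middle of a 0 to 100 scale, and of why logit is not available when a value of 100 occurs. The three values are illustrative only; the expression is the one used in the generate statement above.

            ```stata
            * Folded root of a 0-100 score: sqrt(x) - sqrt(100 - x)
            * It runs from -10 at 0 through 0 at 50 to 10 at 100
            foreach x of numlist 0 50 100 {
                display "x = " `x' "  folded root = " sqrt(`x') - sqrt(100 - `x')
            }

            * logit() of a proportion of exactly 1 (a score of 100) is missing
            display logit(100/100)
            ```

            Unlike logit, the folded root stays finite at both limits, which is why it can be applied to these data without dropping or adjusting the values of 100.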


            The standard reference here for folded root is John Tukey's Exploratory Data Analysis 1977. Good examples can be found in Andrew Siegel's Statistics and Data Analysis (first edition 1996 only) and Mary Breckenridge's monograph Age, Time and Fertility 1983. Further references welcome (aside from examples in Tukey's own work, which go back much earlier). See also the collective volume https://onlinelibrary.wiley.com/doi/.../9780470316832
            Last edited by Nick Cox; 12 Jun 2024, 09:04.



            • #7
              Thank you once again. I had not thought of transforming the scale but it certainly shows more detail at the upper and lower limits. I think that it will be useful for the other datasets in this study.



              • #8
                A friend pointed out gently that I had overlooked the detail in #1:

                Is there any way that I can identify each observer on the plot, or would an alternative plot be preferable?
                Here is a different take using fabplot from the Stata Journal. The idea of front-and-back plots (my term, but an older idea) is that each group in turn is shown in front and the other groups are shown as background.

                SJ-21-2 gr0087 . . Front-and-back plots to ease spaghetti and paella problems
                (help fabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
                Q2/21 SJ 21(2):539--554
                explores front-and-back plots, in which each subset of data
                is shown separately with the other subsets as backdrop



                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte id double device byte(x1 x2)
                 1 1 1  95
                 2 1 2 100
                 3 1 3  97
                 4 1 4  95
                 5 1 5 100
                 6 2 1  82
                 7 2 2  90
                 8 2 3  70
                 9 2 4  81
                10 2 5  15
                11 3 1  95
                12 3 2  95
                13 3 3  95
                14 3 4 100
                15 3 5  94
                16 4 1  60
                17 4 2  50
                18 4 3  10
                19 4 4  48
                20 4 5   5
                21 5 1  60
                22 5 2  80
                23 5 3  25
                24 5 4  66
                25 5 5  62
                26 6 1  75
                27 6 2  60
                28 6 3  35
                29 6 4  65
                30 6 5   5
                31 7 1  41
                32 7 2  80
                33 7 3  20
                34 7 4  70
                35 7 5  10
                end
                label values device device
                label def device 1 "N", modify
                label def device 2 "C", modify
                label def device 3 "S", modify
                label def device 4 "i", modify
                label def device 5 "X", modify
                label def device 6 "O", modify
                label def device 7 "G", modify
                
                myaxis device2=device, sort(median x2)
                
                stripplot x2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(, ang(h)) name(G1, replace)
                
                gen frootx2 = sqrt(x2) - sqrt(100 - x2)
                
                label var frootx2 "x2 (folded root scale)"
                
                mylabels 0(10)100, myscale(sqrt(@) - sqrt(100-@)) local(yla)
                
                stripplot frootx2, cumul refline(lc(magenta)) reflevel(median) centre over(device2) vertical c(L) yla(`yla', ang(h)) ytitle(x2 (folded root scale)) name(G2, replace)
                
                mylabels 0 10 25 50 75 90 100, myscale(sqrt(@) - sqrt(100-@)) local(yla2) 
                
                fabplot connected frootx2 device2, by(x1) yla(`yla2') xla(1/7, valuelabel) frontopts(lw(thick)) name(G3, replace)
                [Image: device3.png]



                • #9
                  Thank you again for your invaluable advice. I am only now beginning to appreciate the power of using the appropriate graphics - somewhat different to the bar chart with +/- standard deviation that I was originally taught as 'the plot to use'!
                  Julie



                  • #10
                    Effect bars with error bars are often known (negatively) as dynamite plots, detonator plots, or plunger plots -- and perhaps other names too.

                    For propaganda against their use, see e.g.

                    https://biostat.app.vumc.org/wiki/pu...de/Poster3.pdf

                    https://simplystatistics.org/posts/2...lots-must-die/

                    https://warwick.ac.uk/fac/sci/wdsi/events/wrug/resources/plunger.pdf
