  • Help with pcspike to display a paired plot with names

    I am using Nick Cox's example code to produce a 'paired plot with names', as in Dr. Cox's "Speaking Stata Graphics", pp. 267-271 (Stata Journal (2009) 9(4), pp. 621-639).
    I have 2 continuous measures, retain4 and gcsi, sorted on rank of retain4. My "name" column identifier is 'idstring'. The only modification I made to the example code, besides variable names, was to reduce 'left' to 0.2.
    My questions are:
    • I would like to reduce the size of the 'gcsi' and 'retain4' values displayed in the 2 vertical columns on the figure.
    • I would like to change the colors of each variable's displayed values, and of the lines connecting the 2 variables.
    • I am not getting the expected column for the 'Name' variable. Instead, my 'Name' column displays far left of the retain4 column (the 1st variable ranked).
    I have tried the size and color twoway graph options, but without success.
    I am unsure what the issue is with the Name column: it displays very far left of the 1st variable column, and appears to be printing vertically.

    Here is my code:
    #delimit;
    twoway pcspike rank1 one rank2 two,
    xla(none) xsc(noline r(0 2.3)) xtitle("") ysc(r(-1 .) reverse off) yla( , nogrid) ||
    scatter rank1 one, mla (retain4_rd) mlabpos(9) ms(none) ||
    scatter rank2 two, mla(gcsi_rd) mlabpos(3) ms(none) ||
    scatter rank1 left, mla(idstring) mlabpos(3) ms(none) text(-0.5 1 "Bl % retention", size(vsmall))
    text(-0.5 2 "GCSI severity", size(vsmall))
    legend(off) graphregion(color(white))
    ;

    Thank you for your help.

  • #2
    Willing to help here but despite writing similar code a while back I am no better than anyone else at reading code like this and a verbal description and visualizing what the problem is. Please post a data example at least and ideally the graph you are getting as well so that the question becomes clear.

    • #3
      Thank you. I will upload data along with example figure this evening.

      • #4
        So OK, we're in very different time zones, and I may not get back to you for some time, but someone else is more than welcome to get there first.

        • #5
          This is my first time posting, so I may not have understood how to upload my data correctly.
          My code is copied output from running dataex. Should I instead have put this in an attachment or other file type?

          Data Dictionary:
          retain4 = continuous % of stomach food retention (0-100)
          gcsi = ordinal symptom severity score from 0 to 5, treated continuously for this summary measure.
          retain4_rd and gcsi_rd are rounded to the nearest 0.1.
          rank1 gives the order of file sorted on retain4
          rank2 is the corresponding rank of the gcsi
          one, two are byte variables = 1 and 2, respectively, as in Nick Cox's example
          left=0.4 as in Nick Cox's example.
          idnum and idstring (str3) are the "Name" of each measure to be linked as in rank1 (in Nick Cox's example, this column is Name)

          I would like my graph to look similar to the example graph (Figure 7) in Stata Journal (2009) 9(4), pp. 621-639, except with the idstring column left-most, my retain4 column next, and the gcsi column last.

          My attempt created red font that appeared as vertical and overlapping id's, retain4 printed as tiny overlapping orange values, gcsi printed as blue larger overlapping values, and the connecting lines were green (this was using graph scheme s1color). I just set the graph scheme to sj, and I get similar figure except now all in black and shades of dark gray.

          Thanks again for any help.




          [Image: Figure7Example.PNG]

          My datafile:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(retain4 gcsi) float(gcsi_rd retain4_rd left rank1 rank2) byte(one two) float idnum str3 idstring
          0 2.5555555555555554 2.6 0 .4 1 39 1 2 1 "1"
          0 3.6388888888888893 3.6 0 .4 2 96 1 2 2 "2"
          .4 3.805555555555556 3.8 0 .4 3 100.5 1 2 3 "3"
          1 3.2222222222222228 3.2 1 .4 4 75 1 2 4 "4"
          1.1 3.111111111111111 3.1 1 .4 5 70 1 2 5 "5"
          1.3 2.972222222222222 3 1 .4 6 62 1 2 6 "6"
          1.4 3.805555555555556 3.8 1 .4 7 100.5 1 2 7 "7"
          2 3.5 3.5 2 .4 8 89 1 2 8 "8"
          2.3 4.333333333333333 4.3 2 .4 9 123 1 2 9 "9"
          2.5 4.222222222222222 4.2 3 .4 10 119.5 1 2 10 "10"
          2.5 3 3 3 .4 11 64.5 1 2 11 "11"
          3 3.3888888888888893 3.4 3 .4 12 84 1 2 12 "12"
          3.6 2.6666666666666665 2.7 4 .4 13 46.5 1 2 13 "13"
          4 4.222222222222222 4.2 4 .4 14 119.5 1 2 14 "14"
          4.5 4.055555555555555 4.1 5 .4 15 112.5 1 2 15 "15"
          5 1.5 1.5 5 .4 16 12.5 1 2 16 "16"
          5 3.305555555555556 3.3 5 .4 17 77 1 2 17 "17"
          5.1 2.638888888888889 2.6 5 .4 18 45 1 2 18 "18"
          6 2.3333333333333335 2.3 6 .4 19 36.5 1 2 19 "19"
          6.7 .6944444444444445 .7 7 .4 20 5 1 2 20 "20"
          7 2.1666666666666665 2.2 7 .4 21 24.5 1 2 21 "21"
          7.2 3.9166666666666665 3.9 7 .4 22 103.5 1 2 22 "22"
          7.9 2.2222222222222223 2.2 8 .4 23 30.5 1 2 23 "23"
          8 3.638888888888889 3.6 8 .4 24 95 1 2 24 "24"
          8 2.2777777777777777 2.3 8 .4 25 33.5 1 2 25 "25"
          8 3.9166666666666665 3.9 8 .4 26 103.5 1 2 26 "26"
          8 4.333333333333333 4.3 8 .4 27 123 1 2 27 "27"
          9 2.2777777777777777 2.3 9 .4 28 33.5 1 2 28 "28"
          9 3.3333333333333335 3.3 9 .4 29 80 1 2 29 "29"
          9 3.7777777777777772 3.8 9 .4 30 98 1 2 30 "30"
          9 4.777777777777778 4.8 9 .4 31 131.5 1 2 31 "31"
          10 3.1111111111111107 3.1 10 .4 32 69 1 2 32 "32"
          10 3.3333333333333335 3.3 10 .4 33 80 1 2 33 "33"
          10 3.9166666666666665 3.9 10 .4 34 103.5 1 2 34 "34"
          11 2.777777777777778 2.8 11 .4 35 55 1 2 35 "35"
          11 .9166666666666666 .9 11 .4 36 6.5 1 2 36 "36"
          11.5 4.555555555555555 4.6 12 .4 37 128.5 1 2 37 "37"
          11.9 3.972222222222222 4 12 .4 38 106.5 1 2 38 "38"
          12 4.555555555555555 4.6 12 .4 39 128.5 1 2 39 "39"
          12 2.7777777777777772 2.8 12 .4 40 53.5 1 2 40 "40"
          12.1 3.444444444444444 3.4 12 .4 41 86.5 1 2 41 "41"
          13 3.1388888888888893 3.1 13 .4 42 73 1 2 42 "42"
          13 4 4 13 .4 43 108.5 1 2 43 "43"
          13 .25 .3 13 .4 44 3 1 2 44 "44"
          13 2.972222222222222 3 13 .4 45 62 1 2 45 "45"
          13.3 4.027777777777778 4 13 .4 46 110.5 1 2 46 "46"
          13.4 3.972222222222222 4 13 .4 47 106.5 1 2 47 "47"
          13.6 2 2 14 .4 48 17.5 1 2 48 "48"
          13.8 2.694444444444444 2.7 14 .4 49 48 1 2 49 "49"
          14 3.1388888888888893 3.1 14 .4 50 73 1 2 50 "50"
          14.2 2.1666666666666665 2.2 14 .4 51 24.5 1 2 51 "51"
          14.7 4.916666666666667 4.9 15 .4 52 133 1 2 52 "52"
          15 2.611111111111111 2.6 15 .4 53 43.5 1 2 53 "53"
          15 3.527777777777778 3.5 15 .4 54 91 1 2 54 "54"
          16 3.1388888888888893 3.1 16 .4 55 73 1 2 55 "55"
          16 2.3333333333333335 2.3 16 .4 56 36.5 1 2 56 "56"
          16.5 2.555555555555556 2.6 17 .4 57 41 1 2 57 "57"
          16.9 2 2 17 .4 58 17.5 1 2 58 "58"
          17 2.194444444444444 2.2 17 .4 59 27 1 2 59 "59"
          17.7 3.555555555555556 3.6 18 .4 60 93 1 2 60 "60"
          18 2.75 2.8 18 .4 61 50.5 1 2 61 "61"
          18 3.388888888888889 3.4 18 .4 62 83 1 2 62 "62"
          18 4.361111111111111 4.4 18 .4 63 125 1 2 63 "63"
          18 3.138888888888889 3.1 18 .4 64 71 1 2 64 "64"
          19 4.277777777777778 4.3 19 .4 65 121 1 2 65 "65"
          19.4 2.1666666666666665 2.2 19 .4 66 24.5 1 2 66 "66"
          20 4.527777777777778 4.5 20 .4 67 127 1 2 67 "67"
          20 2.861111111111111 2.9 20 .4 68 59.5 1 2 68 "68"
          20 4.333333333333333 4.3 20 .4 69 123 1 2 69 "69"
          20.7 .9166666666666666 .9 21 .4 70 6.5 1 2 70 "70"
          20.8 1 1 21 .4 71 8.5 1 2 71 "71"
          21 2.8055555555555554 2.8 21 .4 72 56.5 1 2 72 "72"
          21 4.111111111111111 4.1 21 .4 73 116 1 2 73 "73"
          21.8 2.611111111111111 2.6 22 .4 74 43.5 1 2 74 "74"
          22 2.111111111111111 2.1 22 .4 75 21.5 1 2 75 "75"
          24 4.027777777777778 4 24 .4 76 110.5 1 2 76 "76"
          24 2.8611111111111107 2.9 24 .4 77 58 1 2 77 "77"
          24 1.3333333333333333 1.3 24 .4 78 11 1 2 78 "78"
          25 3.611111111111111 3.6 25 .4 79 94 1 2 79 "79"
          25 2.25 2.3 25 .4 80 32 1 2 80 "80"
          26 2.555555555555556 2.6 26 .4 81 41 1 2 81 "81"
          26 3 3 26 .4 82 64.5 1 2 82 "82"
          27 2.111111111111111 2.1 27 .4 83 21.5 1 2 83 "83"
          27 4.777777777777778 4.8 27 .4 84 131.5 1 2 84 "84"
          28 2.2222222222222223 2.2 28 .4 85 30.5 1 2 85 "85"
          29 3.0277777777777772 3 29 .4 86 66 1 2 86 "86"
          29 2.222222222222222 2.2 29 .4 87 28.5 1 2 87 "87"
          29.1 2.4444444444444446 2.4 29 .4 88 38 1 2 88 "88"
          29.4 .19444444444444442 .2 29 .4 89 2 1 2 89 "89"
          30 3.9166666666666665 3.9 30 .4 90 103.5 1 2 90 "90"
          30 1 1 30 .4 91 8.5 1 2 91 "91"
          32 3.777777777777778 3.8 32 .4 92 99 1 2 92 "92"
          33 3.5277777777777772 3.5 33 .4 93 90 1 2 93 "93"
          34 4.166666666666667 4.2 34 .4 94 118 1 2 94 "94"
          35 3.0555555555555554 3.1 35 .4 95 67 1 2 95 "95"
          35 1.25 1.3 35 .4 96 10 1 2 96 "96"
          35 3.5555555555555554 3.6 35 .4 97 92 1 2 97 "97"
          35.9 1.6666666666666667 1.7 36 .4 98 14.5 1 2 98 "98"
          36 3.4166666666666665 3.4 36 .4 99 85 1 2 99 "99"
          40 4.138888888888888 4.1 40 .4 100 117 1 2 100 "100"
          end

          • #6
            Thanks for the data example.

            I see that the code in #1 produces a very disappointing graph with the data in #5 -- made less puzzling once it is realised that at least 33 patients have been left off because you used the default sample size of 100 in dataex. That's OK; there are enough to see what is going on.
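
The 100-observation cap mentioned above is a default of dataex, and it can be raised. A minimal sketch, assuming a dataex version that supports the count() option and using Stata's shipped auto dataset as a stand-in for the patient file (the variable names here are not from this thread):

```stata
* Stand-in data only: the auto dataset that ships with Stata
sysuse auto, clear

* dataex lists at most 100 observations unless count() raises the cap
dataex make price mpg, count(200)
```

With the real data, the same option applied to the variables used in the graph would have included all 133 or more patients.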

            One problem is self-inflicted:


            I am unsure what the issue is with the Name column: it displays very far left of the 1st variable column, and appears to be printing vertically.
            That is because you are insisting on putting it at a horizontal position of 0.4 when the spikes extend horizontally from 1 to 2. A choice of 0.85 or so is closer to what you want.
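
A minimal sketch of that repositioning, on made-up toy data (8 items, random second ranking). The variable names follow the thread, but the data, the seed, and the x-axis range r(0.5 2.3) are assumptions for illustration, and 0.85 may still need tuning depending on how long the names and values are:

```stata
* Toy data, made up for illustration: 8 items with two sets of ranks
clear
set obs 8
set seed 2020
gen str3 idstring = string(_n)
gen rank1 = _n
gen u = runiform()
egen rank2 = rank(u)
gen byte one = 1
gen byte two = 2
gen left = 0.85     // x position of the name column, just left of the spikes at 1

twoway pcspike rank1 one rank2 two, ///
    xla(none) xsc(noline r(0.5 2.3)) xtitle("") ///
    ysc(r(-1 .) reverse off) yla(, nogrid) ///
    || scatter rank1 left, mla(idstring) mlabpos(3) ms(none) ///
    legend(off) graphregion(color(white))
```

The spikes run from x = 1 to x = 2, so any value of left well below 1 pushes the names away from the column they are meant to annotate.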

            However, fixing that can't solve more fundamental problems, which makes this design a disappointing choice for your data.

            1. Much of the point about the example in my paper in 2009 was that there are identifiers (river names) that (should) mean something to the reader. Here your identifiers just run 1 to 133 or even more and it seems unlikely that they have so much meaning unless you are addressing clinicians and they can look up information on the patients.

            2. With 133 or more items rather than say 25 as in my example the pressure on space is enormously greater even if you mess about with the aspect ratio or the presentation size.

            3. The relationship between the two variables appears weak or non-existent. A design like this works well if (and, pretty much, only if) there is overall a moderate positive relationship between the two variables being ranked with possibly some striking exceptions that could be of real interest. A moderate negative relationship can be manageable too, but you need to reverse the ranks on one variable.
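
Reversing the ranks on one variable, as point 3 suggests for a negative relationship, can be sketched with egen's rank() function on a negated variable. The data below are made up (y falls steadily as x rises) and the variable names are hypothetical:

```stata
* Toy data, made up for illustration: y falls as x rises
clear
set obs 10
gen x = _n
gen y = 100 - 3 * _n

egen rank_x     = rank(x)     // 1 = smallest x
egen rank_y     = rank(y)     // 1 = smallest y: runs opposite to rank_x
egen rank_y_rev = rank(-y)    // 1 = largest y: runs in step with rank_x here

list x y rank_x rank_y rank_y_rev, noobs
```

Plotting rank_x against rank_y_rev with pcspike would give mostly level spikes, so any item that breaks the pattern stands out, whereas rank_y would make every spike cross.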

            I suggest that a common-or-garden scatter plot loses almost nothing likely to be of statistical or even clinical interest.


            Code:
            scatter retain4 gcsi, ms(none) mla(idnum) mlabpos(0) yla(, ang(h))



            [Image: retain4_gcsi.png]


            I have no idea really what to think about this data -- my Googling got as far as "gastric" and that was enough -- but now there is enormously more scope to wonder about patient 89 or 100 or whoever.

            • #7
              I appreciate you taking the time to help me. Part of my understanding of the code is obviously incorrect, so the information about placing the "Name" column in itself helps me to better understand how to use the code.

              You are correct about the identifier - the data represent paired patient data from an active network; therefore, I had to de-identify the patient and only send a sub-set of the data. I am analyzing data to understand how these 2 measures are correlated at different time points (this is baseline data), and wanted to show how gcsi was related to the stomach retention. I then would have a panel showing the follow-up measures for both variables to see if the relationship changed. I began with scatter plots with a "fitted" and "smoothed" regression line included, but the graphs are fairly busy, and the investigator would prefer a different type of graph to show how certain patients have a relationship with the 2 measures over time and others do not.
              I have created dot plots of the baseline, followup, and change for each measure -- so that may be a better way to display these data for this investigator.
              I would like to learn to create this graph since we do work with data of this sort. Were you able to determine why I was getting colored font for the text headers? Was that just due to using the s1color graph scheme?
              Is it your opinion that it was due to the large number of unique values for each variable that the values for each of the variables (gcsi, retain4) were "large" for the amount of graph space?

              Thank you again for your help. I will try to run the data you provide in your book to see if I can produce Figure 7 successfully.

              • #8
                I couldn't find the dataset in the Stata Journal files but I did find it on my own system.

                Here is a complete script for the graph in the 2009 paper. I updated the display of units of measurement.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str19 name byte id float area byte depo int(length basinlength) float(suspended discharge)
                "Amazon"               1   6150  4 6299 3310 1150 200000
                "Amudar'ya"            2    309 10 2620 1380   94   1450
                "Amur"                 3   1855 12 4416 2455   52  10300
                "Apalachicola"         4   51.8  .  880  521  .17    641
                "Brahmaputra"          5    610 21 2840 1270  520  19300
                "Brazos"               6    114  . 1400 1020   31    222
                "Burdekin"             7    131  7  680  520    3    476
                "Chao Phraya"          8    160 22 1200  700   11    824
                "Chari"                9    880  7 1400  920    4   1320
                "Colorado, CA"        10    640  . 2333 1300  150     32
                "Colorado, TX"        11    100  . 1450  790   13    634
                "Columbia"            12    670  . 1950 1200   15   7930
                "Colville"            13   60.9  .  662  520    6    492
                "Copper"              14   61.8  .  360  325   70   1240
                "Danube"              15    815 16 2860 1250   70   6660
                "Delaware"            16   22.9  .  518  350  .68    329
                "Dnepr"               17    504 11 2200 1045  2.1   1650
                "Dnestr"              18   72.1 20 1350  695  2.5    379
                "Don"                 19    422 11 1870  770    6    856
                "Ebro"                20   86.8  .  930  470   21    492
                "Elbe"                21    148  9 1110  705  .84    690
                "Fly"                 22   64.4 24  744  475   70   4760
                "Fraser"              23    220  . 1110  736   20   3550
                "Ganges"              24    980 28 2510 1560  524  11600
                "Garonne"             25     86  6  650  330  2.2    600
                "Godavari"            26    287  1 1500  920  170   2920
                "Haiho"               27   50.8  .  650  460   81     63
                "Indigirka"           28    360  2 1726 1120   14   1740
                "Indus"               29    960 10 3180 1610  250   7610
                "Irrawaddy"           30    410 12 2300 1420  260  13600
                "Jana"                31    238  .  872  815    3    920
                "Kemijoki"            32   37.8  3  600  320  .15    534
                "Kizil Irmak"         33   75.8  8 1151  375   23    192
                "Kolyma"              34    647  3 3513 1150    6   2250
                "Krishna"             35    256  0 1290  860   65   1607
                "Kura"                36    188  2 1360  650   36    515
                "Kuskokwim"           37    116  . 1080  700  7.5      .
                "Lena"                38   2430  4 4400 2525   12  16200
                "Liao He"             39    170  . 1350  515   41    190
                "Limpopo"             40    440  . 1600  840   33    160
                "Loire"               41    120  5 1110  540  1.5      .
                "Mackenzie"           42   1448  . 4240 2270  125   9830
                "Magdalena"           43    260  . 1530 1050  220   6980
                "Mahakam"             44     75 20    .  420   12   2000
                "Mahanadi"            45    133  1  858  630   60   1970
                "Mekong"              46    810 22 4500 2950  160  14900
                "Meuse"               47     29  .  925  440   .7    331
                "Mississippi"         48   3344  . 5985 2220  400  18400
                "Mobile"              49     57  . 1064  580  2.3   1590
                "Murray"              50    910 24 3490 1000   30    698
                "Niger"               51 1112.7  7 4160 1950   32   6020
                "Nile"                52   2715  9 6670 3600  125    317
                "Ob"                  53   2500  7 5570 2530   16  12200
                "Oder"                54    112 35  909  515  .13    539
                "Orange"              55    102  0 1860 1285   91   2890
                "Ord"                 56     46  1    .  400   22    165
                "Orinoco"             57    945  . 2740 1550  150  34900
                "Parana"              58   2600  . 4500 2175  112  18000
                "Pechora"             59    322  5 1810  760  6.1   3360
                "Po"                  60     75 19  691  480   18   1490
                "Red (Song Koi)"      61    120  9 1200  860  123   3810
                "Rhein"               62    225  2 1360  725  .72   2243
                "Rhone"               63     99  6  810  540   60   1550
                "Rio Colorado (Arg)"  64     65  . 1000 1460  6.9      .
                "Rio Grande"          65    670  . 2870 1725   30     95
                "Rio Grande Santiago" 66    125 19  960  650    1    308
                "Rio Negro (Arg)"     67    130  .  729  880   13    951
                "Rufiji"              68    178  . 1400  625   17    285
                "Sacramento"          69     73  .  610  385    3    678
                "Salween"             70    325  5 3060 1725  100   9510
                "San Joaquin"         71   80.1  .  560  450    1    123
                "Sanaga"              72    135  0  860  660  5.9   2069
                "Sao Francisco"       73    640  2 2800 1510    6   3080
                "Seine"               74   78.6  6  780  370  1.1    685
                "Senegal"             75    441  2 1430  900  1.9    761
                "Sepik"               76     81 30  825  425   80   2440
                "Severnaya Dvina"     77    350  .    .  850  4.5   3360
                "Shatt al Arab"       78   1050 19 2760 1475  103   1460
                "St Lawrence"         79   1185  . 3060 1650    4  14300
                "Susitna"             80   50.3  .  454  370   25   1270
                "Susquehanna"         81   72.5  .  733  445  1.8   1034
                "Syrdar'ya"           82    219 12 2210 1440   12    581
                "Tana"                83     91  .  720  470   32    171
                "Terek"               84   43.2 17  623  390   24      .
                "Ural"                85    237 10 2430 1020    3    301
                "Uruguay"             86    240  .    . 1085   11   5010
                "Vistula"             87    198 33 1014  600  2.5   1044
                "Volga"               88   1350  7 3350 1640   26   8400
                "Volta"               89    394  5 1600  980   19   1270
                "Wester"              90     46 12  724  375  .33    313
                "Xi Jiang"            91    464  5 2129 1150   80   9510
                "Yangtze"             92   1940  . 5520 2730  480  28500
                "Yellow (Huang He)"   93    980  . 4670 2070  120   1550
                "Yenisei"             94   2580  9 5550 2250   13  17800
                "Yukon"               95    855  . 3000 2140   60   6180
                "Zaire (Congo)"       96   3700  . 4370 2020 32.8  40900
                "Zambezi"             97   1400  8 2660 2040   48   6980
                end
                
                gsort -area
                keep in 1/25
                replace area = round(area)
                gen rank1 = _n
                egen rank2 = rank(-discharge)
                
                gen byte one = 1
                gen byte two = 2
                gen left = 0.4
                
                set scheme s1color
                 
                twoway pcspike rank1 one rank2 two, xla(none) xsc(noline r(0.3 2.3)) xtitle("") ///
                 ysc(r(-1 .) reverse off) yla(, nogrid) ///
                 || scatter rank1 one, mla(area) mlabpos(9) ms(none) ///
                 || scatter rank2 two, mla(discharge) mlabpos(3) ms(none) ///
                 || scatter rank1 left, mla(name) mlabpos(3) ms(none) ///
                 text(-0.5 1 "area, 10{sup:3} km{sup:2}") ///
                 text(-0.5 2 "discharge, m{sup:3}s{sup:-1}") ///
                 legend(off) graphregion(color(white))

                [Image: rivers..png]



                I will update later with whatever I can say more about your graph.

                UPDATE

                Were you able to determine why I was getting colored font for the text headers? Was that just due to using the s1color graph scheme?

                Sure: marker labels and text will get colours implied by the graph scheme unless you reach in and specify otherwise.
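
One way to "reach in" is to set the colours and sizes explicitly on each layer. A minimal sketch on made-up toy data, using the standard twoway options lcolor() for the spikes, mlabcolor() and mlabsize() for the marker labels, and color() and size() inside text(). The variable names v1 and v2, the seed, and the particular colours are assumptions for illustration:

```stata
* Toy data, made up for illustration: 8 items, two measures
clear
set obs 8
set seed 2020
gen rank1 = _n
gen v1 = 10 * _n
gen v2 = round(runiform() * 5, 0.1)
format v2 %2.1f
egen rank2 = rank(-v2)
gen byte one = 1
gen byte two = 2

twoway pcspike rank1 one rank2 two, lcolor(gs8) ///
    xla(none) xsc(noline r(0.5 2.5)) xtitle("") ///
    ysc(r(-1 .) reverse off) yla(, nogrid) ///
    || scatter rank1 one, mla(v1) mlabpos(9) ms(none) ///
       mlabsize(vsmall) mlabcolor(navy) ///
    || scatter rank2 two, mla(v2) mlabpos(3) ms(none) ///
       mlabsize(vsmall) mlabcolor(maroon) ///
    text(-0.5 1 "first variable", size(vsmall) color(black)) ///
    text(-0.5 2 "second variable", size(vsmall) color(black)) ///
    legend(off) graphregion(color(white))
```

Anything not set this way still takes its colour and size from the active scheme, which is why switching from s1color to sj changed the whole look.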

                Is it your opinion that it was due to the large number of unique values for each variable that the values for each of the variables (gcsi, retain4) were "large" for the amount of graph space?

                Yes. You had 100 items in your data example and that alone was a severe challenge.
                Last edited by Nick Cox; 21 Oct 2020, 12:21.

                • #9
                  Thank you so very much. This looks very nice. I really appreciate all your help (and also really find your book and sj articles so informative).

                  • #10
                    Thanks in turn!
