  • Stripplot over multiple categories

    Dear Stata-users,

    Using Stata 18 on Windows 10, I would like to create a descriptive stripplot display of clustered data both over and by (two layers of) categories. I want to represent each data point as a dot, overlaid by a box plot.

    There are 21 units of observation ("id"), numerically encoded 1 to 21. A second variable "idstr" is the same but encoded as string type data. There are 4 observations of the Y variable for each id. The variable "type" is the exposure of interest over which to compare. There are three categories of "type", numerically encoded as 0/1/2.

    The data are in long form. A dataex example, my code, and the problems I would like to solve are below.

    Graph 2 and combined Graphs 7-9 are the closest to what I need.

    I would be grateful for any insights or suggestions. Thank you!

    Janine

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str2 idstr byte sub_id double Y byte type
     1 "1"  1  .07488189048603522 2
     1 "1"  2   .5067875067108304 2
     1 "1"  3  .29499822792932273 2
     1 "1"  4 .018557113586428975 2
     2 "2"  1   .4113476486861821 0
     2 "2"  2   .9195005728815538 0
     2 "2"  3   .5432439113061359 0
     2 "2"  4  .41028091230066566 0
     3 "3"  1   .9072961712824079 0
     3 "3"  2   .8448987600966915 0
     3 "3"  3  .26780287305047035 0
     3 "3"  4  .27379354098684305 0
     4 "4"  1    .966479198040105 0
     4 "4"  2   .6341329194538046 0
     4 "4"  3   .8212810231346063 0
     4 "4"  4  .23324948596647577 0
     5 "5"  1  .07873996407893169 1
     5 "5"  2    .677842755864262 1
     5 "5"  3  .11788779304927188 1
     5 "5"  4   .9637632250605132 1
     6 "6"  1  .08891648492100579 1
     6 "6"  2   .7266541288236086 1
     6 "6"  3   .5956550213456991 1
     6 "6"  4  .11259534194901744 1
     7 "7"  1  .48240122063694946 1
     7 "7"  2  .34795262613972267 1
     7 "7"  3   .8433304450260144 1
     7 "7"  4   .4317569130636417 1
     8 "8"  1   .2937572250893631 2
     8 "8"  2   .9510877785738538 2
     8 "8"  3   .7105466649350736 2
     8 "8"  4 .005007799058933338 2
     9 "9"  1   .6042019611288234 1
     9 "9"  2  .11513986519583042 1
     9 "9"  3   .1983651475242354 1
     9 "9"  4  .06401639069526355 1
    10 "10" 1   .7604807247137322 1
    10 "10" 2   .5340250780305599 1
    10 "10" 3   .7143345464844253 1
    10 "10" 4   .9449475199789127 1
    11 "11" 1   .5347245697100783 1
    11 "11" 2   .8283370474700306 1
    11 "11" 3   .4127192172926123 1
    11 "11" 4    .958545345904852 1
    12 "12" 1    .838533152932602 1
    12 "12" 2   .2455713597533643 1
    12 "12" 3  .44871007570076915 1
    12 "12" 4   .9175279017503324 1
    13 "13" 1   .5574824506486208 1
    13 "13" 2   .4238900581338134 1
    13 "13" 3  .22908874233425924 1
    13 "13" 4   .9974159465045283 1
    14 "14" 1  .05477617525407785 2
    14 "14" 2   .3230299731492071 2
    14 "14" 3  .17531699588470107 2
    14 "14" 4   .6094178721405584 2
    15 "15" 1   .6385848074664882 2
    15 "15" 2   .8899438910415776 2
    15 "15" 3  .44820822217793554 2
    15 "15" 4   .2629934481093047 2
    16 "16" 1   .6043580459007394 2
    16 "16" 2   .2597256465301314 2
    16 "16" 3  .37404740302320694 2
    16 "16" 4   .1885804343837293 2
    17 "17" 1  .03545208904574093 2
    17 "17" 2   .0817467913998039 2
    17 "17" 3   .8439653489601434 2
    17 "17" 4  .04956033989221853 2
    18 "18" 1   .3722326177886752 2
    18 "18" 2   .9846106419931462 2
    18 "18" 3  .03563558815406753 2
    18 "18" 4   .5738691024526158 2
    19 "19" 1   .3330087987974606 2
    19 "19" 2   .9444268942414267 2
    19 "19" 3   .2474485598948324 2
    19 "19" 4   .7479053152466856 2
    20 "20" 1    .397547867876096 0
    20 "20" 2  .44053960989013097 0
    20 "20" 3   .2801748489806747 0
    20 "20" 4  .34413739004932964 0
    21 "21" 1  .19849761919324282 2
    21 "21" 2   .9173630043593013 2
    21 "21" 3   .8280002046875402 2
    21 "21" 4    .285534958188877 2
    end
    // compare type groups for Y (not taking into account clustering): Graph 1
    stripplot Y, over(type) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g1, replace)

    // single compact graph of Y data over "id" by "type": Graph 2
    stripplot Y, over(id) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g2, replace)

    /* Problem:

    1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

    */

    // single compact graph of Y data over "idstr" by "type": Graph 3
    stripplot Y, over(idstr) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g3, replace)

    /* Problems:

    1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

    2. the x-axis labels 1 to 21 are not ordered from 1 to 21

    */

    // combine three graphs for each "type" over "id" : combined Graphs 4-6
    stripplot Y if type == 0, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g4, replace)

    stripplot Y if type == 1, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g5, replace)


    stripplot Y if type == 2, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g6, replace)

    graph combine g4 g5 g6, row(1)


    /* Problems:

    1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

    2. The graphs are separate rather than compact

    3. the graphs are scaled inconsistently

    */

    // combine three graphs for each "type" over "idstr" : combined Graphs 7-9
    stripplot Y if type == 0, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g7, replace)

    stripplot Y if type == 1, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g8, replace)


    stripplot Y if type == 2, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g9, replace)

    graph combine g7 g8 g9, row(1)


    /* Problems:

    1. the x-axis labels are not ordered numerically

    2. The graphs are separate rather than compact

    3. the graph with the least id categories is scaled inconsistently compared to the graphs with more numerous id categories

    4. the x title displays the incorrect title "id" instead of "idstr"

    */

  • #2
    That's a very complete problem statement, apart from overlooking our request to explain that stripplot is community-contributed and from SSC.

    I am going to re-number the problems you identify as JS1, JS2, and so forth, and add commentary.

    Code:
     
    // compare type groups for Y (not taking into account clustering): Graph 1
    stripplot Y, over(type) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g1, replace)
    NJC: That does more or less what was intended, except for not showing identifiers separately.

    Code:
     
    // single compact graph of Y data over "id" by "type": Graph 2
    stripplot Y, over(id) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g2, replace)

    Problem JS1: The x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

    NJC: Correct. That is generic for the by() option of graph.


    Code:
    // single compact graph of Y data over "idstr" by "type": Graph 3
    stripplot Y, over(idstr) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g3, replace)
    Problem JS2 = Problem JS1

    Problem JS3. the x-axis labels 1 to 21 are not ordered from 1 to 21

    NJC: Correct, but this is self-inflicted. Your x axis variable is string, and so sorts yield dictionary order, 1 10 .. 19 2 20 21 3 4 5 6 7 8 9. If you wanted that order, you should use idstr, but not otherwise.
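
    If only a string identifier were available, a numeric copy would restore numeric order. A minimal sketch on made-up data; idnum is a name invented for this illustration and is not part of the thread's dataset:

    ```stata
    * Toy data (hypothetical): string identifiers like idstr
    clear
    input str2 idstr
    "1"
    "2"
    "10"
    "21"
    "3"
    end

    sort idstr
    list idstr, noobs            // string sort gives 1 10 2 21 3

    * destring makes a numeric copy, which then sorts as numbers
    destring idstr, generate(idnum)
    sort idnum
    list idnum idstr, noobs      // numeric sort gives 1 2 3 10 21
    ```

    With the data in #1 none of this is needed, because the numeric id already exists.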


    Code:
    // combine three graphs for each "type" over "id" : combined Graphs 4-6
    stripplot Y if type == 0, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g4, replace)
    
    stripplot Y if type == 1, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g5, replace)
    
    
    stripplot Y if type == 2, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g6, replace)
    
    graph combine g4 g5 g6, row(1)
    Problem JS4 = Problem JS1

    Problem JS5: The graphs are separate rather than compact

    Problem JS6: the graphs are scaled inconsistently

    NJC: Indeed, it's a mess. But stripplot is going to take your identifier literally, i.e. numerically. That is what twoway graphs generally do. stripplot could offer an option to map to successive integers, but that can be done anyway before you call the command. Please see below.

    Also, graph combine doesn't try to be smart about aligning graphs according to their content unless you specify extra options.
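
    One such extra option is ycommon, which, going by the help for graph combine, applies to component graphs drawn without by(). The following is only a sketch under that assumption: the data are made up, plain scatter stands in for stripplot so that the block runs without community-contributed commands, and the graph names h0, h1, h2 and combined are arbitrary.

    ```stata
    * Toy data (hypothetical): 8 identifiers, 3 observations each
    clear
    set seed 2803
    set obs 24
    gen byte id = ceil(_n / 3)
    gen byte type = mod(id, 3)
    gen double Y = runiform()

    * One graph per type, with a subtitle instead of by()
    forvalues t = 0/2 {
        scatter Y id if type == `t', subtitle("type `t'") ///
            ytitle("mock outcome") name(h`t', replace)
    }

    * ycommon asks for a shared y axis scale across the panels
    graph combine h0 h1 h2, row(1) ycommon name(combined, replace)
    ```

    The x axes still take id literally, so this addresses only the inconsistent y scaling, not the gaps.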

    Code:
    // combine three graphs for each "type" over "idstr" : combined Graphs 7-9
    stripplot Y if type == 0, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g7, replace)
    
    stripplot Y if type == 1, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g8, replace)
    
    
    stripplot Y if type == 2, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g9, replace)
    
    graph combine g7 g8 g9, row(1)
    Problem JS7: the x-axis labels are not ordered numerically

    NJC: This is Problem JS3 again. I am not clear why you are flipping between id and idstr, but there is no advantage to using the string version.

    Problem JS8 = Problem JS5

    Problem JS9: the graph with the fewest id categories is scaled inconsistently compared to the graphs with more numerous id categories

    NJC: Same answers as before. graph combine can't help with the problems of the data structure.

    Problem JS10: the x title displays the incorrect title "id" instead of "idstr"

    NJC: Not so with your data example. If this is happening with your full data, it's because idstr has the variable label id.
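
    A quick way to check for, and undo, such a variable label. This is a sketch on a made-up one-row dataset; the label "id" is attached here only to mimic the situation guessed at above:

    ```stata
    * Made-up dataset, just to have an idstr variable
    clear
    input str2 idstr
    "1"
    end

    label variable idstr "id"       // mimic a stray variable label
    describe idstr                  // the variable label column shows "id"

    label variable idstr "idstr"    // reset the label ...
    * ... or leave it alone and add xtitle("idstr") to the graph call
    ```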

    Backing up, I think stripplot can do a better job for you.

    My guesses and prejudices:

    1. The main problem is that you don't have the same identifiers, or even the same number of identifiers, for each type. This seems natural enough, but stripplot can't produce a tidy plot using by() for the outer group. This is generic to the by() option, which is just standard twoway code.

    2. idstr is just a distraction here.

    3. I guess that type is something of real interest. These data have an air of something medical or clinical.

    4. I guess that identifier is not of real interest other than indicating a different individual. So, taking it literally is not necessary, and indeed that turns out to be a bad idea. Even preserving the order is not necessary and (anticipating results to come) not even a good idea.

    5. Box plots seem possibly useful if you are comparing types, but over the top for groups of 4 values.

    6. Jittering is allowed but I dislike it cordially in this context. I don't think people are good at counting over local random patterns to get local densities, or at least I don't want even to try when a plot with a different design will do it for you. If you have lots of ties or near-ties, stacking of some kind works much better.

    I think the key is that you need to do some work beforehand to define a better grouping of the data.

    The first idea is to nest identifiers within type. We can copy the original values as value labels using labmask from the Stata Journal.

    Code:
    sort type id  
    gen newid = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1 
    labmask newid, values(id)
    The trick of jumping by 1 every time we see a new type and by 1 every time we see a new identifier ensures a gap between types. That is, we are going to exploit the fact that stripplot takes the x axis variable literally.
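
    To see the gaps appear, here is the same calculation on a small made-up dataset (six identifiers, two observations each). The variable tag is introduced only for listing one row per identifier:

    ```stata
    * Toy data (hypothetical): identifiers nested in three types
    clear
    input byte(id type)
    1 2
    2 0
    3 0
    5 1
    6 1
    8 2
    end
    expand 2                      // two observations per identifier

    sort type id
    gen newid = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1

    egen tag = tag(newid)         // flag one observation per newid
    list type id newid if tag, noobs sepby(type)
    * newid is 1 2 for type 0, 4 5 for type 1 and 7 8 for type 2:
    * positions 3 and 6 are left empty as separators
    ```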

    The first plot has something of the flavour of yours. I won't add boxes. I do add reference lines at each median. If another summary measure makes more sense go for it.

    Code:
    stripplot Y, over(newid) separate(type) mc(stc1 stc2 stc3) ms(Oh Th Dh) vertical xaxis(1 2) xla(2.5 "Type 0" 9.5 "Type 1" 19 "Type 2", tlc(none) axis(2)) xli(5 14, lp(solid)) legend(off) xtitle("", axis(2)) refline(lc(magenta) lw(medthick)) reflinestretch(0.4) reflevel(median) xtitle("", axis(1)) name(NJC1, replace) ytitle(Whatever this is)
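
    The label positions 2.5, 9.5 and 19 and the line positions 5 and 14 in that call are hard-coded for this dataset. A sketch of how they could be computed instead, in the spirit of the loop in #7; the data are made up, plain scatter stands in for stripplot, and the local macro names pos and sep are invented here:

    ```stata
    * Toy data (hypothetical): one observation per identifier
    clear
    input byte(id type)
    1 2
    2 0
    3 0
    5 1
    6 1
    8 2
    end
    set seed 1
    gen double Y = runiform()

    sort type id
    gen newid = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1

    forvalues t = 0/2 {
        summarize newid if type == `t', meanonly
        local pos`t' = (r(min) + r(max)) / 2    // centre of this type's block
        local sep`t' = r(max) + 1               // the empty slot after the block
    }

    scatter Y newid, xla(`pos0' "Type 0" `pos1' "Type 1" `pos2' "Type 2", tlc(none)) ///
        xli(`sep0' `sep1', lp(solid)) xtitle("")
    ```

    Applied to the data in #1, where newid runs over 1-4, 6-13 and 15-23, the same arithmetic gives the values used in the call above.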
    [Figure: stubbs1.png]


    But why should we take identifier seriously? Why not sort on, say, those medians? (If another sort order makes more sense, go for it.)

    Code:
    egen median = median(Y), by(id)
    sort type median id 
    gen newid2 = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1 
    labmask newid2, values(id)
    
    stripplot Y, over(newid2) separate(type) mc(stc1 stc2 stc3) ms(Oh Th Dh) vertical xaxis(1 2) xla(2.5 "Type 0" 9.5 "Type 1" 19 "Type 2", tlc(none) axis(2)) xli(5 14, lp(solid)) legend(off) xtitle("", axis(2)) refline(lc(magenta) lw(medthick)) reflinestretch(0.4) reflevel(median) xtitle("", axis(1)) name(NJC2, replace) ytitle(Whatever this is)

    [Figure: stubbs2.png]


    I like that one better.

    Box plots could be helpful for the types lumped together, but I prefer quantile plots alongside (distribution function plots, if you like, but with axes flipped compared with convention). Now there is space for different reference lines. I chose means, but anything for which an egen function exists is fair game.

    Code:
    stripplot Y, by(type, row(1) t1title(Type) note("")) centre cumul cumprob vertical box(barw(0.04)) boffset(-0.48) pctile(0) refline name(NJC3, replace) ytitle(Whatever this is)

    [Figure: stubbs3.png]



    • #3
      Thank you very much, Nick, for the speedy and comprehensive reply. Your solutions and thoughts are very helpful. Indeed, this is real (mock) data, of the ecological variety. Thanks, Janine



      • #4
        I am interested in ecology and would like to hear how designs turn out with your real data.

        Faking by() for panels of unequal width comes up from time to time and is perhaps something I should write up more systematically.

        The first two graphs in #2 would look better with axes on all four sides.



        • #5
          Here is another take which came to me in a Doh! moment.

          The minor trickery is getting graph dot to plot replicates as separate variables after a reshape wide. The command doesn't care if we stipulate the same marker properties. Endless scope exists otherwise to tweak marker properties, grid line pattern, position of categorical axis, and so forth.

          Note that vertical is an undocumented option. If you prefer the default, just leave it out. The default would be essential for lengthy categorical labels.
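
          What the reshape does can be seen on a tiny made-up dataset with two identifiers and two replicates each. This is only an illustration of the data layout, not of the graph:

          ```stata
          * Toy data (hypothetical) in long form
          clear
          input byte(id sub_id) double Y
          1 1 .1
          1 2 .5
          2 1 .4
          2 2 .9
          end

          * One row per id; replicates become variables Y1 and Y2
          reshape wide Y, i(id) j(sub_id)
          list, noobs

          * Row-wise median across the replicate variables
          egen median = rowmedian(Y?)
          list id Y1 Y2 median, noobs
          ```

          After the reshape each replicate is its own variable, which is what lets graph dot plot them side by side for each identifier.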

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte id str2 idstr byte sub_id double Y byte type
           1 "1"  1  .07488189048603522 2
           1 "1"  2   .5067875067108304 2
           1 "1"  3  .29499822792932273 2
           1 "1"  4 .018557113586428975 2
           2 "2"  1   .4113476486861821 0
           2 "2"  2   .9195005728815538 0
           2 "2"  3   .5432439113061359 0
           2 "2"  4  .41028091230066566 0
           3 "3"  1   .9072961712824079 0
           3 "3"  2   .8448987600966915 0
           3 "3"  3  .26780287305047035 0
           3 "3"  4  .27379354098684305 0
           4 "4"  1    .966479198040105 0
           4 "4"  2   .6341329194538046 0
           4 "4"  3   .8212810231346063 0
           4 "4"  4  .23324948596647577 0
           5 "5"  1  .07873996407893169 1
           5 "5"  2    .677842755864262 1
           5 "5"  3  .11788779304927188 1
           5 "5"  4   .9637632250605132 1
           6 "6"  1  .08891648492100579 1
           6 "6"  2   .7266541288236086 1
           6 "6"  3   .5956550213456991 1
           6 "6"  4  .11259534194901744 1
           7 "7"  1  .48240122063694946 1
           7 "7"  2  .34795262613972267 1
           7 "7"  3   .8433304450260144 1
           7 "7"  4   .4317569130636417 1
           8 "8"  1   .2937572250893631 2
           8 "8"  2   .9510877785738538 2
           8 "8"  3   .7105466649350736 2
           8 "8"  4 .005007799058933338 2
           9 "9"  1   .6042019611288234 1
           9 "9"  2  .11513986519583042 1
           9 "9"  3   .1983651475242354 1
           9 "9"  4  .06401639069526355 1
          10 "10" 1   .7604807247137322 1
          10 "10" 2   .5340250780305599 1
          10 "10" 3   .7143345464844253 1
          10 "10" 4   .9449475199789127 1
          11 "11" 1   .5347245697100783 1
          11 "11" 2   .8283370474700306 1
          11 "11" 3   .4127192172926123 1
          11 "11" 4    .958545345904852 1
          12 "12" 1    .838533152932602 1
          12 "12" 2   .2455713597533643 1
          12 "12" 3  .44871007570076915 1
          12 "12" 4   .9175279017503324 1
          13 "13" 1   .5574824506486208 1
          13 "13" 2   .4238900581338134 1
          13 "13" 3  .22908874233425924 1
          13 "13" 4   .9974159465045283 1
          14 "14" 1  .05477617525407785 2
          14 "14" 2   .3230299731492071 2
          14 "14" 3  .17531699588470107 2
          14 "14" 4   .6094178721405584 2
          15 "15" 1   .6385848074664882 2
          15 "15" 2   .8899438910415776 2
          15 "15" 3  .44820822217793554 2
          15 "15" 4   .2629934481093047 2
          16 "16" 1   .6043580459007394 2
          16 "16" 2   .2597256465301314 2
          16 "16" 3  .37404740302320694 2
          16 "16" 4   .1885804343837293 2
          17 "17" 1  .03545208904574093 2
          17 "17" 2   .0817467913998039 2
          17 "17" 3   .8439653489601434 2
          17 "17" 4  .04956033989221853 2
          18 "18" 1   .3722326177886752 2
          18 "18" 2   .9846106419931462 2
          18 "18" 3  .03563558815406753 2
          18 "18" 4   .5738691024526158 2
          19 "19" 1   .3330087987974606 2
          19 "19" 2   .9444268942414267 2
          19 "19" 3   .2474485598948324 2
          19 "19" 4   .7479053152466856 2
          20 "20" 1    .397547867876096 0
          20 "20" 2  .44053960989013097 0
          20 "20" 3   .2801748489806747 0
          20 "20" 4  .34413739004932964 0
          21 "21" 1  .19849761919324282 2
          21 "21" 2   .9173630043593013 2
          21 "21" 3   .8280002046875402 2
          21 "21" 4    .285534958188877 2
          end
          
          reshape wide Y, i(id) j(sub_id) 
          egen median = rowmedian(Y?)
          
          forval j = 1/4 { 
              local call `call' marker(`j', ms(O) mc(stc1) msize(medsmall))
          }
          
          graph dot (asis) Y? median, over(id, sort(median)) over(type) nofill vertical `call' marker(5, ms(Dh) msize(medlarge)) legend(order(5)) ytitle(whatever this is) b2title(Type and identifier)
          [Figure: whatever2.png]



          • #6
            That is also very nice. Thanks Nick!

            Janine



            • #7
              The paper imagined in #4 is now written but is unlikely to appear for some months. (My writing-up times vary from days to decades...).

              I've lost interest in your data example on learning that it is mock data, but the problem of showing unequal groups well remains. Here I use some data of B.S. Everitt on weights of anorexic girls before and after treatment as previously discussed in https://journals.sagepub.com/doi/pdf...867X0900900408

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int treatment float(before after)
              1 80.5  82.2
              1 84.9  85.6
              1 81.5  81.4
              1 82.6  81.9
              1 79.9  76.4
              1 88.7 103.6
              1 94.9  98.4
              1 76.3  93.4
              1   81  73.4
              1 80.5  82.1
              1   85  96.7
              1 89.2  95.3
              1 81.3  82.4
              1 76.5  72.5
              1   70  90.9
              1 80.4  71.3
              1 83.3  85.4
              1   83  81.6
              1 87.7  89.1
              1 84.2  83.9
              1 86.4  82.7
              1 76.5  75.7
              1 80.2  82.6
              1 87.8 100.4
              1 83.3  85.2
              1 79.7  83.6
              1 84.5  84.6
              1 80.8  96.2
              1 87.4  86.7
              2 80.7  80.2
              2 89.4  80.1
              2 91.8  86.4
              2   74  86.3
              2 78.1  76.1
              2 88.3  78.1
              2 87.3  75.1
              2 75.1  86.7
              2 80.6  73.5
              2 78.4  84.6
              2 77.6  77.4
              2 88.7  79.5
              2 81.3  89.6
              2 78.1  81.4
              2 70.5  81.8
              2 77.3  77.3
              2 85.2  84.2
              2   86  75.4
              2 84.1  79.5
              2 79.7    73
              2 85.5  88.3
              2 84.4  84.7
              2 79.6  81.4
              2 77.5  81.2
              2 72.3  88.2
              2   89  78.8
              3 83.8  95.2
              3 83.3  94.3
              3   86  91.5
              3 82.5  91.9
              3 86.7 100.3
              3 79.6  76.7
              3 76.9  76.8
              3 94.2 101.6
              3 73.4  94.9
              3 80.5  75.2
              3 81.6  77.8
              3 82.1  95.5
              3 77.6  90.7
              3 83.5  92.5
              3 89.9  93.8
              3   86  91.7
              3 87.3    98
              end
              label values treatment treatment
              label def treatment 1 "cognitive", modify
              label def treatment 2 "control", modify
              label def treatment 3 "family", modify
              
              
              
              gen weight_change = after - before 
              label var weight_change "weight change after treatment (ib)"
              
              egen median = median(weight_change), by(treatment)
              
              bysort median treatment (weight_change) : gen rank = _n 
              
              gen axis = sum(3 *(treatment != treatment[_n-1])) + sum(rank != rank[_n-1])
              
              forval g = 1/3 { 
                  su axis if treatment == `g', meanonly 
                  local pos`g' = r(mean)
                  local line`g' = r(max) + 2
                  local text`g' : label (treatment) `g' 
              }
              
              separate median, by(treatment) veryshortlabel 
              
              scatter weight_change axis,  xla(`pos1' "`text1'" `pos2' "`text2'" `pos3' "`text3'", tlc(none)) xli(`line1' `line2', lp(solid)) xtitle("") ytitle("`: var label weight_change'") || line median? axis , lc(stc1 ..) legend(off) note(medians shown)
              [Figure: anorexic.png]


              The techniques here include:

              Calculating medians for three groups. Using those to sort groups. Eventually plotting those medians as horizontal lines.

              Constructing an axis variable based on treatments in median order and on sort order of outcome within treatments.

              Working out where to put axis labels and separator lines. The aim is to get close to the style of by() without using by() at all. The motive for doing that is to respect unequal group sizes.
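
              The axis construction can be traced by hand on a tiny made-up dataset with two treatments of unequal size. The variable name change is invented for this sketch; the other names follow the code above:

              ```stata
              * Toy data (hypothetical): treatment 1 has 3 values, treatment 2 has 2
              clear
              input byte treatment float change
              1  2
              1 -1
              1  5
              2  0
              2  3
              end

              egen median = median(change), by(treatment)

              * Treatments in median order; values ranked within treatment
              bysort median treatment (change) : gen rank = _n

              * Jump by 3 at each new treatment and by 1 at each new rank
              gen axis = sum(3 * (treatment != treatment[_n-1])) + sum(rank != rank[_n-1])

              list treatment change median rank axis, noobs sepby(treatment)
              * treatment 2 (median 1.5) comes first with axis 4 5;
              * treatment 1 (median 2) follows with axis 9 10 11
              ```

              Positions 6 to 8 stay empty, so a separator at the first group's maximum plus 2 falls in the middle of the gap.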



              • #8
                ib is a typo for lb (pounds).



                • #9
                  Nick,

                  I look forward to your paper on #4.

                  Regards,

                  Janine



                  • #10
                    Originally posted by Nick Cox
                    Here is another take which came to me in a Doh! moment.
                    Wondering what a Doh! moment is, I found this Wikipedia entry. It seems that this is what you meant. Being attentive to (historical) details, you should perhaps have used the apostrophe as in "D'oh! moment".

                    All this (#2 ff) is very illuminating, thank you!



                    • #11
                      Thanks for the thanks!

                      I am doubly exposed here: the Doh reference can be traced to my occasional encounters with the Simpsons series and also shows I am mangling the word in question. The intended flavour is just: I was stupid not to think of that earlier.



                      • #12
                        Originally posted by Nick Cox
                        The paper imagined in #4 is now written but is unlikely to appear for some months. (My writing-up times vary from days to decades...).

                        This is very interesting! Is it possible to change the color and shape of the bubbles in #7?



                        • #13
                          Naturally. Just use standard twoway scatter options as you wish.
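
                          For instance, msymbol(), mcolor() and msize() are standard marker options. A minimal sketch on the auto data shipped with Stata rather than on the data in #7; the particular symbol and color are arbitrary choices:

                          ```stata
                          * Official example dataset, used here only for illustration
                          sysuse auto, clear

                          * Hollow diamonds in orange, slightly larger than the default
                          scatter mpg weight, msymbol(Dh) mcolor(orange) msize(medlarge)
                          ```

                          In the call in #7 such options would go with the other scatter options, before the ||.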
