Stripplot over multiple categories

Janine Stubbs

Join Date: May 2021

Posts: 34
#1

11 Feb 2024, 17:55

Dear Stata-users,

Using Stata 18 on Windows 10, I would like to create a descriptive stripplot display of clustered data both over and by (two layers of) categories. I want to represent each data point as a dot, overlaid by a box plot.

There are 21 units of observation ("id"), numerically encoded 1 to 21. A second variable "idstr" is the same but encoded as string type data. There are 4 observations of the Y variable for each id. The variable "type" is the exposure of interest over which to compare. There are three categories of "type", numerically encoded as 0/1/2.

The data is in long form. Dataex, code and noted problems to solve below.

Graph 2 and combined Graphs 7-9 are the closest to what I need.

I would be grateful for any insights or suggestions. Thank you!

Janine

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str2 idstr byte sub_id double Y byte type
 1 "1"  1  .07488189048603522 2
 1 "1"  2   .5067875067108304 2
 1 "1"  3  .29499822792932273 2
 1 "1"  4 .018557113586428975 2
 2 "2"  1   .4113476486861821 0
 2 "2"  2   .9195005728815538 0
 2 "2"  3   .5432439113061359 0
 2 "2"  4  .41028091230066566 0
 3 "3"  1   .9072961712824079 0
 3 "3"  2   .8448987600966915 0
 3 "3"  3  .26780287305047035 0
 3 "3"  4  .27379354098684305 0
 4 "4"  1    .966479198040105 0
 4 "4"  2   .6341329194538046 0
 4 "4"  3   .8212810231346063 0
 4 "4"  4  .23324948596647577 0
 5 "5"  1  .07873996407893169 1
 5 "5"  2    .677842755864262 1
 5 "5"  3  .11788779304927188 1
 5 "5"  4   .9637632250605132 1
 6 "6"  1  .08891648492100579 1
 6 "6"  2   .7266541288236086 1
 6 "6"  3   .5956550213456991 1
 6 "6"  4  .11259534194901744 1
 7 "7"  1  .48240122063694946 1
 7 "7"  2  .34795262613972267 1
 7 "7"  3   .8433304450260144 1
 7 "7"  4   .4317569130636417 1
 8 "8"  1   .2937572250893631 2
 8 "8"  2   .9510877785738538 2
 8 "8"  3   .7105466649350736 2
 8 "8"  4 .005007799058933338 2
 9 "9"  1   .6042019611288234 1
 9 "9"  2  .11513986519583042 1
 9 "9"  3   .1983651475242354 1
 9 "9"  4  .06401639069526355 1
10 "10" 1   .7604807247137322 1
10 "10" 2   .5340250780305599 1
10 "10" 3   .7143345464844253 1
10 "10" 4   .9449475199789127 1
11 "11" 1   .5347245697100783 1
11 "11" 2   .8283370474700306 1
11 "11" 3   .4127192172926123 1
11 "11" 4    .958545345904852 1
12 "12" 1    .838533152932602 1
12 "12" 2   .2455713597533643 1
12 "12" 3  .44871007570076915 1
12 "12" 4   .9175279017503324 1
13 "13" 1   .5574824506486208 1
13 "13" 2   .4238900581338134 1
13 "13" 3  .22908874233425924 1
13 "13" 4   .9974159465045283 1
14 "14" 1  .05477617525407785 2
14 "14" 2   .3230299731492071 2
14 "14" 3  .17531699588470107 2
14 "14" 4   .6094178721405584 2
15 "15" 1   .6385848074664882 2
15 "15" 2   .8899438910415776 2
15 "15" 3  .44820822217793554 2
15 "15" 4   .2629934481093047 2
16 "16" 1   .6043580459007394 2
16 "16" 2   .2597256465301314 2
16 "16" 3  .37404740302320694 2
16 "16" 4   .1885804343837293 2
17 "17" 1  .03545208904574093 2
17 "17" 2   .0817467913998039 2
17 "17" 3   .8439653489601434 2
17 "17" 4  .04956033989221853 2
18 "18" 1   .3722326177886752 2
18 "18" 2   .9846106419931462 2
18 "18" 3  .03563558815406753 2
18 "18" 4   .5738691024526158 2
19 "19" 1   .3330087987974606 2
19 "19" 2   .9444268942414267 2
19 "19" 3   .2474485598948324 2
19 "19" 4   .7479053152466856 2
20 "20" 1    .397547867876096 0
20 "20" 2  .44053960989013097 0
20 "20" 3   .2801748489806747 0
20 "20" 4  .34413739004932964 0
21 "21" 1  .19849761919324282 2
21 "21" 2   .9173630043593013 2
21 "21" 3   .8280002046875402 2
21 "21" 4    .285534958188877 2
end

// compare type groups for Y (not taking into account clustering): Graph 1
stripplot Y, over(type) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g1, replace)

// single compact graph of Y data over "id" by "type": Graph 2
stripplot Y, over(id) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g2, replace)

/* Problem:

1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

*/

// single compact graph of Y data over "idstr" by "type": Graph 3
stripplot Y, over(idstr) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g3, replace)

/* Problems:

1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

2. the x-axis labels 1 to 21 are not ordered from 1 to 21

*/

// combine three graphs for each "type" over "id" : combined Graphs 4-6
stripplot Y if type == 0, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g4, replace)

stripplot Y if type == 1, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g5, replace)

stripplot Y if type == 2, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g6, replace)

graph combine g4 g5 g6, row(1)

/* Problems:

1. the x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

2. The graphs are separate rather than compact

3. the graphs are scaled inconsistently

*/

// combine three graphs for each "type" over "idstr" : combined Graphs 7-9
stripplot Y if type == 0, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g7, replace)

stripplot Y if type == 1, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g8, replace)

stripplot Y if type == 2, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g9, replace)

graph combine g7 g8 g9, row(1)

/* Problems:

1. the x-axis labels are not ordered numerically

2. The graphs are separate rather than compact

3. the graph with the fewest id categories is scaled inconsistently compared to the graphs with more numerous id categories

4. the x title displays the incorrect title "id" instead of "idstr"

*/
Nick Cox

Join Date: Mar 2014

Posts: 35432
#2

12 Feb 2024, 05:18

That's a very complete problem statement, apart from our request to explain that stripplot is community-contributed from SSC.

I am going to re-number the problems you identify as JS1, JS2, and so forth, and add commentary.

Code:

// compare type groups for Y (not taking into account clustering): Graph 1
stripplot Y, over(type) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g1, replace)

NJC: That does more or less what was intended, except for not showing identifiers separately.

Code:

// single compact graph of Y data over "id" by "type": Graph 2
stripplot Y, over(id) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g2, replace)

Problem JS1: The x-axis is the same for all graphs (1-21); it doesn't rescale to display the unequal number of id categories

NJC: Correct. That is generic for the by() option of graph.

Code:

// single compact graph of Y data over "idstr" by "type": Graph 3
stripplot Y, over(idstr) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g3, replace)

Problem JS2 = Problem JS1

Problem JS3: the x-axis labels 1 to 21 are not ordered from 1 to 21

NJC: Correct, but this is self-inflicted. Your x axis variable is string, and so sorts yield dictionary order, 1 10 .. 19 2 20 21 3 4 5 6 7 8 9. If you wanted that order, you should use idstr, but not otherwise.
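
A minimal sketch of that dictionary ordering, using a toy dataset built here purely for illustration. The names id and idstr mirror the thread; id2 is a hypothetical name for a numeric copy.

```stata
clear
set obs 21
gen id = _n
gen idstr = string(id)

* sorting on the string variable gives dictionary order: 1 10 11 ... 19 2 20 21 3 ...
sort idstr
list id idstr in 1/4, noobs

* if only the string version existed, a numeric copy restores numeric order
gen id2 = real(idstr)
sort id2
```

After the final sort the observations are back in the order 1 to 21.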

Code:

// combine three graphs for each "type" over "id" : combined Graphs 4-6
stripplot Y if type == 0, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g4, replace)

stripplot Y if type == 1, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g5, replace)

stripplot Y if type == 2, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g6, replace)

graph combine g4 g5 g6, row(1)

Problem JS4 = Problem JS1

Problem JS5: The graphs are separate rather than compact

Problem JS6: the graphs are scaled inconsistently

NJC: Indeed, it's a mess. But stripplot is going to take your identifier literally, i.e. numerically. That is what twoway graphs generally do. stripplot could offer an option to map to successive integers, but that can be done anyway before you call the command. Please see below.

Also, graph combine doesn't try to be smart about aligning graphs according to their content unless you specify extra options.
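
As a hedged illustration of such extra options: graph combine has ycommon and xcommon options that put the combined graphs on common axis scales. The sketch below uses the auto dataset shipped with Stata rather than the data in this thread, and the graph names g_dom, g_for and g_both are made up for the example.

```stata
sysuse auto, clear
scatter mpg weight if foreign == 0, title("Domestic") name(g_dom, replace)
scatter mpg weight if foreign == 1, title("Foreign") name(g_for, replace)

* without ycommon and xcommon each panel keeps its own axis ranges
graph combine g_dom g_for, row(1) ycommon xcommon name(g_both, replace)
```

Common scales alone do not make panel widths proportional to the number of identifiers, so this does not solve Problem JS1.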

Code:

// combine three graphs for each "type" over "idstr" : combined Graphs 7-9
stripplot Y if type == 0, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g7, replace)

stripplot Y if type == 1, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g8, replace)

stripplot Y if type == 2, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g9, replace)

graph combine g7 g8 g9, row(1)

Problem JS7: the x-axis labels are not ordered numerically

NJC: This is Problem JS3 again. I am not clear why you are flipping between id and idstr, but there is no advantage to using the string version.

Problem JS8 = Problem JS5

Problem JS9: the graph with the fewest id categories is scaled inconsistently compared to the graphs with more numerous id categories

NJC: Same answers as before. graph combine can't help with the problems of the data structure.

Problem JS10: the x title displays the incorrect title "id" instead of "idstr"

NJC: Not so with your data example. If this is happening with your full data, it's because idstr has the variable label id.

Backing up, I think stripplot can do a better job for you.

My guesses and prejudices:

1. The main problem is that you don't have the same identifiers, or even the same number of identifiers, for each type. This seems natural enough, but stripplot can't produce a tidy plot using by() for the outer group. This is generic to the by() option, which is just standard twoway code.

2. idstr is just a distraction here.

3. I guess that type is something of real interest. These data have an air of something medical or clinical.

4. I guess that identifier is not of real interest other than indicating a different individual. So, taking it literally is not necessary, and indeed that turns out to be a bad idea. Even preserving the order is not necessary and (anticipating results to come) not even a good idea.

5. Box plots seem possibly useful if you are comparing types, but over the top for groups of 4 values.

6. Jittering is allowed but I dislike it cordially in this context. I don't think people are good at counting over local random patterns to get local densities, or at least I don't want even to try when a plot with a different design will do it for you. If you have lots of ties or near-ties, stacking of some kind works much better.

I think the key is that you need to do some work beforehand to define a better grouping of the data.

The first idea is to nest identifiers within type. We can copy the original values as value labels using labmask from the Stata Journal.

Code:

sort type id
gen newid = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1
labmask newid, values(id)

The trick of jumping by 1 every time we see a new type and by 1 every time we see a new identifier ensures a gap between types. That is, we are going to exploit the fact that stripplot takes the x axis variable literally.
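
To see what the two running sums produce, here is a small sketch on a toy dataset invented for illustration (six identifiers in three types, one observation each; the values are not from the thread's data):

```stata
clear
input byte(id type)
2 0
3 0
5 1
6 1
7 1
1 2
end

sort type id
* first sum steps up at each new type, second at each new identifier
gen newid = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1
list, sepby(type) noobs
```

Here newid runs 1 2, then 4 5 6, then 8, so positions 3 and 7 are left empty and can carry the separator lines between types. With several observations per identifier, as in the thread's data, the second sum only steps up on the first observation of each identifier.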

The first plot has something of the flavour of yours. I won't add boxes. I do add reference lines at each median. If another summary measure makes more sense go for it.

Code:

stripplot Y, over(newid) separate(type) mc(stc1 stc2 stc3) ms(Oh Th Dh) vertical xaxis(1 2) xla(2.5 "Type 0" 9.5 "Type 1" 19 "Type 2", tlc(none) axis(2)) xli(5 14, lp(solid)) legend(off) xtitle("", axis(2)) refline(lc(magenta) lw(medthick)) reflinestretch(0.4) reflevel(median) xtitle("", axis(1)) name(NJC1, replace) ytitle(Whatever this is)

But why should we take identifier seriously? Why not sort on say those medians? (If another sort order makes more sense, go for it.)

Code:

egen median = median(Y), by(id)
sort type median id
gen newid2 = sum(type != type[_n-1]) + sum(id != id[_n-1]) - 1
labmask newid2, values(id)
stripplot Y, over(newid2) separate(type) mc(stc1 stc2 stc3) ms(Oh Th Dh) vertical xaxis(1 2) xla(2.5 "Type 0" 9.5 "Type 1" 19 "Type 2", tlc(none) axis(2)) xli(5 14, lp(solid)) legend(off) xtitle("", axis(2)) refline(lc(magenta) lw(medthick)) reflinestretch(0.4) reflevel(median) xtitle("", axis(1)) name(NJC2, replace) ytitle(Whatever this is)

I like that one better.

Box plots could be helpful for the types lumped together, but I prefer quantile plots alongside (distribution function plots, if you like, but with axes flipped compared with convention). Now there is space for different reference lines. I chose means, but anything for which an egen function exists is fair game.

Code:

stripplot Y, by(type, row(1) t1title(Type) note("")) centre cumul cumprob vertical box(barw(0.04)) boffset(-0.48) pctile(0) refline name(NJC3, replace) ytitle(Whatever this is)
Janine Stubbs

Join Date: May 2021

Posts: 34
#3

12 Feb 2024, 18:56

Thank you very much, Nick, for the speedy and comprehensive reply. Your solutions and thoughts are very helpful. Indeed, this is real (mock) data, of the ecological variety. Thanks, Janine
Nick Cox

Join Date: Mar 2014

Posts: 35432
#4

13 Feb 2024, 03:07

I am interested in ecology and would like to hear how designs turn out with your real data.

Faking by() for panels of unequal width comes up from time to time and is perhaps something I should write up more systematically.

The first two graphs in #2 would look better with axes on all four sides.

Nick Cox

Join Date: Mar 2014
Posts: 35432
#5

14 Feb 2024, 03:58

Here is another take which came to me in a Doh! moment.

The minor trickery is getting graph dot to plot replicates as separate variables after a reshape wide. The command doesn't care if we stipulate the same marker properties. Endless scope exists otherwise to tweak marker properties, grid line pattern, position of categorical axis, and so forth.

Note that vertical is an undocumented option. If you prefer the default, just leave it out. The default would be essential for lengthy categorical labels.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str2 idstr byte sub_id double Y byte type
 1 "1"  1  .07488189048603522 2
 1 "1"  2   .5067875067108304 2
 1 "1"  3  .29499822792932273 2
 1 "1"  4 .018557113586428975 2
 2 "2"  1   .4113476486861821 0
 2 "2"  2   .9195005728815538 0
 2 "2"  3   .5432439113061359 0
 2 "2"  4  .41028091230066566 0
 3 "3"  1   .9072961712824079 0
 3 "3"  2   .8448987600966915 0
 3 "3"  3  .26780287305047035 0
 3 "3"  4  .27379354098684305 0
 4 "4"  1    .966479198040105 0
 4 "4"  2   .6341329194538046 0
 4 "4"  3   .8212810231346063 0
 4 "4"  4  .23324948596647577 0
 5 "5"  1  .07873996407893169 1
 5 "5"  2    .677842755864262 1
 5 "5"  3  .11788779304927188 1
 5 "5"  4   .9637632250605132 1
 6 "6"  1  .08891648492100579 1
 6 "6"  2   .7266541288236086 1
 6 "6"  3   .5956550213456991 1
 6 "6"  4  .11259534194901744 1
 7 "7"  1  .48240122063694946 1
 7 "7"  2  .34795262613972267 1
 7 "7"  3   .8433304450260144 1
 7 "7"  4   .4317569130636417 1
 8 "8"  1   .2937572250893631 2
 8 "8"  2   .9510877785738538 2
 8 "8"  3   .7105466649350736 2
 8 "8"  4 .005007799058933338 2
 9 "9"  1   .6042019611288234 1
 9 "9"  2  .11513986519583042 1
 9 "9"  3   .1983651475242354 1
 9 "9"  4  .06401639069526355 1
10 "10" 1   .7604807247137322 1
10 "10" 2   .5340250780305599 1
10 "10" 3   .7143345464844253 1
10 "10" 4   .9449475199789127 1
11 "11" 1   .5347245697100783 1
11 "11" 2   .8283370474700306 1
11 "11" 3   .4127192172926123 1
11 "11" 4    .958545345904852 1
12 "12" 1    .838533152932602 1
12 "12" 2   .2455713597533643 1
12 "12" 3  .44871007570076915 1
12 "12" 4   .9175279017503324 1
13 "13" 1   .5574824506486208 1
13 "13" 2   .4238900581338134 1
13 "13" 3  .22908874233425924 1
13 "13" 4   .9974159465045283 1
14 "14" 1  .05477617525407785 2
14 "14" 2   .3230299731492071 2
14 "14" 3  .17531699588470107 2
14 "14" 4   .6094178721405584 2
15 "15" 1   .6385848074664882 2
15 "15" 2   .8899438910415776 2
15 "15" 3  .44820822217793554 2
15 "15" 4   .2629934481093047 2
16 "16" 1   .6043580459007394 2
16 "16" 2   .2597256465301314 2
16 "16" 3  .37404740302320694 2
16 "16" 4   .1885804343837293 2
17 "17" 1  .03545208904574093 2
17 "17" 2   .0817467913998039 2
17 "17" 3   .8439653489601434 2
17 "17" 4  .04956033989221853 2
18 "18" 1   .3722326177886752 2
18 "18" 2   .9846106419931462 2
18 "18" 3  .03563558815406753 2
18 "18" 4   .5738691024526158 2
19 "19" 1   .3330087987974606 2
19 "19" 2   .9444268942414267 2
19 "19" 3   .2474485598948324 2
19 "19" 4   .7479053152466856 2
20 "20" 1    .397547867876096 0
20 "20" 2  .44053960989013097 0
20 "20" 3   .2801748489806747 0
20 "20" 4  .34413739004932964 0
21 "21" 1  .19849761919324282 2
21 "21" 2   .9173630043593013 2
21 "21" 3   .8280002046875402 2
21 "21" 4    .285534958188877 2
end

reshape wide Y, i(id) j(sub_id) 
egen median = rowmedian(Y?)

forval j = 1/4 { 
    local call `call' marker(`j', ms(O) mc(stc1) msize(medsmall))
}

graph dot (asis) Y? median, over(id, sort(median)) over(type) nofill vertical `call' marker(5, ms(Dh) msize(medlarge)) legend(order(5)) ytitle(whatever this is) b2title(Type and identifier)

[Graph: whatever2.png]


Janine Stubbs

Join Date: May 2021

Posts: 34
#6

21 Feb 2024, 20:47

That is also very nice. Thanks Nick!

Janine

Nick Cox

Join Date: Mar 2014
Posts: 35432
#7

22 Feb 2024, 06:14

The paper imagined in #4 is now written but is unlikely to appear for some months. (My writing-up times vary from days to decades...).

I've lost interest in your data example on learning that it is mock data, but the problem of showing unequal groups well remains. Here I use some data of B.S. Everitt on weights of anorexic girls before and after treatment as previously discussed in https://journals.sagepub.com/doi/pdf...867X0900900408

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int treatment float(before after)
1 80.5  82.2
1 84.9  85.6
1 81.5  81.4
1 82.6  81.9
1 79.9  76.4
1 88.7 103.6
1 94.9  98.4
1 76.3  93.4
1   81  73.4
1 80.5  82.1
1   85  96.7
1 89.2  95.3
1 81.3  82.4
1 76.5  72.5
1   70  90.9
1 80.4  71.3
1 83.3  85.4
1   83  81.6
1 87.7  89.1
1 84.2  83.9
1 86.4  82.7
1 76.5  75.7
1 80.2  82.6
1 87.8 100.4
1 83.3  85.2
1 79.7  83.6
1 84.5  84.6
1 80.8  96.2
1 87.4  86.7
2 80.7  80.2
2 89.4  80.1
2 91.8  86.4
2   74  86.3
2 78.1  76.1
2 88.3  78.1
2 87.3  75.1
2 75.1  86.7
2 80.6  73.5
2 78.4  84.6
2 77.6  77.4
2 88.7  79.5
2 81.3  89.6
2 78.1  81.4
2 70.5  81.8
2 77.3  77.3
2 85.2  84.2
2   86  75.4
2 84.1  79.5
2 79.7    73
2 85.5  88.3
2 84.4  84.7
2 79.6  81.4
2 77.5  81.2
2 72.3  88.2
2   89  78.8
3 83.8  95.2
3 83.3  94.3
3   86  91.5
3 82.5  91.9
3 86.7 100.3
3 79.6  76.7
3 76.9  76.8
3 94.2 101.6
3 73.4  94.9
3 80.5  75.2
3 81.6  77.8
3 82.1  95.5
3 77.6  90.7
3 83.5  92.5
3 89.9  93.8
3   86  91.7
3 87.3    98
end
label values treatment treatment
label def treatment 1 "cognitive", modify
label def treatment 2 "control", modify
label def treatment 3 "family", modify



gen weight_change = after - before 
label var weight_change "weight change after treatment (ib)"

egen median = median(weight_change), by(treatment)

bysort median treatment (weight_change) : gen rank = _n 

gen axis = sum(3 *(treatment != treatment[_n-1])) + sum(rank != rank[_n-1])

forval g = 1/3 { 
    su axis if treatment == `g', meanonly 
    local pos`g' = r(mean)
    local line`g' = r(max) + 2
    local text`g' : label (treatment) `g' 
}

separate median, by(treatment) veryshortlabel 

scatter weight_change axis,  xla(`pos1' "`text1'" `pos2' "`text2'" `pos3' "`text3'", tlc(none)) xli(`line1' `line2', lp(solid)) xtitle("") ytitle("`: var label weight_change'") || line median? axis , lc(stc1 ..) legend(off) note(medians shown)

[Graph: anorexic.png]

The techniques here include:

Calculating medians for three groups. Using those to sort groups. Eventually plotting those medians as horizontal lines.

Constructing an axis variable based on treatments in median order and on sort order of outcome within treatments.

Working out where to put axis labels and separator lines. The aim is to get close to the style of by() without using by() at all. The motive for doing that is to respect unequal group sizes.
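
The axis construction can be traced on a toy dataset invented here for illustration (two treatments with two and three observations; the values are arbitrary):

```stata
clear
input byte treatment float y
1 3
1 1
2 5
2 4
2 6
end

egen median = median(y), by(treatment)

* order treatments by their medians and rank observations within treatment
bysort median treatment (y) : gen rank = _n

* step by 3 at each new treatment and by 1 at each change of rank
gen axis = sum(3 * (treatment != treatment[_n-1])) + sum(rank != rank[_n-1])
list, sepby(treatment) noobs
```

Here axis takes the values 4 5 for the first treatment and 9 10 11 for the second. The three unused positions in between leave room for a separator line at the first group's maximum plus 2, which is 7.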


Nick Cox

Join Date: Mar 2014

Posts: 35432
#8

22 Feb 2024, 16:52

ib is a typo for lb (pounds).
Janine Stubbs

Join Date: May 2021

Posts: 34
#9

25 Feb 2024, 19:56

Nick,

I look forward to your paper on #4

Regards,

Janine
Dirk Enzmann

Join Date: Apr 2014

Posts: 523
#10

26 Feb 2024, 05:09

Originally posted by Nick Cox:

Here is another take which came to me in a Doh! moment.

Wondering what a Doh! moment is, I found this Wikipedia entry. It seems that this is what you meant. Being attentive to (historical) details, you should perhaps have used the apostrophe as in "D'oh! moment".

All this (#2 ff) is very illuminating, thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35432
#11

26 Feb 2024, 05:51

Thanks for the thanks!

I am doubly exposed here: the Doh reference can be traced to my occasional encounters with the Simpsons series and also shows I am mangling the word in question. The intended flavour is just: I was stupid not to think of that earlier.

Sonnen Blume

Join Date: Aug 2018
Posts: 342

#12

26 Feb 2024, 09:27

This is very interesting! Is it possible to change the color and shape of the bubbles in #7?


Nick Cox

Join Date: Mar 2014

Posts: 35432
#13

26 Feb 2024, 09:37

Naturally. Just use standard twoway scatter options as you wish.
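
For instance, marker symbol, colour and size are set by the msymbol(), mcolor() and msize() options of scatter, which can be abbreviated ms() and mc(). A small sketch on the auto dataset shipped with Stata, not the data of #7; the particular choices and the graph name g_markers are arbitrary:

```stata
sysuse auto, clear

* hollow orange diamonds instead of the default filled circles
scatter mpg weight, ms(Dh) mc(orange) msize(medlarge) name(g_markers, replace)
```

The same options could be added to the scatter call in #7, before the ||.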
