BoxPlot with Mean: Pre and Post for Control and Experimental group

Marvin Aliaga

Join Date: Feb 2015
Posts: 255

24 May 2016, 12:31

Hello everyone!

I hope there is an easy way to do this. I know that there are some documents explaining how to include a mean in a box plot, but I found them complicated and hard to understand. I was wondering if there is a user-written command or a faster way to include just the means in box plots. I am comparing pre and post medians for control and experimental groups (see below). The code I used is also below.

Code:

# delimit ;
    graph box PreUOF PostUOF,
    title("UOF Incidents Distribution Before and After Counseling by Group", size(medium) span)
    ytitle(UOF Incidents)
    ylabel(0(2)16)
    yla(, ang(h))
    nooutsides
    legend(order(1 "Pre" 2 "Post"))
    over(Group)
    name(UOFPrePost_Group_Dis, replace);
# delimit cr

A sample dataset is below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(FISA_NO PreUOF PostUOF) byte Group
 5672 5 2 1
14023 3 2 0
 5438 4 5 0
10993 3 0 0
13202 3 1 0
13268 4 7 1
43385 3 2 1
12161 4 4 1
12227 5 6 0
10151 4 4 1
end
label values Group Group
label def Group 0 "Control", modify
label def Group 1 "Experimental", modify

Thank you in advance,
Marvin

Nick Cox

Join Date: Mar 2014

Posts: 35436
#2

24 May 2016, 13:38

Marvin: You're out of luck here in your chosen approach as graph box is not a twoway type and the methods for adding means are very limited, but do include added lines.

But methods for adding means to box plots have been publicised for several years. You're presumably alluding to

http://www.stata-journal.com/article...article=gr0039

http://www.stata-journal.com/article...ticle=gr0039_1
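One general way to get means onto a box-style display is to build the display from twoway commands. The following is only a rough sketch of that idea on the 10-row sample from #1, not a reproduction of the code in the linked articles. The variables x, med, loq, upq and mean are names made up for this illustration, and whiskers are deliberately left out to keep it short.

```stata
* Sketch under stated assumptions: sample data from #1, boxes without whiskers
clear
input long(FISA_NO PreUOF PostUOF) byte Group
 5672 5 2 1
14023 3 2 0
 5438 4 5 0
10993 3 0 0
13202 3 1 0
13268 4 7 1
43385 3 2 1
12161 4 4 1
12227 5 6 0
10151 4 4 1
end

* one observation per person and occasion
reshape long @UOF, i(FISA_NO) j(sWhen) string
gen When = cond(sWhen == "Pre", 0, 1)

* hypothetical axis positions: 0 1 for Control, 3 4 for Experimental
gen x = 3 * Group + When

* summaries to be drawn, one set per box
egen med  = median(UOF), by(x)
egen loq  = pctile(UOF), p(25) by(x)
egen upq  = pctile(UOF), p(75) by(x)
egen mean = mean(UOF), by(x)

* two bars per box (median to each quartile), then the data, then the means
twoway rbar med upq x, barw(0.5) fcolor(none) lcolor(black)        ///
    || rbar med loq x, barw(0.5) fcolor(none) lcolor(black)        ///
    || scatter UOF x, ms(oh) mcolor(gs8)                           ///
    || scatter mean x, ms(D) mcolor(red)                           ///
    xla(0 "Control Pre" 1 "Control Post" 3 "Exp. Pre" 4 "Exp. Post", noticks) ///
    xtitle("") ytitle(UOF Incidents) yla(, ang(h))                 ///
    legend(order(1 "Quartiles and median" 3 "Data" 4 "Mean"))
```

Because everything is twoway, further layers such as whiskers (for example with rspike) can be added in the same way.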

Actually, box plots seem poor methods here when the data are small counts, as tied values are likely, and medians and quartiles must be integers or halfway between them. There is an immediate indication of this in your example as 3 out of 4 boxes lack lower whiskers meaning that the lower 25% of values (or more!) are all the same count.
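The point about ties can be checked directly by tabulating the minimum and quartiles for each group. This is a minimal sketch on the 10-row sample from #1 only, so it need not match what the full dataset shows. Wherever the minimum equals the lower quartile, there is nothing below the box for a lower whisker to reach.

```stata
* Sketch: minimum, quartiles and mean by group for the sample in #1
clear
input long(FISA_NO PreUOF PostUOF) byte Group
 5672 5 2 1
14023 3 2 0
 5438 4 5 0
10993 3 0 0
13202 3 1 0
13268 4 7 1
43385 3 2 1
12161 4 4 1
12227 5 6 0
10151 4 4 1
end
label def Group 0 "Control" 1 "Experimental"
label values Group Group

* compare min with p25: equal values mean no room for a lower whisker
tabstat PreUOF PostUOF, by(Group) statistics(min p25 p50 p75 mean)
```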

Yet another possibility is stripplot from SSC. With your minimal example, I get this plot as one of several examples:

Code:

clear
set scheme s1color
input long(FISA_NO PreUOF PostUOF) byte Group
 5672 5 2 1
14023 3 2 0
 5438 4 5 0
10993 3 0 0
13202 3 1 0
13268 4 7 1
43385 3 2 1
12161 4 4 1
12227 5 6 0
10151 4 4 1
end
label values Group Group
label def Group 0 "Control", modify
label def Group 1 "Experimental", modify

* Marvin stops

reshape long @UOF, i(FISA_NO) j(sWhen) string
label def when 0 "Pre" 1 "Post"
encode sWhen, label(when) gen(When)
egen mean = mean(UOF), by(Group When)
stripplot UOF, over(When) by(Group, legend(off) note("")) cumul box center vertical ///
separate(When) ms(O O) mcolor(blue red) yla(, nogrid ang(h)) ///
addplot(scatter mean When, ms(Dh) msize(*2)) xtitle("")

The graphs wouldn't look quite so strange with your fuller data set, but there are clear problems with the box plot idea. In the third subset, median and quartiles all coincide and the box is of zero length.

Depending on your real dataset size, I suspect something as simple as a stem-and-leaf plot could be more effective.
Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#3

24 May 2016, 14:20

Hi Nick,

Thank you so much for your reply. Here is some general background on my dataset. The DV is the number of use of force (UOF) incidents that staff were involved in at a mental health facility. I am assessing the effectiveness of some counseling sessions. I conducted Wilcoxon signed-rank tests for the experimental and control groups separately to see if the median decreased from pre to post (since my data are counts and therefore skewed). The distributions of my two groups look weird because I did not randomly select staff for one of the groups. My experimental staff have more UOF in general.
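For reference, signed-rank tests of this kind can be run on wide-format data with signrank, one group at a time. The following is a sketch on the 10-row sample from #1 only; it is assumed, not confirmed, that the real analysis was set up this way, and results from so few rows say nothing about the full data.

```stata
* Sketch: Wilcoxon matched-pairs signed-rank test of Pre versus Post by group
clear
input long(FISA_NO PreUOF PostUOF) byte Group
 5672 5 2 1
14023 3 2 0
 5438 4 5 0
10993 3 0 0
13202 3 1 0
13268 4 7 1
43385 3 2 1
12161 4 4 1
12227 5 6 0
10151 4 4 1
end

* Control group
signrank PreUOF = PostUOF if Group == 0

* Experimental group
signrank PreUOF = PostUOF if Group == 1
```

The output reports how many paired differences are positive, negative and zero, which is worth reading alongside the test statistic when the data are small counts with many ties.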

I wasn't aware of the stripplot command. Do the dots represent observations, something like a scatter plot? So longer rows of dots mean more observations? Do you think the difference plot is useful? I looked at an older post of yours and I really like the one in the link below. How did you make this one?

http://www.statalist.org/forums/file...etch?id=210042

I had just nailed the interpretation and creation of box plots, and it seems that they are not the best visual for my dataset (as you mentioned). Do you think that just presenting the box plot is misleading?

Thank you,
Marvin
Nick Cox

Join Date: Mar 2014

Posts: 35436
#4

24 May 2016, 14:32

Please give the link for the post, not the graph file.
Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#5

24 May 2016, 14:36

See below.
http://www.statalist.org/forums/foru...updated-on-ssc

Nick Cox

Join Date: Mar 2014
Posts: 35436
#6

24 May 2016, 14:48

Thanks.

Something like this:

Code:

sysuse auto, clear 
set scheme s1color 
stripplot price, over(foreign) pctile(5) refline reflevel(median) ///
box(barw(0.04)) boffset(-0.08)  cumul vertical yla(, ang(h)) ///
ytitle(Price (USD)) xla(, noticks)

[Image: marvin3.png]

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#7

25 May 2016, 07:59

Hi Nick Cox,

I tried to redo my stripplot graph using your latest code but got an error message.

Code:

. stripplot UOF, over(When) by(Group, legend(off) note("")) ///
> pctile(5) refline reflevel(median) ///
> box(barw(0.04)) boffset(-0.08) cumul vertical yla(, ang(h)) ///
> ytitle(Price (USD)) xla(, noticks)
reference lines not available with by()

Also, have you seen the result of the first graph I created using your code (#3)? Can you answer my questions at #3? I am confused about the stripplot for the difference. Since it is my first time generating these types of charts, it is a little difficult to understand them.
Nick Cox

Join Date: Mar 2014

Posts: 35436
#8

25 May 2016, 08:20

That's a documented limitation. Please do read the help!

refline or associated options may not be specified with by(). See the
examples for a work-around.

I quite like your graphs in #3. But they do underline that box plots aren't especially suitable for counted data.

By the way, it's rather loose to interpret the Wilcoxon test as a test of medians. See e.g. http://stats.stackexchange.com/questions/214472/why-is-my-spearmans-rho-result-contradicting-my-wilcoxon-matched-pairs/214478#214478

Nick Cox

Join Date: Mar 2014
Posts: 35436
#9

25 May 2016, 08:42

This is the example in the help of a work-around.

Code:

sysuse auto, clear 
egen group = group(foreign rep78), label
replace group = cond(group <= 5, group, group + 1)
labmask group, values(rep78)
stripplot mpg, over(group) vertical cumul cumprob refline box centre ///
mcolor(blue) xmla(3 "Domestic" 8 "Foreign", tlength(*7) tlc(none) /// 
labsize(medium)) xtitle("") xli(6, lc(gs12) lw(vthin))

Here is my guess at what you need.

Code:

egen group = group(Group When), label
replace group = cond(group <= 2, group, group + 1)

* need to install: -search labmask- for locations 
labmask group, values(When) decode 

stripplot UOF, over(group) vertical cumul cumprob refline reflevel(median) box(barw(0.04)) boffset(-0.08) centre ///
xmla(1.5 "Control" 4.5 "Experimental", tlength(*7) tlc(none) labsize(medium)) xtitle("") xli(3, lc(gs12) lw(vthin))


Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#10

25 May 2016, 09:07

Hi Nick,

Thank you so much for introducing me to the stripplot command and for pointing out the potential issues of using Wilcoxon to interpret medians (another problem to worry about). I appreciate it.
These charts may be a little too complex for a non-technical audience. Instead I will try to create histograms for pre and post for each group (control and experimental). This can be complicated. I had a previous post regarding this type of histogram and you suggested looking at twoway graphs. http://www.statalist.org/forums/foru...ables-by-group
I found partial code to do this but it is not complete. I will follow this discussion in the corresponding post.

Thank you,
Marvin
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#11

29 Apr 2022, 02:59

Dear Mr. Cox,

The resources you provided in #2 were very helpful for my own project. Thank you for sharing them!

I have one follow-up question regarding the graph that was created in the pdf document under 2.2 and 2.3: How would one modify the code in order to have a fourth box in the graph showing a box plot for the whole sample (i.e. over all regions)?

I have tried to create the graph myself but I couldn't think of any way to adjust the group variable. Maybe there is a more obvious way which I was unable to find. I'd be grateful for help.
Nick Cox

Join Date: Mar 2014

Posts: 35436
#12

29 Apr 2022, 04:09

#11 Steffen Mauch

See https://www.stata-journal.com/articl...article=gr0058 The main idea is to expand the dataset temporarily and to make the groups you want to show extra values of the existing grouping variable.

As in earlier posts in this thread (especially #2 and #6) I have drifted to thinking plain box plots rather over-sold, or at least over-used. If the number of groups or variables is less than about 7 there is usually scope to show much more detail without distraction.

I would tend to do something like this, although as already shown there are many other recipes. stripplot from SSC (referred to earlier in this thread) is here being used for quantile-box plots in which a traditional box showing median and quartiles is plotted on a quantile plot showing all values in order. If that is done there is no need to wrestle with arbitrary rules such as plotting points individually if and only if they lie more than 1.5 IQR from the nearer quartile. That rule is sometimes helpful but also quite often bizarre in its effects.

Code:

sysuse auto, clear
preserve
local Np1 = _N + 1
expand 2
* the value assigned to the expanded data must differ from that already used
* here -foreign- has values 0 1 so 2 is appropriate
replace foreign = 2 in `Np1'/L
label def origin 2 "Either" , add
stripplot mpg, over(foreign) box refline cumul cumprob centre vertical scheme(s1color) xla(, tlc(none)) xsc(titlegap(*5)) note(longer lines show means) yla(, ang(h))
restore
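
The 1.5 IQR rule mentioned above can be made concrete with a few lines. This is a minimal sketch on the auto data for the whole sample; the local macro names iqr, lower and upper are made up for the illustration. It computes the two limits and counts the values that a conventional box plot would show as individual points.

```stata
* Sketch: limits implied by the 1.5 IQR rule for mpg in the auto data
sysuse auto, clear
quietly summarize mpg, detail
local iqr   = r(p75) - r(p25)
local lower = r(p25) - 1.5 * `iqr'
local upper = r(p75) + 1.5 * `iqr'

* values beyond either limit would be plotted individually by -graph box-
count if !missing(mpg) & (mpg < `lower' | mpg > `upper')
display "limits: `lower' to `upper'; " r(N) " values outside"
```

On a quantile-box plot every value is shown anyway, so no such count is needed to decide what gets drawn.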
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#13

30 Apr 2022, 14:08

Nick Cox

Thank you very much. That works perfectly!
