  • BoxPlot with Mean: Pre and Post for Control and Experimental group

    Hello everyone!

    I hope there is an easy way to do this. I know that there are some documents explaining how to include a mean in a box plot, but for some reason I found them complicated and hard to understand. I was wondering if there is a user-written command or a faster way to include just means in box plots. I am comparing pre and post medians for control and experimental groups (see below). Also see the code I used.
    Code:
    # delimit ;
        graph box PreUOF PostUOF,
        title("UOF Incidents Distribution Before and After Counseling by Group", size(medium) span)
        ytitle(UOF Incidents)
        ylabel(0(2)16)
        yla(, ang(h))
        nooutsides
        legend(order(1 "Pre" 2 "Post"))
        over(Group)
        name(UOFPrePost_Group_Dis, replace);
    # delimit cr
    A sample dataset follows.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(FISA_NO PreUOF PostUOF) byte Group
     5672 5 2 1
    14023 3 2 0
     5438 4 5 0
    10993 3 0 0
    13202 3 1 0
    13268 4 7 1
    43385 3 2 1
    12161 4 4 1
    12227 5 6 0
    10151 4 4 1
    end
    label values Group Group
    label def Group 0 "Control", modify
    label def Group 1 "Experimental", modify
    Thank you in advance,
    Marvin

  • #2
    Marvin: You're out of luck with your chosen approach, as graph box is not a twoway type and its methods for adding means are very limited, although they do include added lines.

    But methods for adding means to box plots have been publicised for several years. You're presumably alluding to

    http://www.stata-journal.com/article...article=gr0039

    http://www.stata-journal.com/article...ticle=gr0039_1
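
    For orientation, the general idea is to compute the summaries yourself and draw the box with twoway, after which adding means is just one more scatter. What follows is a rough sketch in that spirit on the auto data, not the code from those papers. The variable names are arbitrary choices, and the whiskers here simply run to the extremes, a simplification that differs from the 1.5 IQR rule used by graph box.

    ```stata
    sysuse auto, clear

    * group summaries, one value repeated within each group
    egen median = median(mpg), by(foreign)
    egen upq    = pctile(mpg), p(75) by(foreign)
    egen loq    = pctile(mpg), p(25) by(foreign)
    egen mean   = mean(mpg), by(foreign)
    egen min    = min(mpg), by(foreign)
    egen max    = max(mpg), by(foreign)

    * two bars meeting at the median make the box; spikes are the whiskers;
    * the scatter adds the means as open diamonds
    twoway rbar median upq foreign, barw(0.4) fcolor(none) lcolor(black)  ///
        || rbar median loq foreign, barw(0.4) fcolor(none) lcolor(black)  ///
        || rspike upq max foreign, lcolor(black)                          ///
        || rspike loq min foreign, lcolor(black)                          ///
        || scatter mean foreign, ms(Dh) msize(*2)                         ///
        legend(off) xla(0 "Domestic" 1 "Foreign") xsc(r(-0.5 1.5))        ///
        yla(, ang(h)) ytitle("Mileage (mpg)") xtitle("")
    ```

    Because everything is a twoway plot, any further summary (a confidence interval, say) can be layered on in the same way.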

    Actually, box plots seem poor methods here when the data are small counts, as tied values are likely, and medians and quartiles must be integers or halfway between them. There is an immediate indication of this in your example as 3 out of 4 boxes lack lower whiskers meaning that the lower 25% of values (or more!) are all the same count.

    Yet another possibility is stripplot from SSC. With your minimal example, I get this plot as one of several examples:

    Code:
    clear 
    set scheme s1color 
    input long(FISA_NO PreUOF PostUOF) byte Group
     5672 5 2 1
    14023 3 2 0
     5438 4 5 0
    10993 3 0 0
    13202 3 1 0
    13268 4 7 1
    43385 3 2 1
    12161 4 4 1
    12227 5 6 0
    10151 4 4 1
    end
    label values Group Group
    label def Group 0 "Control", modify
    label def Group 1 "Experimental", modify
    * Marvin stops 
    
    reshape long @UOF, i(FISA_NO) j(sWhen) string 
    label def when 0 "Pre" 1 "Post" 
    encode sWhen, label(when) gen(When) 
    egen mean = mean(UOF), by(Group When) 
    
    stripplot UOF, over(When) by(Group, legend(off) note("")) cumul box center vertical  /// 
    separate(When) ms(O O) mcolor(blue red) yla(, nogrid ang(h)) ///
    addplot(scatter mean When, ms(Dh) msize(*2)) xtitle("")
    [Figure: marvin2.png]

    The graphs wouldn't look quite so strange with your fuller data set, but there are clear problems with the box plot idea. In the third subset, median and quartiles all coincide and the box is of zero length.

    Depending on your real dataset size, I suspect something as simple as a stem-and-leaf plot could be more effective.
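
    As a minimal sketch of that suggestion, assuming the wide-form example data from #1 (re-entered here without the identifier so that the block runs on its own), the official stem command can be run for each variable and group in turn:

    ```stata
    clear
    input byte(PreUOF PostUOF Group)
    5 2 1
    3 2 0
    4 5 0
    3 0 0
    3 1 0
    4 7 1
    3 2 1
    4 4 1
    5 6 0
    4 4 1
    end

    * one stem-and-leaf display per variable and group
    foreach v of varlist PreUOF PostUOF {
        forvalues g = 0/1 {
            display _n as text "`v', Group == `g'"
            stem `v' if Group == `g'
        }
    }

    * for small counts a plain frequency table is a close alternative
    tabulate PostUOF Group
    ```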

    • #3
      Hi Nick,

      Thank you so much for your reply. Just so you have some general background about my dataset: the DV is the number of use of force (UOF) incidents that staff were involved in at a mental health facility. I am assessing the effectiveness of some counseling sessions. I conducted a Wilcoxon signed-rank test for the experimental and control groups separately to see if the median decreased from pre to post (since my data are counts and therefore skewed). The distributions of my two groups look weird because I did not randomly select staff for one of the groups. My experimental staff have more UOF incidents in general.

      I wasn't aware of the stripplot command. Do the dots represent observations, something like a scatter plot? So do longer rows of dots mean more observations? Do you think the difference plot is useful? I looked at an older post of yours and I really like the one in the link below. How did you make this one?

      http://www.statalist.org/forums/file...etch?id=210042

      I just nailed the interpretation and creation of box plots, and it seems that they are not the best visual for my dataset (as you mentioned). Do you think that just presenting the box plot is misleading?

      Thank you,
      Marvin

      [Figure: StripPlot.png]

      • #4
        Please give the link for the post, not the graph file.

        • #5
          See below.
          http://www.statalist.org/forums/foru...updated-on-ssc

          • #6
            Thanks.

            Something like this:

            Code:
            sysuse auto, clear 
            set scheme s1color 
            stripplot price, over(foreign) pctile(5) refline reflevel(median) ///
            box(barw(0.04)) boffset(-0.08)  cumul vertical yla(, ang(h)) ///
            ytitle(Price (USD)) xla(, noticks)
            [Figure: marvin3.png]

            • #7
              Hi Nick Cox ,

              I tried to redo my stripplot graph using your latest code but got an error message.
              Code:
              . stripplot UOF, over(When) by(Group, legend(off) note("")) ///
              > pctile(5) refline reflevel(median) ///
              > box(barw(0.04)) boffset(-0.08)  cumul vertical yla(, ang(h)) ///
              > ytitle(Price (USD)) xla(, noticks)
              reference lines not available with by()
              Also, have you seen the result of the first graph I created using your code (#3)? Can you answer my questions in #3? I am confused about the stripplot for the difference. Since it is my first time generating these types of charts, it is a little difficult to understand them.

              • #8
                That's a documented limitation. Please do read the help!

                refline or associated options may not be specified with by(). See the
                examples for a work-around.


                I quite like your graphs in #3. But they do underline that box plots aren't especially suitable for counted data.

                By the way, it's rather loose to interpret the Wilcoxon test as a test of medians. See e.g.
                http://stats.stackexchange.com/questions/214472/why-is-my-spearmans-rho-result-contradicting-my-wilcoxon-matched-pairs/214478#214478
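
                To make that concrete: the signed-rank test works on the paired differences, asking whether their distribution is symmetric about zero, and not on the two medians as such. A minimal sketch, assuming the wide-form example data from #1 (re-entered here without the identifier); diff is a variable name chosen for illustration:

                ```stata
                clear
                input byte(PreUOF PostUOF Group)
                5 2 1
                3 2 0
                4 5 0
                3 0 0
                3 1 0
                4 7 1
                3 2 1
                4 4 1
                5 6 0
                4 4 1
                end

                * the paired differences are what the test is about
                generate diff = PostUOF - PreUOF
                tabstat PreUOF PostUOF diff, by(Group) statistics(n median mean)

                * Wilcoxon matched-pairs signed-rank test within each group
                signrank PostUOF = PreUOF if Group == 0
                signrank PostUOF = PreUOF if Group == 1
                ```

                Looking at the differences directly, for example with tabulate diff Group, shows what the test result is summarizing.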

                • #9
                  This is the example in the help of a work-around.

                  Code:
                  sysuse auto, clear 
                  egen group = group(foreign rep78), label
                  replace group = cond(group <= 5, group, group + 1)
                  labmask group, values(rep78)
                  stripplot mpg, over(group) vertical cumul cumprob refline box centre ///
                  mcolor(blue) xmla(3 "Domestic" 8 "Foreign", tlength(*7) tlc(none) /// 
                  labsize(medium)) xtitle("") xli(6, lc(gs12) lw(vthin))
                  Here is my guess at what you need.

                  Code:
                  egen group = group(Group When), label
                  replace group = cond(group <= 2, group, group + 1)
                  
                  * need to install: -search labmask- for locations 
                  labmask group, values(When) decode 
                  
                  stripplot UOF, over(group) vertical cumul cumprob refline reflevel(median) box(barw(0.04)) boffset(-0.08) centre ///
                  xmla(1.5 "Control" 4.5 "Experimental", tlength(*7) tlc(none) labsize(medium)) xtitle("") xli(3, lc(gs12) lw(vthin))

                  • #10
                    Hi Nick,

                    Thank you so much for introducing me to the stripplot command and for pointing out the potential issues with using the Wilcoxon test to interpret medians (another problem to worry about). I appreciate it.
                    These charts may be a little too complex for a non-technical audience. Instead I will try to create histograms for pre and post for each group (control and experimental). This can be complicated. I had a previous post regarding this type of histogram and you suggested looking at twoway graphs. http://www.statalist.org/forums/foru...ables-by-group
                    I found partial code to do this but it is not complete. I will follow up on this discussion in the corresponding post.


                    Thank you,
                    Marvin

                    • #11
                      Dear Mr. Cox,

                      The resources you provided in #2 were very helpful for my own project. Thank you for sharing them!

                      I have one follow-up question regarding the graph that was created in the PDF document under 2.2 and 2.3: how would one modify the code in order to have a fourth box in the graph showing a box plot for the whole sample (i.e. over all regions)?

                      I have tried to create the graph myself but I couldn't think of any way to adjust the group variable. Maybe there is a more obvious way which I was unable to find. I'd be grateful for help.

                      • #12
                        #11 Steffen Mauch

                        See https://www.stata-journal.com/articl...article=gr0058 The main idea is to expand the dataset temporarily and to make the groups you want to show extra values of the existing grouping variable.

                        As in earlier posts in this thread (especially #2 and #6) I have drifted to thinking plain box plots rather over-sold, or at least over-used. If the number of groups or variables is less than about 7 there is usually scope to show much more detail without distraction.

                        I would tend to do something like this, although as already shown there are many other recipes. stripplot from SSC (referred to earlier in this thread) is here being used for quantile-box plots in which a traditional box showing median and quartiles is plotted on a quantile plot showing all values in order. If that is done there is no need to wrestle with arbitrary rules such as plotting points individually if and only if they lie more than 1.5 IQR from the nearer quartile. That rule is sometimes helpful but also quite often bizarre in its effects.

                        Code:
                        sysuse auto, clear 
                        
                        preserve 
                        
                        local Np1 = _N + 1 
                        expand 2
                        
                        * the value assigned to the expanded data must differ from that already used 
                        * here -foreign- has values 0 1 so 2 is appropriate 
                        replace foreign = 2 in `Np1'/L 
                        
                        label def origin 2 "Either" , add 
                        
                        stripplot mpg, over(foreign) box refline cumul cumprob centre vertical scheme(s1color) xla(, tlc(none)) xsc(titlegap(*5)) note(longer lines show means) yla(, ang(h))
                        
                        restore
                        [Figure: mauch.png]

                        • #13
                          Nick Cox

                          Thank you very much. That works perfectly!
