
  • framed_bar for framed bar charts downloadable from SSC

    Thanks as ever to Kit Baum, a new package, framed_bar, for framed bar charts is now downloadable from SSC. Stata 12 or later is needed.
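    For anyone new to SSC, installation is the usual one-liner. This is a minimal sketch; the replace option simply overwrites any copy you already have, which is also a way to pick up revised files later.

```stata
* Install (or update) framed_bar from SSC; this needs an internet connection
ssc install framed_bar, replace
* Confirm where the installed command lives
which framed_bar
```

    After installation, help framed_bar gives the full documentation.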

    What is a framed bar chart? That term may be unfamiliar, but I will guess that the idea is something you've seen before or at least can absorb quickly.

    Let me back up and show the idea without using the new command. I will use twoway for a reason that will become clear shortly.

    Given a (0, 1) indicator variable, its mean is just the proportion of successes, in one jargon, or the proportion of observations in the state coded 1. So the mean of 0 0 0 1 1 1 1 1 1 1 is just 7/10 = 0.7, the proportion of successes, and 1 minus that, 0.3, is the proportion of failures (in the same jargon), the state coded 0.

    Wanting to see percents instead is a common and easy variation.
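    The arithmetic above is easy to check in Stata. This is a minimal sketch using only official commands; the variable name y is arbitrary.

```stata
* Ten values of a (0, 1) indicator: 0 0 0 1 1 1 1 1 1 1
clear
set obs 10
generate byte y = _n > 3    // observations 4 to 10 are coded 1
summarize y
* r(mean) is the proportion coded 1; its complement is the proportion coded 0
display "proportion coded 1: " r(mean) "   proportion coded 0: " (1 - r(mean))
* Percents are the same calculation scaled by 100
display "percent coded 1: " (100 * r(mean))
```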

    So, for the auto data, we can get the percents foreign as means and then fire up a bar chart.


    [Graph: FB_S1.png]


    That kind of graph raises a simple thought. Zero is a natural reference and is naturally the base of each bar (including bars of zero height, which exist but are hard to see). The other natural reference is 1 or 100%, which is the frame we're talking about. It's just another bar, laid down first and here with no fill colour. (If you prefer another colour to none, I recommend a very pale colour.) Now we have space and perhaps inclination to add annotation.

    Although frames as a graphic device are at least a century old, my understanding owes most to the work of William S. Cleveland. Encouragement to use frames in tabplot from the Stata Journal came from comments by William Huber and Jeff Laux. The initial stimulus for this command was a thread here by Jonathan Afilalo. Encouraging comments came from Tim Morris.

    What I guess many people would do here is a stacked bar chart, which is widely familiar, but (I suggest) not obviously better. This stacked version could be vertical if you prefer.

    [Graph: FB_S3.png]


    Here is some code for those three graphs. NB: catplot is from SSC, and has just very recently been updated.

    Code:
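    * Run from a do-file: the /// continuation lines do not work in the Command window
    sysuse auto, clear
    * stcolor is the default scheme in Stata 18; use another scheme in earlier versions
    set scheme stcolor
    
    * Percent foreign for each repair record, tagging one observation per group
    egen pc_foreign = mean(100 * foreign), by(rep78)
    egen tag = tag(rep78)
    * G1: plain bar chart
    twoway bar pc_foreign rep78 if tag, ytitle(% foreign) barw(0.8) name(G1, replace)
    
    * G2: a frame at 100 (no fill) laid down first, then the bars, then labels giving the percents
    gen frame = 100
    twoway bar frame rep78 if tag, pstyle(p2) barw(0.8) fcolor(none) ///
        || bar pc_foreign rep78 if tag, ytitle(% foreign) barw(0.8) pstyle(p1) legend(off) ///
        || scatter pc_foreign rep78 if tag, ms(none) mla(pc_foreign) mlabpos(12) mlabc(black) ///
           mlabformat(%1.0f) mlabsize(large) name(G2, replace)
    
    * G3: stacked bar chart for comparison (catplot is from SSC)
    catplot , over(foreign) over(rep78) asyvars stack percent(rep78) l1title(Repair record 1978) name(G3, replace)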

    Why did I just use twoway and not, say, graph bar (mean) or graph hbar (mean)? The reason is that I don't know a trick to get frames with either of those commands. (Someone may be able to fill in that gap.)

    Now if you like the idea and want to use it routinely, framed_bar offers to take over much of the nitty-gritty.

    Here's the sales pitch.

    framed_bar draws framed bar charts showing summary statistics for one or more numeric variables. Frames comparing bars with limiting values can work to simplify, by making an obvious complement tacit rather than explicit, and to clarify, by making near extremes evident.

    Recall that when you are drinking from a glass, your glass can be easily judged as (nearly) full, (nearly) empty, or in between. The glass analogy works better with vertical bars.

    It is best explained statistically by a leading application. Suppose we have a binary (Boolean, dichotomous, dummy, indicator, logical, one-hot, quantal, zero-one) variable conventionally coded 0 or 1. A common jargon is that the state coded 0 is dubbed failure and the state coded 1 is dubbed success; sometimes these terms are evocative and otherwise they are just terms of art.

    A common graphic for such data is a stacked bar chart showing the proportions of whatever is coded 0 and whatever is coded 1. However, missing values aside, those two proportions necessarily add to 1. Hence an alternative graphic is just a bar chart showing the proportion of successes within a frame of height or length 1. The empty space corresponds to the proportion of failures. Equivalently, such a bar chart plots the mean or means of one or more sets of binary values, as a mean of such a (0, 1) variable is just the proportion of successes.

    Variations on this design are offered by default or may be achieved through options. Headline possibilities include:

    Any single summary statistic offered by collapse may be chosen rather than means.

    By default, annotation shows the statistics concerned (say, means) as a numeric text display.

    Sample sizes may optionally be shown as a numeric text display too.

    Values shown may be replaced on the fly according to a specified calculation.

    Frames may be suppressed. A frame may be irrelevant to your data or your purpose, but you may like the design otherwise.

    Here are two more examples, first showing the number of missing values in some variables (by subtraction from sample size). Here we switch the frame off.

    Code:
     
    sysuse nlsw88, clear
    
    framed_bar grade industry occupation union hours tenure, stat(count) calc(`=_N' - @) allobs sort frame(0) horizontal ///
        barlabel(mlabc(black) mlabs(medlarge) mlabf(%1.0f)) xsc(off) subtitle(Missing values)
    [Graph: FB_S4.png]
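    For comparison, the same counts can be got with official commands alone. This is a sketch of the arithmetic behind stat(count) with calc(`=_N' - @), reading @ as the statistic calculated, and not of how framed_bar works internally: each variable's count of non-missing values is subtracted from the number of observations.

```stata
sysuse nlsw88, clear
foreach v in grade industry occupation union hours tenure {
    * r(N) is the number of non-missing values of this variable
    quietly count if !missing(`v')
    * Missing values = sample size minus non-missing count
    display as text "`v'" _col(14) as result (_N - r(N))
}
* Official alternative: a table of missing values for the same variables
misstable summarize grade industry occupation union hours tenure
```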


    Now plot the proportion of college graduates by race and union worker:

    Code:
    framed_bar collgrad, over(union) by(race, row(1)) barlabel(mlabf(%04.3f)) ytitle(Proportion of college graduates) ///
        yla(0 "0" 1 "1" 0.2(0.2)0.8, format(%02.1f))


    [Graph: FB_S5.png]



    There are several references in the help file to use of frames in this sense, but more would be most welcome.

  • #2
    Respected Nick Cox, thank you for writing this helpful package. I am facing issues with axis ticks and spacing.

    1. How can I reduce the space between bars?
    2. When I have fewer than five dummy variables, the ticks on the y-axis are not aligned with the y-axis labels and remain fixed at five, but I need only three.

    Code:
     framed_bar mobuse_3mnth3_d1 mobuse_3mnth3_d2 mobuse_3mnth3_d3 [aw=wt] if inrange(age,15,29), calc(100*@) ///
         by(residence, row(1) title("Title", size(medium))) sort subset(residence==0) ///
         frame(0) horizontal barlabel(mlabf(%04.1f)) recast(dropline) xla(none) xtitle("") ytitle("Access", size(small)) ///
         yla(1 "Exclusive" 2 "Shared use" 3 "Not used", labsize(vsmall)) ///
         xtick(#6, tposition(inside)) ytick(#3, tposition(outside)) ///
         text(-.2 25 "percent", size(small))

    Example data are here:
    Code:
    clear
    input byte(mobuse_3mnth3_d1 mobuse_3mnth3_d2 mobuse_3mnth3_d3) float age byte residence float wt
    1 0 0 27 1  479.25
    0 1 0 25 1  479.25
    0 1 0 22 0  479.25
    1 0 0 27 1  479.25
    0 0 1 25 0  479.25
    1 0 0 24 1  479.25
    0 0 1 23 1  479.25
    0 0 1 15 0 1224.75
    0 1 0 19 1 1224.75
    0 1 0 17 1 1224.75
    1 0 0 24 0 1224.75
    1 0 0 21 1 1224.75
    1 0 0 27 1 1224.75
    1 0 0 24 0 1224.75
    0 1 0 28 1   511.2
    0 1 0 25 1   511.2
    0 1 0 28 1   511.2
    0 1 0 28 0   511.2
    0 1 0 25 1   511.2
    1 0 0 27 1   511.2
    1 0 0 25 1   511.2
    1 0 0 23 0   511.2
    1 0 0 20 1   511.2
    1 0 0 19 1   511.2
    0 0 1 15 1   511.2
    1 0 0 19 0   511.2
    0 1 0 17 1   511.2
    1 0 0 25 1   511.2
    1 0 0 18 0   511.2
    0 0 1 16 1   511.2
    0 1 0 28 1 3486.38
    0 0 1 22 1 3486.38
    0 0 1 20 1 3486.38
    0 1 0 25 0 3486.38
    0 1 0 21 0 3486.38
    0 1 0 17 0 3486.38
    0 1 0 24 0  258.25
    0 1 0 20 0 3947.54
    0 1 0 26 1 3947.54
    0 0 1 21 1 3947.54
    0 1 0 26 1 3947.54
    1 0 0 25 1 3947.54
    1 0 0 22 1 3947.54
    0 1 0 18 0 3947.54
    0 0 1 15 0 3947.54
    0 0 1 26 1  335.73
    1 0 0 27 1  335.73
    0 0 1 16 1  335.73
    1 0 0 27 1  335.73
    1 0 0 20 1  335.73
    0 1 0 15 1  335.73
    0 0 1 15 1  335.73
    0 1 0 27 1 2188.75
    0 1 0 24 1 2188.75
    0 0 1 25 1 2188.75
    0 0 1 23 1 2188.75
    0 1 0 20 1 2188.75
    0 1 0 18 0 2188.75
    0 0 1 15 1 2188.75
    0 0 1 15 1   257.5
    0 0 1 16 1   257.5
    0 1 0 25 1  600.83
    0 1 0 22 0  600.83
    0 1 0 26 0  600.83
    0 1 0 20 1  600.83
    0 1 0 23 0 1201.67
    0 1 0 24 1 1201.67
    0 1 0 25 1 1201.67
    0 1 0 22 1 1201.67
    0 0 1 17 0 3540.63
    0 1 0 28 1 3540.63
    0 0 1 23 1 3540.63
    0 1 0 28 1 3540.63
    0 0 1 25 1 3540.63
    0 1 0 20 1 3540.63
    0 0 1 21 1 3540.63
    0 1 0 27 0 3540.63
    0 0 1 20 1 3540.63
    0 0 1 24 1  701.25
    0 0 1 22 0  701.25
    0 1 0 28 1  233.75
    0 0 1 24 1  233.75
    0 0 1 26 1 1314.84
    0 1 0 19 0 1314.84
    0 1 0 28 1 1314.84
    0 0 1 19 1 1314.84
    0 1 0 21 1 1314.84
    0 0 1 19 0 1314.84
    0 0 1 26 1  727.22
    0 0 1 20 1  727.22
    0 0 1 21 1  727.22
    0 0 1 19 0  727.22
    0 0 1 18 1  727.22
    1 0 0 25 1  727.22
    0 0 1 20 1  727.22
    0 0 1 17 1  727.22
    1 0 0 22 0  727.22
    0 0 1 15 1  727.22
    0 1 0 27 1    1228
    0 1 0 26 1    1228
    end
    la def resid 0 "urban" 1 "rural"
    la val residence resid
    [Graph: frame_bar.png]



    (I am trying this in Stata 18.)
    Last edited by Mukesh Punia; 27 Sep 2024, 07:31.
    Best regards,
    Mukesh

    (Stata 15.1 SE)



    • #3
      Thanks for your interest!

      framed_bar defaults to showing bars, unsurprisingly. When it does show bars, the space between bars can be reduced by increasing the bar width. Otherwise, spacing is controlled indirectly by graph size and shape.

      Otherwise, I think most of your questions hinge on personal taste and judgement, and on the use of graph twoway options rather than on framed_bar itself.

      I wouldn't use any ticks given your data and would lean towards something more like

      Code:
      la def resid 0 "Urban" 1 "Rural"
      la val residence resid
      
      framed_bar mobuse_3mnth3_d1 mobuse_3mnth3_d2 mobuse_3mnth3_d3 [aw=wt] if inrange(age,15,29), calc(100*@) by(residence, col(1))     ///
      sort subset(residence==0)         ///
      frame(0) horizontal barlabel(mlabf(%04.1f)) recast(dropline) xla(none) xtitle("percent") ytitle("Access")            ///
      yla(1 "Exclusive" 2 "Shared use" 3 "Not used")   subtitle(, pos(9) nobox nobexpand fcolor(none) size(large))
      [Graph: mukesh.png]



      • #4
        Thank you, Prof. Nick, for your response. Whatever the customisation issues, the best thing about framed_bar is that it allows weights. In twoway, weights are not allowed. I am trying to plot the weighted mean/proportion using survey data. Please let me know if there is any alternative that allows weights and other flexibility.

        Thank you!
        Best regards,
        Mukesh




        • #5
          If I understand #4 correctly you want advice on graphics with pweights. I would start a new thread with that question.



          • #6
            Dear Nick Cox, thank you very much for this very helpful program. I would like to ask you two questions:
            1. I am using something similar to your FB11 example:
            framed_bar collgrad, over(union) by(race, row(1) ) barlabel(mlabf(%04.3f)) ytitle(Proportion of college graduates) yla(0 "0" 1 "1" 0.2(0.2)0.8, format(%02.1f)) name(FB11, replace)
            If I add a note() to the by() options, the note is not shown. Is there a way around this?
            framed_bar collgrad, over(union) by(race, row(1) note("Test note")) barlabel(mlabf(%04.3f)) ytitle(Proportion of college graduates) yla(0 "0" 1 "1" 0.2(0.2)0.8, format(%02.1f)) name(FB11_testnote, replace)
            2. I really love the count/countlabel option. In my case I want to show ten bars (over()) in each of five groups (by()), which leaves very little space for "n = xxx", and the "n =" is repeated very often. Could there be an option to show a single "n =" at the coordinate (0, 0) and leave out the others? Or to show "n =" only at the first bar?
            Best regards, Marc

            Comment


            • #7
              #6 Thanks for this.

              Question 1. Zapping the default note was intended as a feature, but that isn't true for you. I will revisit this.

              Question 2. Alternative displays of sample size. Fair point, I will also think about that. There is an undocumented option wherecount() that controls the position of these labels.



              • #8
                Originally posted by Nick Cox:
                If I understand #4 correctly you want advice on graphics with pweights. I would start a new thread with that question.
                Yes. For example, we use tab x_var y_var [aw=weight_var] to calculate weighted percentages. If I want to make a graph of weighted percentages, twoway doesn't allow that.
                Best regards,
                Mukesh

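                One common way round the problem raised in #4 and #8, sketched here as one possibility rather than as part of framed_bar: let collapse calculate the weighted means (it accepts aweights and pweights), then draw the framed bars with twoway on the reduced dataset. The auto data have no survey weights, so the vehicle weight variable weight stands in purely to show the syntax; with the data in #2 you would use [aw=wt] and your own variables. The /// continuations mean this should be run from a do-file.

```stata
* Weighted percentages: collapse does the weighted summary, twoway draws the graph
* Stand-in only: auto's vehicle weight plays the role of a survey weight
sysuse auto, clear
preserve

* Weighted mean of a 0/1 indicator = weighted proportion; x 100 gives percent
collapse (mean) pc_foreign = foreign [aw=weight], by(rep78)
replace pc_foreign = 100 * pc_foreign
generate frame = 100

* Frame first (no fill), then the bars, then marker labels showing the percents
twoway bar frame rep78, barwidth(0.8) fcolor(none) pstyle(p2)           ///
    || bar pc_foreign rep78, barwidth(0.8) pstyle(p1)                   ///
    || scatter pc_foreign rep78, ms(none) mlabel(pc_foreign)            ///
       mlabformat(%3.1f) mlabpos(12) mlabcolor(black)                   ///
       ytitle(% foreign (weighted)) legend(off) name(weighted, replace)

* Return to the full dataset
restore
```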



                • #9
                  As a follow-up to #6 and #7: Thanks as always to Kit Baum, revised module files are now posted on SSC to allow the kinds of tweaks that Marc Kaulisch is asking for.

                  Comment

                  Working...
                  X