
  • Drawing a bar graph for two years with categories

    Hello Stata users,

    I have this bit of data which comes from a survey:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 answer byte(Year_2023 Year_2024)
    "Negative"     24 45
    "Stabe"        37 31
    "Positive"     36 19
    "I don't know"  3  5
    end
    The first variable, answer, holds the categories of answer that people were given to choose from. The second variable, Year_2023, holds the percentage of people choosing each category in 2023, and the third and last variable, Year_2024, does the same for 2024.
    The goal is to draw a bar graph with the two years on the x axis and the percentages for each answer category on the y axis.

    Any help, please? Many thanks...

  • #2
    There are many possible takes on this, but I'd say the main essential is treating negative to positive as an ordered sequence with "stabe" (presumably "stable") in the middle. That is to say, alphabetical order doesn't make sense.

    It also seems that you will be better off after a reshape.

    Here are three takes, which don't exhaust the possibilities. The rather predictable stacked plot is not my favourite.

    labmask and tabplot are from the Stata Journal. Other implicit choices are Stata 18 defaults.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 answer byte(Year_2023 Year_2024)
    "Negative"     24 45
    "Stabe"        37 31
    "Positive"     36 19
    "I don't know"  3  5
    end
    
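    * fix the typo in the answer categories
    replace answer = "Stable" if answer == "Stabe"
    
    * numeric variable in the desired order: Negative, Stable, Positive, I don't know
    gen which = _n 
    
    * value labels for which, taken from the text of answer
    labmask which, values(answer)
    
    * one observation for each category and year
    reshape long Year_, i(which) j(year)
    
    rename Year_ percent 
    
    * G1: stacked bar chart, with percent used as a frequency weight
    graph bar (count) [fw=percent], over(which, descending) over(year) asyvars stack ///
    bar(1, color(red%50)) bar(2, color(gs8%50)) bar(3, color(blue%50)) ///
    bar(4,color(magenta%50)) ytitle("percent") aspect(1) blabel(bar, pos(center) size(medlarge)) name(G1, replace)
    
    * G2: table-like plot with one colour for each category
    tabplot which year [iw=percent], showval ytitle("") xtitle("") ///
    separate(which) bar1(color(red)) bar2(color(gs8)) bar3(color(blue)) ///
    bar4(color(magenta)) aspect(1) name(G2, replace)
    
    * G3: percent1-percent4 hold percent for categories 1-4
    separate percent, by(which) shortlabel
    
    * colours for the value labels, accessible as `1' to `4'
    tokenize red gs8 blue magenta 
    
    * build up scatter calls showing values as marker labels,
    * left of the 2023 points and right of the 2024 points
    local call 
    forval j = 1/4 { 
        local call `call' || scatter percent year if year == 2023 & which == `j', ms(none) mlab(percent) mlabsize(medlarge) mlabc(``j'') mlabpos(9)
        local call `call' || scatter percent year if year == 2024 & which == `j', ms(none) mlab(percent) mlabsize(medlarge) mlabc(``j'')
    }
    
    * one panel per category, each joining 2023 and 2024
    line percent? year, by(which, legend(off) note("") row(1)) ///
    lc(red gs8 blue magenta) lw(medthick ..) yla(0(10)50) xla(2023 2024) xsc(r(2022.8 2024.2)) ///
    `call' ytitle(percent) xtitle("") name(G3, replace)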

    [Graphs: aziz_G1.png, aziz_G2.png, aziz_G3.png]



    • #3
      Code:
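      * Here is some simpler code for the third graph, to run after the
      * reshape and rename in #2, in place of the separate, tokenize and loop steps there
      
      * percent1-percent4 hold percent for categories 1-4
      separate percent, by(which) shortlabel
      
      * marker label positions: 9 (left) for 2023, 3 (right) for 2024
      gen where = cond(year == 2023, 9, 3)
      
      * a single scatter call shows all values, with label positions taken from where
      line percent? year, by(which, legend(off) note("") row(1)) ///
      lc(red gs8 blue magenta) lw(medthick ..) yla(0(10)50) xla(2023 2024) xsc(r(2022.8 2024.2)) ///
      ytitle(percent) xtitle("") name(G4, replace) /// 
      || scatter percent? year, ms(none ..) mlab(percent1 percent2 percent3 percent4) mlabsize(medlarge ..) ///
      mlabc(red gs8 blue magenta) mlabvpos(where)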



      • #4
        Nick Cox Thank you very much for the help and the rich suggestions. The graph I was looking for is the first one; I think it's clear and easy to read. Still, I was impressed by the third graph and the idea and logic behind it. I take it you treated the data as a time series, right? That's a good idea, but since I'm only working with two years, I don't have much of a time series, so I think the first graph is more suitable.
        Again, thanks for the rich suggestions.



        • #5
          Thanks for your closure.

          I think the first graph is the least suitable. The fact that it is to many people a highly familiar design is something different. The daily newspaper I get has the lowest circulation in Britain, but it's better than all the others in my view. Popularity is no gauge of merit.

          Similarly, the fact that you have only two years of data doesn't make the last graph unsuitable. It is designed for that situation, which makes it possible! Some people have taken a liking to Tufte's nice simple name, slope charts, although the meme that he invented them is ill-founded.



          • #6
            The critique of stacked bar charts implied in #5 is standard, but for one place where it is spelled out, see https://www.statalist.org/forums/for...updated-on-ssc



            • #7
              Thank you, Nick Cox. These days I am sorting through commands that produce bar charts, and I was unsure about the differences between catplot, tabplot, and tableplot, and was guessing at the motives behind your programming. I will study your post https://www.statalist.org/forums/for...updated-on-ssc



              • #8
                Chen Samulsion I can simplify your life a little.

                tableplot on SSC had a distinct rationale once, but no longer. It's not a command to trouble with unless you find it in someone else's code. But thanks for the mention, as I now feel that I should flag it as essentially obsolete.

                Otherwise the big difference is driven by the structure of Stata graph support. catplot despite rewrites always was and remains a wrapper for graph bar, graph hbar or graph dot. tabplot is a wrapper for twoway commands.



                • #9
                  Here are six (indeed possibly seven) reasons why I think stacked bar charts are oversold.

                  Starting positively, stacking bars makes sense whenever components add to a total, which can include components being sometimes negative.

                  That said, although stacking is easy to understand, how effective is it graphically at conveying the structure of data?

                  1. Stacking often implies a legend, which is usually a burden as well as a blessing. My guess is that few people can memorize a legend quickly unless it's for just two or three items. So understanding the data requires back and forth between legend and graph. Which component is this? Colour choice is rarely trivial beyond two or three possibilities.

                  2. In the usual case in which stacking starts at zero, it is quite easy to follow changes in the first component with a constant baseline. Other components are harder to follow.

                  3. Zero or small values imply bars of zero or small height or length. That's logical, and such values may be important, yet they are hard to spot.

                  4. Annotation with numeric values can be awkward, especially if point 3 is also biting.

                  5. Whenever, as often happens, components add to 100% or some equivalent, the fact that each total bar is the same height is reassuring but not otherwise informative.

                  6. Negative values can be shown conventionally, say as subtractions downwards from zero, but it is then awkward (a) to show totals as well and (b) to show at all clearly components that are sometimes positive, sometimes negative.

                  7. Other designs, especially multiple line plots, dot charts or unstacked bar charts, often work as well or better. If these disappoint as just showing a mass of detail, is the stacked chart any better, and why are you presenting the data at all if the pattern is too complicated to absorb?

                  Perhaps if the point is merely permissive -- to let readers look up details they might care about -- a table might be better.
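                  A minimal sketch of two of the alternatives in point 7 (unstacked bars and a dot chart), assuming the long-form data from #2 with variables which, year and percent. The data are re-entered directly, with value labels defined by label define rather than labmask, so that the example stands alone. The graph names G5 and G6 are arbitrary, and the colours simply copy #2.

```stata
* long-form data as produced by the reshape in #2 ("Stabe" corrected to "Stable")
clear
input which year percent
1 2023 24
1 2024 45
2 2023 37
2 2024 31
3 2023 36
3 2024 19
4 2023  3
4 2024  5
end
label define which 1 "Negative" 2 "Stable" 3 "Positive" 4 "I don't know"
label values which which

* unstacked bars: one bar per answer category within each year, all from zero
graph bar (asis) percent, over(which) over(year) asyvars ///
    bar(1, color(red%50)) bar(2, color(gs8%50)) bar(3, color(blue%50)) ///
    bar(4, color(magenta%50)) blabel(bar) ytitle(percent) name(G5, replace)

* dot chart: one line per category, with a marker for each year
graph dot (asis) percent, over(year) over(which) asyvars ytitle(percent) name(G6, replace)
```

                  Without stacking, every bar starts at zero, so each category can be compared directly across years without consulting a shifting baseline.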



                  • #10
                    However, in some cases, tableplot can be taken as a short-cut to tabplot.
                    Code:
                    sysuse auto, clear
                    egen meanprice = mean(price), by(foreign rep78)
                    tableplot rbar meanprice for rep78, showval(format(%4.0f)) name(g1, replace)
                    egen tag = tag(foreign rep78)
                    tabplot foreign rep78 if tag [iw=meanprice], showval(format(%4.0f)) noborder name(g2, replace)
                    graph combine g1 g2
                    [Graph: Graph.png]



                    • #11
                      The logic is sound but still repetitive:

                      Code:
                      sysuse auto, clear
                      egen meanprice = mean(price), by(foreign rep78)
                      egen tag = tag(foreign rep78)
                      tabplot foreign rep78 if tag [iw=meanprice], showval(format(%4.0f))



                      • #12
                        Yes, I mean that tableplot saves one step compared with tabplot.



                        • #13
                          Agreed. That's not enough for me to justify maintaining tableplot as a separate command. The code hasn't been touched since 2007.



                          • #14
                            Nick Cox Thanks for the clarifications. You may indeed have a point in what you said in #5. The thing is that a bar graph is more familiar to me; you're right about that. I'm used to line graphs for long time series, which is why I thought the third graph was not very suitable. Also, perhaps it's because the third graph draws each category in its own subgraph, which could make the evolution of the sums of the four categories a bit hard to interpret at first. Anyway, I was impressed by that third graph because I hadn't already thought of it as an alternative.
