  • Stacked bar chart with legend for categorical variable

    Hello,

    I am trying to create a graph that I am sure is quite simple, but I have not been able to find it in my search of previous forum posts. I have a categorical variable (vistype) with three levels (0 = In-person, 1 = Telemedicine, 2 = Telephone), and I want a bar chart showing how the frequencies change monthly over 13 months (September 2019 to September 2020). I have a variable mydate that holds month and year. Using "graph bar (count) vistype, over(mydate) stack" almost gets me what I want, except that I can't see the frequencies of the different values of vistype. A -dataex- example is below.

    In summary, I want a stacked bar chart with frequencies on the y axis and months on the x axis (Sept 2019, Oct 2019, etc.), where different colors correspond to the values of the categorical variable vistype. Any help with labeling would also be much appreciated!

    Sarah



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(vistype mydate)
    0 721
    0 717
    0 722
    0 725
    0 728
    0 717
    0 722
    0 717
    0 719
    0 719
    1 725
    0 722
    0 718
    0 721
    0 718
    0 720
    0 727
    0 720
    0 726
    0 717
    0 726
    0 727
    0 720
    1 727
    1 724
    1 719
    0 721
    0 724
    0 727
    0 717
    1 727
    1 728
    0 724
    1 724
    0 719
    2 727
    0 718
    0 721
    1 724
    0 718
    1 724
    0 717
    0 722
    1 726
    0 721
    0 716
    1 725
    1 728
    0 721
    0 717
    1 723
    1 725
    1 728
    0 719
    0 726
    1 723
    0 720
    0 727
    2 723
    2 723
    0 718
    1 724
    1 724
    0 721
    1 723
    1 723
    0 727
    1 725
    0 717
    0 722
    0 725
    0 722
    0 719
    0 725
    0 718
    0 724
    0 717
    0 720
    1 724
    1 723
    0 721
    0 728
    0 717
    1 723
    1 725
    1 727
    1 728
    0 718
    0 726
    0 716
    1 723
    0 721
    1 725
    1 723
    0 718
    0 726
    2 722
    0 719
    1 728
    1 724
    end
    format %tm mydate
    label values vistype vistype
    label def vistype 0 "In-person", modify
    label def vistype 1 "Telemedicine", modify
    label def vistype 2 "Phone", modify

  • #2
    I think you need a categorical variable with value labels as opposed to a formatted date variable. Below, I use labmask from the Stata Journal, authored by Nick Cox.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(vistype mydate)
    0 721
    0 717
    0 722
    0 725
    0 728
    0 717
    0 722
    0 717
    0 719
    0 719
    1 725
    0 722
    0 718
    0 721
    0 718
    0 720
    0 727
    0 720
    0 726
    0 717
    0 726
    0 727
    0 720
    1 727
    1 724
    1 719
    0 721
    0 724
    0 727
    0 717
    1 727
    1 728
    0 724
    1 724
    0 719
    2 727
    0 718
    0 721
    1 724
    0 718
    1 724
    0 717
    0 722
    1 726
    0 721
    0 716
    1 725
    1 728
    0 721
    0 717
    1 723
    1 725
    1 728
    0 719
    0 726
    1 723
    0 720
    0 727
    2 723
    2 723
    0 718
    1 724
    1 724
    0 721
    1 723
    1 723
    0 727
    1 725
    0 717
    0 722
    0 725
    0 722
    0 719
    0 725
    0 718
    0 724
    0 717
    0 720
    1 724
    1 723
    0 721
    0 728
    0 717
    1 723
    1 725
    1 727
    1 728
    0 718
    0 726
    0 716
    1 723
    0 721
    1 725
    1 723
    0 718
    0 726
    2 722
    0 719
    1 728
    1 724
    end
    format %tm mydate
    label values vistype vistype
    label def vistype 0 "In-person", modify
    label def vistype 1 "Telemedicine", modify
    label def vistype 2 "Phone", modify
    
    gen lab= string(mydate, "%tmmcY")
    labmask mydate, values(lab)
    set scheme s1color
    graph hbar (count), over(vistype) over(mydate) asyvars stack ///
    blab(bar, pos(inside)) bar(1, color(red%25)) bar(2, color(blue%25)) ///
    bar(3, color(orange%25)) leg(rows(1))
    Result:

    [Image: Graph.png]


    Last edited by Andrew Musau; 07 Apr 2021, 14:25.
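labmask is the convenient route here. If it is not installed, you can get the same result with a loop over the distinct dates: give each month a value label, so that the over() categories of graph bar show month names. This is a minimal sketch, not code from this thread. The label name mydate_lbl is hypothetical. The 13 months are generated directly rather than read from the dataex example; September 2019 is month 716 on Stata's monthly scale.

```stata
* Minimal sketch: attach a value label to each monthly date so that
* graph bar's over() shows "Sep 2019" etc. instead of raw numbers.
* The label name mydate_lbl is hypothetical.
clear
set obs 13
* Monthly dates from September 2019 (716) to September 2020 (728)
generate mydate = ym(2019, 9) + _n - 1
format mydate %tm

levelsof mydate, local(months)
foreach m of local months {
    * Text such as "Sep 2019" for month m
    local text : display %tmMon_CCYY `m'
    local text = strtrim("`text'")
    label define mydate_lbl `m' "`text'", modify
}
label values mydate mydate_lbl

* Inspect the labels just defined
label list mydate_lbl
```

Suppose the #1 data are in memory instead of the generated months. The loop should then work unchanged, and graph bar (count), over(vistype) over(mydate) asyvars stack would show month names rather than raw numbers such as 716.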



    • #3
      There are many threads here on bar charts, perhaps hundreds. The difficulty is indeed finding precisely what you want.

      graph bar is often disappointing for time series data, if only because (not only because) the date labels are often a mess (and, forgive me, you might not want them all when you see what a mess that implies, or awkward giraffe graphics with labels turned vertically or on a slant). This seemed to come up here so often that I wrote up some notes which are accessible at https://journals.sagepub.com/doi/pdf...6867X211000032

      Stacked designs are often disappointing anyway because, although in principle they show all the information, they are often hard to decode -- and if you want to show the frequencies, where do you put them, especially zeros and very small values? A real nuisance of this design, in my view, is the frequent need for a legend or key.

      The code here shows something perhaps similar to what you asked for, and something I personally think is better.

      tabplot code is downloadable from the Stata Journal site; a detailed overview is available at https://www.stata-journal.com/articl...article=gr0066 and a more concise one at https://www.statalist.org/forums/for...updated-on-ssc

      You may have any colours you like, naturally, but one choice is automated by mycolours, on which more is said at https://www.statalist.org/forums/for...ailable-on-ssc

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(vistype mydate)
      0 721
      0 717
      0 722
      0 725
      0 728
      0 717
      0 722
      0 717
      0 719
      0 719
      1 725
      0 722
      0 718
      0 721
      0 718
      0 720
      0 727
      0 720
      0 726
      0 717
      0 726
      0 727
      0 720
      1 727
      1 724
      1 719
      0 721
      0 724
      0 727
      0 717
      1 727
      1 728
      0 724
      1 724
      0 719
      2 727
      0 718
      0 721
      1 724
      0 718
      1 724
      0 717
      0 722
      1 726
      0 721
      0 716
      1 725
      1 728
      0 721
      0 717
      1 723
      1 725
      1 728
      0 719
      0 726
      1 723
      0 720
      0 727
      2 723
      2 723
      0 718
      1 724
      1 724
      0 721
      1 723
      1 723
      0 727
      1 725
      0 717
      0 722
      0 725
      0 722
      0 719
      0 725
      0 718
      0 724
      0 717
      0 720
      1 724
      1 723
      0 721
      0 728
      0 717
      1 723
      1 725
      1 727
      1 728
      0 718
      0 726
      0 716
      1 723
      0 721
      1 725
      1 723
      0 718
      0 726
      2 722
      0 719
      1 728
      1 724
      end
      format %tm mydate
      label values vistype vistype
      label def vistype 0 "In-person", modify
      label def vistype 1 "Telemedicine", modify
      label def vistype 2 "Phone", modify
      
      * ssc install mycolours
      mycolours
      
      set scheme s1color
      
      * install from Stata Journal
      tabplot vistype mydate, showval scheme(s1color) yreverse xasis xla(, format(%tmMon_YY)) xla(716(4)728, grid) xtitle("") ytitle("") ///
      separate(vistype) bar1(color(`"`OK1'"')) bar2(color(`"`OK2'"')) bar3(color(`"`OK3'"')) name(G2, replace)
      
      bysort mydate: egen A = total(vistype == 0)
      by mydate: egen B = total(vistype <= 1)
      by mydate: egen C = total(vistype <= 2)
      egen tag = tag(mydate)
      
      twoway bar A mydate if tag, color("`OK1'")                             ///
      || rbar A B mydate if tag, col("`OK2'")                                ///
      || rbar B C mydate if tag, color("`OK3'")                              ///
      legend(order(3 "Phone" 2 "Telemedicine" 1 "In-person") pos(3) col(1))  ///
      xla(, format(%tmMon_YY)) xla(716(4)728, grid)                          ///
      xtitle("") ytitle("Frequency") yla(, ang(h)) name(G1, replace)


      Many details can be tweaked. The default bar width for twoway bar is 1, but the bars don't have to touch. By contrast, the default bar width for tabplot is 0.5, chosen with categorical data in mind, but it too can be changed at whim.
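Here is a minimal sketch of bars that don't touch, assuming the cumulative counts A, B and C are built as in the code above. twoway bar and twoway rbar both take barwidth(), and a value below 1 leaves gaps between months; 0.7 here is an arbitrary choice. To keep the block self-contained, it rebuilds the 100 observations of the dataex example in #1 with expand, from their monthly frequencies as tallied from that listing. The colours are Okabe-Ito RGB triplets typed directly, standing in for the OK1 to OK3 locals that mycolours provides. The graph name G1gap is hypothetical.

```stata
* Rebuild the dataex example from #1: one row per (month, visit type)
* with its frequency n, then expand to one row per visit (100 in all)
clear
input float(mydate vistype n)
716 0 2
717 0 10
718 0 8
719 0 6
719 1 1
720 0 5
721 0 9
722 0 6
722 2 1
723 1 8
723 2 2
724 0 3
724 1 8
725 0 3
725 1 6
726 0 5
726 1 1
727 0 5
727 1 3
727 2 1
728 0 2
728 1 5
end
expand n
drop n
format %tm mydate
label define vistype 0 "In-person" 1 "Telemedicine" 2 "Phone"
label values vistype vistype

* Cumulative counts per month, as in the code above
bysort mydate: egen A = total(vistype == 0)
by mydate: egen B = total(vistype <= 1)
by mydate: egen C = total(vistype <= 2)
egen tag = tag(mydate)

* barwidth(0.7) < 1 leaves gaps between adjacent months
twoway bar A mydate if tag, barwidth(0.7) color("230 159 0")              ///
    || rbar A B mydate if tag, barwidth(0.7) color("86 180 233")          ///
    || rbar B C mydate if tag, barwidth(0.7) color("0 158 115")           ///
    legend(order(3 "Phone" 2 "Telemedicine" 1 "In-person") pos(3) col(1)) ///
    xla(716(4)728, format(%tmMon_YY) grid) xtitle("") ytitle("Frequency") ///
    yla(, ang(h)) name(G1gap, replace)
```

Run the /// continuation lines from a do-file or the Do-file Editor.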

      [Image: phones_G1.png]

      [Image: phones_G2.png]



      • #4
        Thanks so much! Nick, that second graph is awesome! You're right: it's much easier to visualize. I'll be happy to have this code in my repertoire.



        • #5
          Pleased to help. Presumably this is all about changing modes of getting medical help and advice during the pandemic. You can naturally recast it in terms of proportions or percents if that is preferred.



          • #6
            If I wanted to do proportions/percents, how would I do that?



            • #7
              It is an option in tabplot. As I understand it, the natural breakdown here is by date. With the extra options

              Code:
              percent(mydate) frame(100) subtitle(%)


              you get this. The default of 1 decimal place with the showval and percent options can seem fussy, and it can be overridden. (I would be surprised if anyone regarded 14.3 as meaning anything different from 14, although the difference between 0.1 and 0.4 could be a medium deal.)

              The frames are in a strong sense adding no new information but I quite like them. They weren't in the version written up in 2016.

              [Image: phones_G3.png]


              For stacked bars, all you need do is scale A, B and C by C (that is, compute 100 * A / C and so on); then each bar adds to 100%.
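Here is a minimal sketch of that rescaling. So that the block runs on its own, it again rebuilds the dataex data from #1 from its monthly frequencies. The names pA, pB and pC are hypothetical. Each is 100 times a cumulative count divided by the monthly total C, so pC is 100 in every month and the three segments fill each bar. The colours are Okabe-Ito RGB values typed directly, an assumption rather than the mycolours locals.

```stata
* Rebuild the dataex example from #1 from its monthly frequencies
clear
input float(mydate vistype n)
716 0 2
717 0 10
718 0 8
719 0 6
719 1 1
720 0 5
721 0 9
722 0 6
722 2 1
723 1 8
723 2 2
724 0 3
724 1 8
725 0 3
725 1 6
726 0 5
726 1 1
727 0 5
727 1 3
727 2 1
728 0 2
728 1 5
end
expand n
drop n
format %tm mydate

* Cumulative counts per month, as in #3
bysort mydate: egen A = total(vistype == 0)
by mydate: egen B = total(vistype <= 1)
by mydate: egen C = total(vistype <= 2)
egen tag = tag(mydate)

* Scale each cumulative count by the monthly total C: every stack ends at 100
foreach v in A B C {
    generate p`v' = 100 * `v' / C
}

twoway bar pA mydate if tag, color("230 159 0")                           ///
    || rbar pA pB mydate if tag, color("86 180 233")                      ///
    || rbar pB pC mydate if tag, color("0 158 115")                       ///
    legend(order(3 "Phone" 2 "Telemedicine" 1 "In-person") pos(3) col(1)) ///
    xla(716(4)728, format(%tmMon_YY) grid) xtitle("") ytitle("Percent")   ///
    yla(0(20)100, ang(h)) name(G4, replace)
```

tabplot with percent(mydate), as above, reaches the same percents without any new variables.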



              • #8
                Hi, I'd like to create a graph with month/year on the x axis and frequency on the y axis using tabplot from SSC in Stata 16.1.
                Here are a -dataex- excerpt from my data, the code I ran, and the images I got. What I wanted was a beautiful x axis, by which I mean Dec19, Jan20, Feb20, Mar20 and so on, but what I got was only some of the months. I would also like colours that are easier on the eye, like Nick's in #3, rather than the vivid green and orange in my images. Lastly, I wanted a bar graph (as in the second image) with bars spaced slightly apart from each other, not directly adjacent. I would appreciate any advice!
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input str1 NON_HRT_DON int TX_MONTH
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                "N" 708
                end
                format %tm TX_MONTH
                Code:
                tabplot NON_HRT_DON TX_MONTH, percent(TX_MONTH) frame(100) subtitle(%) scheme(s1color) yreverse xasis xla(, format(%tmMon_YY)) xla(716(4)728, grid) xtitle("") ytitle("") ///
                separate(DCD) bar1(color(`"`OK1'"')) bar2(color(`"`OK2'"')) name(G2, replace)
                
                bysort TX_MONTH: egen A = total(NON_HRT_DON == 0)
                by TX_MONTH: egen B = total(NON_HRT_DON <= 1)
                
                egen tag = tag(TX_MONTH)
                
                twoway bar A TX_MONTH if tag, color("`OK1'") || rbar A B TX_MONTH if tag, col("`OK2'") legend(order(2  "NO" 1 "YES") pos(3) col(1)) xla(, format(%tmMon_YY)) xla(716(4)728, grid) xtitle("") ytitle("Frequency")  yla(, ang(h)) name(G1, replace)
                [Image: G1.png]
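Here is a minimal sketch of the twoway route for this question; it is not code from the thread. Its assumptions, stated loudly:

- The data below are made-up placeholders (13 months from December 2019, 20 records a month), there only so that the commands run.
- The variable names nN and nAll are hypothetical.
- The colours are Okabe-Ito RGB values typed directly, so mycolours is not needed.

Three points matter:

- NON_HRT_DON is a string, so it has to be compared with "N" or "Y". Comparing it with 0 or 1 gives a type mismatch error.
- xla(716(4)728) asks for a label every fourth month from Sep 2019 to Sep 2020. Here the range is taken from the data, with a step of 1.
- barwidth() below 1 separates the bars.

```stata
* PLACEHOLDER DATA (made up, only so the commands run): 13 months from
* Dec 2019, 20 records per month, every fifth record coded "Y"
clear
set obs 260
generate int TX_MONTH = ym(2019, 12) + floor((_n - 1) / 20)
format %tm TX_MONTH
generate str1 NON_HRT_DON = cond(mod(_n, 5) == 0, "Y", "N")

* NON_HRT_DON is a string: compare it with "N"/"Y", not with numbers
bysort TX_MONTH: egen nN = total(NON_HRT_DON == "N")
by TX_MONTH: generate nAll = _N
egen tag = tag(TX_MONTH)

* Label every month present in the data, not every fourth month from 716
summarize TX_MONTH, meanonly
local first = r(min)
local last = r(max)

* barwidth(0.7) < 1 leaves gaps between bars
twoway bar nN TX_MONTH if tag, barwidth(0.7) color("0 114 178")        ///
    || rbar nN nAll TX_MONTH if tag, barwidth(0.7) color("230 159 0")  ///
    legend(order(2 "Y" 1 "N") pos(3) col(1))                           ///
    xla(`first'(1)`last', format(%tmMon_YY) labsize(small))            ///
    xtitle("") ytitle("Frequency") yla(, ang(h)) name(G1, replace)
```

With the real data in memory, drop the placeholder lines at the top. If 13 labels crowd the axis, labsize() or a sparser step such as (2) are the easy adjustments.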
