  • How to assign different colors for subgroups in one bar plot?

    Dear Statalist,
    I have a problem plotting a bar chart in STATA. I have 3 categorical variables, and I'd like to count var1 over var2 by/over var3 (somewhat like visualizing a 2x2 table as a bar graph). In detail, my command is:

    graph bar (count), over(var1, label(angle(45))) over(var2) blabel(bar) ytitle("xxx") by(var3, title("xxx")) legend(on)

    The graph looks like this:
    [Graph: education degree distribution1.png]

    But I want each bar to have a different color, like this:
    [Graph: education degree distribution.png]

    I made the latter by changing by(var3) to over(var3), but then the legend for each color (or bar?) is not displayed as it is in the former.
    Also, is there a way to combine the values for the two sexes, for example by overlaying them, to make the difference between the sexes in each group more visible?
    Thank you so much.

  • #2
    You may wish to try the - asyvars - option.
    Best regards,

    Marcos

    • #3
      Originally posted by Marcos Almeida
      You may wish to try the - asyvars - option.
      Dear Marcos,
      Thank you for the reply. I checked the guide and tried it in STATA. I think I may not be able to make the latter chart if I keep using by(var3)... The -asyvars- option doesn't help with this... And the labels of my var1 are still not displayed in the latter case...

      • #4
        Please note FAQ Advice #18. https://www.statalist.org/forums/help#spelling

        I have a different suggestion. Wanting those arbitrary colours seems to follow from using a legend -- at best a necessary evil -- which in turn seems to stem from realising that text at 45 degrees is a poor choice.

        So, don't do that then. Horizontal text is more readable than either text on a slope or text in a legend.
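A minimal sketch of that point using only official Stata, assuming the auto data as a stand-in for the poster's data: with -graph hbar- the category labels sit on the vertical axis and are written horizontally, so neither angle() nor a legend is needed.

```stata
* Sketch only: the auto data stand in for the poster's data
sysuse auto, clear

* Horizontal bars: the rep78 and foreign labels are read left to right,
* so no angle() suboption and no legend are required
graph hbar (count), over(rep78) over(foreign) blabel(bar) ///
    ytitle("") name(hbar_sketch, replace)
```

With two over() variables and no asyvars, Stata draws no legend by default; the labels alone identify the bars.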

        Here are the results of some experiments with tabplot (Stata Journal). I add a graph showing the percent breakdown, which seems likely to be as interesting as the raw frequencies.

        Code:
        * re-create dataset
        clear
        set obs 28
        egen degree = seq(), to(7)
        egen male = seq(), block(7) from(0) to(1)
        egen control = seq(), block(14) from(0) to(1)
        label define male 1 male 0 female
        label val male male
        label define control 0 case 1 control
        label val control control
        mat freq = (1, 22, 38, 13, 20, 19, 4, 11, 70, 120, 55, 23, 39, 19)
        mat freq = freq, (4, 17, 29, 16, 30, 32, 10, 8, 45, 68, 36, 39, 74, 50)
        gen freq = freq[1, _n]
        label define degree 1 none 2 elementary 3 high 4 technical 5 CEGEP 6 graduate 7 postgraduate
        label val degree degree
        
        * shared options
        local opts subtitle(, fcolor(green*0.1)) ytitle("") xtitle("")
        local opts `opts' separate(male) bar2(bfcolor(blue*0.4) blcolor(blue))
        local opts `opts' bar1(bfcolor(orange*0.4) blcolor(orange)) scheme(s1color) horizontal
        
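        * list the macros, to check the options (optional)
        mac li
        
        * tabplot is community-contributed (ssc install tabplot)
        * graph 1: frequencies
        tabplot degree male [fw=freq], by(control, note("frequencies", pos(12))) ///
        showval  name(G1, replace) `opts'
        
        * graph 2: percents calculated separately for each combination of male and control
        tabplot degree male [fw=freq], by(control, note("percents", pos(12))) ///
        showval(format(%2.0f)) percent(male control) name(G2, replace) `opts'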

        [Graph: tabplot_yue1.png]
        [Graph: tabplot_yue2.png]

        • #5
          The frequencies copied above contain a small error. I don't think the main argument is affected.

          • #6
            Originally posted by Nick Cox
            The frequencies copied above contain a small error. I don't think the main argument is affected.
            Thank you Nick! That looks very nice, and I will replace the graph with this one. (But honestly, I did enjoy the different colors before, hahaha.)

            • #7
              I don't object to different colours. But an arbitrary mix of colours with no rationale doesn't really help readers.

              Adapting a common remark in visualization circles, you should want people to say "Aha! I see a pattern in the data", not "Wow! How did you do that?", let alone "Huh? What is this mess supposed to tell me?"

              • #8
                Dear Marcos,
                Thank you for the reply. I checked the guide and tried it in STATA. I think I may not be able to make the latter chart if I keep using by(var3)... The -asyvars- option doesn't help with this... And the labels of my var1 are still not displayed in the latter case...

                I believe Nick's graph works much better than the one you want. That said, since you tried my suggestion in #2 and felt it didn't work, here's a toy example, just to demonstrate the use of the - asyvars - option:

                Code:
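                * toy data: the auto dataset with an artificial case/control split
                sysuse auto, clear
                gen casecontrol = 0
                replace casecontrol = 1 in 1/38
                label define casecontrol  0 "Case" 1 "Control"
                label values casecontrol casecontrol
                * high repair record means rep78 of 4 or 5; missing rep78 is coded 0
                gen highrep = rep78 >3 & !missing(rep78)
                label define highrep 0 "lowrep" 1 "highrep"
                label values highrep highrep
                * asyvars: each category of the first over() variable (foreign) gets its own colour and legend key
                graph bar (count), over(foreign) over(highrep, label(angle(45))) blabel(bar) by(casecontrol, title("xxx")) asyvars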

                [Graph: Graph_2018.png]

                Best regards,

                Marcos

                • #9
                  It's arguable that case and control should be side by side, not males and females. With any solution, that's just achieved by a permutation of the variables.
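Taking Marcos's toy example from #8, the permutation amounts to swapping which variable goes in the first over() and which goes in by(). A minimal sketch, assuming the same artificial casecontrol and highrep variables, with foreign standing in for sex:

```stata
sysuse auto, clear

* Same artificial variables as in #8
gen casecontrol = 0
replace casecontrol = 1 in 1/38
label define casecontrol 0 "Case" 1 "Control"
label values casecontrol casecontrol
gen highrep = rep78 > 3 & !missing(rep78)
label define highrep 0 "lowrep" 1 "highrep"
label values highrep highrep

* Permuted: case and control now sit side by side within each group,
* and foreign moves from the first over() into by()
graph bar (count), over(casecontrol) over(highrep) blabel(bar) ///
    by(foreign) asyvars name(permuted, replace)
```

The counts are the same as in #8; only the grouping, and therefore which comparison is easiest to see, changes.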

                  • #10
                    Adapting a common remark in visualization circles, you should want people to say "Aha! I see a pattern in the data", not "Wow! How did you do that?", let alone "Huh? What is this mess supposed to tell me?"
                    I think the meaning of 'hahaha' is: although you are right, I still find different colors desirable/pleasing.

                    • #11
                      Is there any way to specify the colours? I'm trying to display the mean values of a numerical variable by country and want the graph to use specific country colour codes. The reason is that this is much more intuitive for viewers after they have seen various country-specific histograms etc. in each country's colour.

                      I've tried to specify the bar colours using options such as ", bar(1 code)", but nothing seems to work.

                      • #12
                        Eoin Grealis: Yes, you can specify different colours, but usually that depends on one of the following:

                        1. Offering different variables to a command, which will then be assigned different colours.

                        2. An option such as asyvars or separate() which insists on the same outcome.

                        In your case, I think a data example and your exact code tried may be needed to allow better advice.
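A minimal sketch of both routes, assuming the auto data shipped with Stata as a stand-in for the country data in #11 (which were not posted). The colours are arbitrary placeholders; an RGB triplet in quotes, such as "255 128 0", is one way to give a specific colour code. Note that the syntax is bar(#, color(...)), with a comma after the bar number.

```stata
* Sketch only: the auto data stand in for the country data in #11
sysuse auto, clear

* Route 1 (different variables): -separate- creates one variable per
* category, here price0 (Domestic) and price1 (Foreign),
* and each variable is drawn as its own bar
separate price, by(foreign)
graph bar (mean) price0 price1, ///
    bar(1, color(navy)) bar(2, color("255 128 0")) name(route1, replace)

* Route 2 (an option): asyvars treats each category of the first over()
* variable as a separate y variable, so bar(#, ...) can colour it
graph bar (mean) price, over(foreign) asyvars ///
    bar(1, color(navy)) bar(2, color("255 128 0")) name(route2, replace)
```

With more categories the pattern extends: under asyvars, bar(1) is the first category of the first over() variable, bar(2) the second, and so on.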
