
  • Clustered bar graph with multiple categorical variables

    Hello,

    I am hoping to create a clustered bar graph to compare satisfaction with information received from various sources, such that the categories of satisfaction (i.e., not at all, a little, somewhat, very much) are on the x-axis with a separate bar for each source of information (i.e., A, B, C, D). Despite trying multiple different commands, namely 'bar graph, over()' and 'as category', I have not been able to achieve this. Guidance would be very much appreciated!!

    In the data below, 0 = Not applicable, 1 = Not at all, 2 = A little, 3 = Somewhat, 4 = A lot
    ID A B C D
    1 1 1 2 4
    2 1 2 1 4
    3 1 3 1 3
    4 1 0 0 0
    5 1 0 0 0
    6 1 1 1 1
    7 0 0 1 1
    8 1 2 1 2
    9 1 1 1 1
    10 1 1 2 .
    11 1 1 1 3
    12 1 1 1 2
    13 1 1 1 1
    14 0 0 2 2
    15 1 1 1 1
    16 1 1 1 4
    17 1 2 1 2
    18 1 1 1 1
    19 3 2 2 3
    20 1 1 1 4

  • #2
    I don't get a clear sense from this of what you want to be clusters or what you want to be bars, and your syntax does not help much, as bar graph isn't even legal (the command is graph bar) and you don't mention any variable names.

    Either way, I see here a two-way table, so you need a way of showing the frequency or percent breakdown by different sources. Conversely, if the identifiers are important information you would need a quite different display.

    In Stata terms you would have more flexibility with a long data structure. Here I ignore the not applicables and focus on some methods for ordinal scales such as you have here. For more on that, see my presentation at https://www.stata.com/meeting/uk21/

    Please note the use of CODE delimiters and a data example similar to what you would get with dataex.
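
    As a rough sketch of how such a data example might be produced, assuming the original wide data (ID and A to D) are in memory: in recent Stata versions dataex is part of the official installation, and on older ones it can be installed from SSC.

    ```stata
    * only needed if dataex is not already installed
    * ssc install dataex

    * write out the listed variables as copy-and-paste input code
    dataex ID A B C D
    ```

    The output can then be pasted between CODE delimiters in a post.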

    I would probably tone down some of the colours in further work.


    Code:
    clear 
    input ID    A    B    C    D
    1    1    1    2    4
    2    1    2    1    4
    3    1    3    1    3
    4    1    0    0    0
    5    1    0    0    0
    6    1    1    1    1
    7    0    0    1    1
    8    1    2    1    2
    9    1    1    1    1
    10    1    1    2    .
    11    1    1    1    3
    12    1    1    1    2
    13    1    1    1    1
    14    0    0    2    2
    15    1    1    1    1
    16    1    1    1    4
    17    1    2    1    2
    18    1    1    1    1
    19    3    2    2    3
    20    1    1    1    4
    end 
    
    rename (A-D) (answer=)
    reshape long answer, i(ID) j(source) string 
    label def answer 0 "Not applicable" 1 "Not at all" 2  "A little" 3 "Somewhat" 4 "A lot"
    label val answer answer 
    
    set scheme s1color 
    
    preserve 
    
    drop if answer == 0 
    
    * download from Stata Journal 
    tabplot answer source, percent(source) name(G1, replace) showval separate(answer) ///
    bar1(color(red)) bar2(color(red*0.5)) bar3(color(blue*0.5)) bar4(color(blue)) yasis yla(1/4) ysc(r(1 .))
    
    * download from SSC 
    floatplot answer, over(source) highnegative(2) name(G2, replace) fcolors(red red*0.5 blue*0.5 blue) vertical subtitle(% by source)
    
    restore
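
    The two community-contributed commands used above need installing before the code will run. A minimal sketch, assuming an internet connection; the exact package link that search offers for tabplot depends on the latest published update, so it is not spelled out here:

    ```stata
    * floatplot is available from SSC
    ssc install floatplot

    * tabplot was published in the Stata Journal; search lists
    * clickable links to the package files
    search tabplot
    ```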

    [Image: yablo_G1.png]
    [Image: yablo_G2.png]







    • #3
      Thank you very much for your reply, Nick. My apologies regarding the clarity of my post. Regarding the variables, the names are simply 'ID', 'A', 'B', etc.

      With code below, I am able to have only one variable listed:
      graph bar (count), over(A, label(angle(90))) blabel(bar) title(Satisfaction of individuals with information obtained from Source A)

      By creating a long data structure, would this code be sufficient?

      I am hoping for something like the figure below:

      Thank you so much!

      [Image: Screen Shot 2022-07-05 at 10.53.03 AM.png]



      • #4
        The graph in #3 is still unclear to me, as the sum of percents vastly exceeds 100 for each source.

        The code in #3 won't work with the data structure recommended in #2, as you no longer have a variable A, and in any case it shows only source A.

        You may be seeking something more like

        Code:
        rename (A-D) (answer=)
        reshape long answer, i(ID) j(source) string 
        label def answer 0 "Not applicable" 1 "Not at all" 2  "A little" 3 "Somewhat" 4 "A lot"
        label val answer answer 
        
        set scheme s1color 
        
        drop if answer == 0 
        
        graph bar (percent), over(source) over(answer) asyvars

        although that has to be a tentative answer, as I have no idea how you are calculating percents.
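
        One way to pin down which percents are wanted is to tabulate them directly and compare with the bar heights. This is only a sketch, assuming the long-layout data from the code above are in memory (variables answer and source, with answer == 0 already dropped):

        ```stata
        * percent of each answer within each source (each column sums to 100)
        tabulate answer source, column nofreq

        * percent of each source within each answer (each row sums to 100)
        tabulate answer source, row nofreq

        * percent of all non-missing responses (all cells together sum to 100)
        tabulate answer source, cell nofreq
        ```

        Whichever table matches the intended comparison indicates the denominator the graph should use.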



        • #5
          The graph in #3 was just a mock-up, not based on any real values! Sorry for the confusion.

          Your code worked perfectly. Thank you so much!
