Clustered bar graph with multiple categorical variables

Nicole Yablonsky

Join Date: Jul 2022
Posts: 5
#1

04 Jul 2022, 21:52

Hello,

I am hoping to create a clustered bar graph to compare satisfaction with information received from various sources, such that the categories of satisfaction (i.e., not at all, a little, somewhat, very much) are on the x-axis with a separate bar for each source of information (i.e., A, B, C, D). Despite trying multiple different commands, namely 'bar graph, over()' and 'as category', I have not been able to achieve this. Guidance would be very much appreciated!!

In the data below, 0 = Not applicable, 1 = Not at all, 2 = A little, 3 = Somewhat, 4 = A lot

ID	A	B	C	D
1	1	1	2	4
2	1	2	1	4
3	1	3	1	3
4	1	0	0	0
5	1	0	0	0
6	1	1	1	1
7	0	0	1	1
8	1	2	1	2
9	1	1	1	1
10	1	1	2	.
11	1	1	1	3
12	1	1	1	2
13	1	1	1	1
14	0	0	2	2
15	1	1	1	1
16	1	1	1	4
17	1	2	1	2
18	1	1	1	1
19	3	2	2	3
20	1	1	1	4

Nick Cox

Join Date: Mar 2014
Posts: 35429
#2

05 Jul 2022, 00:53

I don't get a clear sense from this what you want to be clusters, or what you want to be bars, and your syntax does not help much, as bar graph isn't even legal and you don't mention any variable names.

Either way, I see here a two-way table, so you need a way of showing the frequency or percent breakdown by different sources. Conversely, if the identifiers are important information you would need a quite different display.

In Stata terms you would have more flexibility with a long data structure. Here I ignore the not applicables and focus on some methods for ordinal scales such as you have here. For more on that, see my presentation at https://www.stata.com/meeting/uk21/

Please note the use of CODE delimiters and a data example similar to what you would get with dataex.

I would probably tone down some of the colours in further work.

Code:

clear 
input ID    A    B    C    D
1    1    1    2    4
2    1    2    1    4
3    1    3    1    3
4    1    0    0    0
5    1    0    0    0
6    1    1    1    1
7    0    0    1    1
8    1    2    1    2
9    1    1    1    1
10    1    1    2    .
11    1    1    1    3
12    1    1    1    2
13    1    1    1    1
14    0    0    2    2
15    1    1    1    1
16    1    1    1    4
17    1    2    1    2
18    1    1    1    1
19    3    2    2    3
20    1    1    1    4
end 

rename (A-D) (answer=)
reshape long answer, i(ID) j(source) string 
label def answer 0 "Not applicable" 1 "Not at all" 2  "A little" 3 "Somewhat" 4 "A lot"
label val answer answer 

set scheme s1color 

preserve 

drop if answer == 0 

* download from Stata Journal 
tabplot answer source, percent(source) name(G1, replace) showval separate(answer) ///
bar1(color(red)) bar2(color(red*0.5)) bar3(color(blue*0.5)) bar4(color(blue)) yasis yla(1/4) ysc(r(1 .))

* download from SSC 
floatplot answer, over(source) highnegative(2) name(G2, replace) fcolors(red red*0.5 blue*0.5 blue) vertical subtitle(% by source)

restore
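
The two graph commands above are community-contributed, so they must be installed before the code will run. What follows is a minimal sketch of one way to get them, following the download comments in the code. It is not part of the original reply, and the package listed by search may change as new versions appear.

```stata
* tabplot is published in the Stata Journal: list the SJ entries,
* then click through to install the most recent version
search tabplot, sj

* floatplot is distributed through SSC
ssc install floatplot
```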

[Image: yablo_G1.png (graph G1, from tabplot)]

[Image: yablo_G2.png (graph G2, from floatplot)]

Nicole Yablonsky

Join Date: Jul 2022

Posts: 5
#3

05 Jul 2022, 08:57

Thank you very much for your reply, Nick. My apologies regarding the clarity of my post. Regarding the variables, the names are simply 'ID', 'A', 'B', etc.

With the code below, I can plot only one variable at a time:
graph bar (count), over(A, label(angle(90))) blabel(bar) title(Satisfaction of individuals with information obtained from Source A)

By creating a long data structure, would this code be sufficient?

I am hoping for something like the figure below:

Thank you so much!
Nick Cox

Join Date: Mar 2014

Posts: 35429
#4

05 Jul 2022, 10:48

The graph in #3 is still unclear to me, as the sum of percents vastly exceeds 100 for each source.

The code in #3 won't work with the data structure recommended in #2, as you no longer have a variable A, and in any case it only works with source A.

You may be seeking something more like

Code:

rename (A-D) (answer=)
reshape long answer, i(ID) j(source) string
label def answer 0 "Not applicable" 1 "Not at all" 2 "A little" 3 "Somewhat" 4 "A lot"
label val answer answer

set scheme s1color

drop if answer == 0

graph bar (percent), over(source) over(answer) asyvars

although that has to be a tentative answer as I have no idea how you are calculating percents.
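
Since it is left open how the percents are to be calculated, a cross-tabulation is one way to see the numbers behind the bars. This is only a sketch and not part of the original reply. To stay short it re-enters just the first six rows of the data example from #1, and it assumes that percents within each source (row percents) are what is wanted.

```stata
clear
* first six rows of the data example only
input ID A B C D
1 1 1 2 4
2 1 2 1 4
3 1 3 1 3
4 1 0 0 0
5 1 0 0 0
6 1 1 1 1
end

* same reshape and labels as in the code above
rename (A-D) (answer=)
reshape long answer, i(ID) j(source) string
label def answer 0 "Not applicable" 1 "Not at all" 2 "A little" 3 "Somewhat" 4 "A lot"
label val answer answer

* frequencies and row percents by source, leaving out "Not applicable";
* assumption: percents within each source are what is wanted
tabulate source answer if answer != 0, row
```

If percents of all answers are wanted instead, the cell option of tabulate would replace row.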
Nicole Yablonsky

Join Date: Jul 2022

Posts: 5
#5

05 Jul 2022, 13:35

The graph in #3 was just a mock-up, not based on any real values! Sorry for the confusion.

Your code worked perfectly. Thank you so much!