Graphing two categorical variables

Danilo Silva

#1

08 Aug 2020, 22:01

Hello everyone,

I'm new to the forum and a relative beginner at Stata, so sorry if the answer to what I am looking for is too obvious. I need to create a bar chart of two categorical variables: one is Subject (Math or History) and the other is Type (A, B, C, or D). My data is structured as follows (example):

Subject Type

Math A
Hist A
Hist B
Math C
Hist C
Hist C
Hist D
Hist A
Hist A
Math A
Math A
Math A
Math C
Math C
Hist D
Math D
Math A

I'm looking for a command that shows the percentage of each type, by subject, in a single graph, something like this:

If I use graph bar, over(Type) over(Subject), not only do I end up with two graphs, one for each subject, but Stata also does not calculate the percentages separately for each subject (in my dataset I have many more Math observations than History observations):

With graph bar, over(Type) by(Subject), Stata gives me the relative proportions, but still in two graphs:

Any ideas of how to get what I need? Thanks!

Andrew Musau


09 Aug 2020, 02:49

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 subject str1 type
"Math" "A"
"Hist" "A"
"Hist" "B"
"Math" "C"
"Hist" "C"
"Hist" "C"
"Hist" "D"
"Hist" "A"
"Hist" "A"
"Math" "A"
"Math" "A"
"Math" "A"
"Math" "C"
"Math" "C"
"Hist" "D"
"Math" "D"
"Math" "A"
end

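* number of observations in each subject-type cell
bys subject type: gen total=_N
* that count as a percentage of the subject's observations
bys subject: gen percent=(total/_N)*100
* percent is constant within a cell, so the default statistic (mean) plots it
gr bar percent, over(subject) over(type) asyvars ///
bargap(10) bar(1, color(red)) bar(2, color(blue)) ///
ytitle("Percent") scheme(s1color)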
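The same precompute-then-plot idea can be sanity-checked on Stata's shipped auto data, with foreign and rep78 standing in (purely for illustration) for subject and type. This is a sketch added here, not part of the reply above. One assumption worth making explicit: observations with a missing category are dropped first, because _N would otherwise count them and the plotted percentages would no longer add up to 100 within each group.

```stata
* Sketch on the auto data: foreign and rep78 stand in for subject and type
sysuse auto, clear
* drop cars with missing rep78 so that _N below counts only plotted observations
drop if missing(rep78)
* number of cars in each foreign-by-rep78 cell
bysort foreign rep78: generate total = _N
* that count as a percentage of the cars in the same foreign group
bysort foreign: generate double percent = 100 * total / _N
graph bar percent, over(foreign) over(rep78) asyvars ytitle("Percent")
* check: tag one observation per cell and add the cell percentages per group
egen byte first = tag(foreign rep78)
egen double check = total(percent * first), by(foreign)
summarize check, meanonly
display "within-group totals range from " r(min) " to " r(max)
```

Both displayed totals should be 100. If they are not, some observations are being counted in _N but not shown in any bar.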


Nick Cox

#3

09 Aug 2020, 03:24

Your graphs appear to be for your full dataset, so using your example data inevitably produces something a bit different.

Here are two more ways to approach this graph beyond the helpful answer from Andrew Musau. Andrew's post raises a strategic point, which is that sometimes you need to do some calculations ahead of the graph command to make it easier to get what you want. That's undoubtedly hard to know without detailed knowledge of the commands in question.

First I used catplot from SSC, which you must install first, but which here is a wrapper for graph bar, so there is no difference of principle.

Then I used tabplot, which at the time of writing you should also download from SSC, although a longer write-up at https://www.stata-journal.com/articl...article=gr0066 remains germane. (Formal notification of the update is in press at Stata Journal 20(3).) tabplot is a wrapper for twoway rbar.

I wrote both of these wrappers, but neither program knows or cares that I am fonder of tabplot and find it more useful, both for my own problems and for those that pass my way. Being able to lose the legend (kill the key) is, I believe, a feature, as legends are at best necessary evils and at worst so complicated that almost no-one can be bothered to read them in detail. I am also a fan of the idea of hybrid graphs and tables, in which a reader can focus on the graphical elements and/or on the tabulated results, as a matter of taste or importance.

As in Andrew's post, do please note the use of dataex for data examples, as we do request (https://www.statalist.org/forums/help#stata).

Andrew's graph is the same as my first graph, except for some cosmetic choices. The only important difference is that catplot will calculate the percents that Andrew calculates first, and I dare say there may be a way of getting graph bar to do that directly too.

Stata makes it hard to add % to every number on an axis, and I approve.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 subject str1 type
"Math" "A"
"Hist" "A"
"Hist" "B"
"Math" "C"
"Hist" "C"
"Hist" "C"
"Hist" "D"
"Hist" "A"
"Hist" "A"
"Math" "A"
"Math" "A"
"Math" "A"
"Math" "C"
"Math" "C"
"Hist" "D"
"Math" "D"
"Math" "A"
end

ssc inst catplot
catplot subject type, percent(subject) recast(bar) asyvars ///
bar(1, lcolor(blue) fcolor(blue*0.5)) bar(2, lcolor(red) fcolor(red*0.5)) ///
yla(0(25)75, ang(h)) ytitle(%, orient(horiz)) name(danila1, replace)

ssc inst tabplot
tabplot type subject, percent(subject) separate(subject) ///
bar1(lcolor(blue) fcolor(blue*0.5)) bar2(lcolor(red) fcolor(red*0.5)) ///
showval subtitle(% within subject) aspect(1) name(danila2, replace)

Last edited by Nick Cox; 09 Aug 2020, 03:37.
Nick Cox

#4

09 Aug 2020, 03:34

Here are the graphs.

Andrew Musau


09 Aug 2020, 08:22

Quoting Nick Cox (#3): "Andrew's graph is the same as my first graph, except for some cosmetic choices. The only important difference is that catplot will calculate the percents that Andrew calculates first, and I dare say there may be a way of getting graph bar to do that directly too."

Indeed, I can think of one way using separate. However, directly with the data as is, I struggle!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 subject str1 type
"Math" "A"
"Hist" "A"
"Hist" "B"
"Math" "C"
"Hist" "C"
"Hist" "C"
"Hist" "D"
"Hist" "A"
"Hist" "A"
"Math" "A"
"Math" "A"
"Math" "A"
"Math" "C"
"Math" "C"
"Hist" "D"
"Math" "D"
"Math" "A"
end

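* numeric version of type, value-labelled A-D
encode type, gen(Type)
* Type1 = Type for Hist only, Type2 = Type for Math only (missing otherwise)
separate Type, by(subject)
* so (percent) for each new variable is computed within its own subject
gr bar (percent) Type?, over(Type) bargap(10) bar(1, color(red)) ///
bar(2, color(blue)) ytitle("Percent") leg(order(1 "Hist" 2 "Math")) ///
scheme(s1color)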

Last edited by Andrew Musau; 09 Aug 2020, 09:03.


Danilo Silva

#6

09 Aug 2020, 09:10

Awesome. Thank you so much, Andrew and Nick. I struggled for hours trying to figure out how to do that. It is indeed not as intuitive as I thought it would be! And Nick, I'm definitely incorporating catplot into my future code. Also, thanks for the tips on how to post in this forum!
pavan pandey

#7

12 Aug 2020, 12:59

Hi everyone,

I am stuck on a problem and cannot figure out how to solve it. Below is an example of my dataset. Each row represents an observation (a patient). The patients in the dataset may or may not have all the symptoms. Each presenting symptom is coded as a separate variable (face_pain, hypos, anosmia, headache, cough; 0 = absent, 1 = present).

I want to create a single bar graph with one bar per symptom, with all the bars displayed side by side.

Thank you very much.

Regards
Pavan

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(face_pain hypos anosmia headache cough)
1 0 0 0 0
1 0 0 1 0
1 0 0 1 0
1 1 0 1 1
0 0 0 0 0
1 0 0 1 0
0 0 0 0 0
end
Andrew Musau

#8

12 Aug 2020, 13:38

If interested in #7, see

https://www.statalist.org/forums/for...tiple-variable
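For readers who do not follow the link, one common approach to the graph requested in #7 is sketched below. It is not necessarily the method in the linked thread. It relies on the fact that the mean of a 0/1 indicator is the proportion of 1s. Rescaling the indicators to 0/100 turns each mean into a percentage. The ascategory option of graph bar then places one bar per variable along the category axis, so the bars do not have to be identified from a legend.

```stata
* Sketch: one bar per 0/1 symptom indicator, showing % of patients with it
clear
input byte(face_pain hypos anosmia headache cough)
1 0 0 0 0
1 0 0 1 0
1 0 0 1 0
1 1 0 1 1
0 0 0 0 0
1 0 0 1 0
0 0 0 0 0
end

* rescale 0/1 to 0/100 so that each variable's mean is a percentage
foreach v of varlist face_pain hypos anosmia headache cough {
    replace `v' = 100 * `v'
}

* ascategory treats the variables as categories: one labelled bar each
graph bar (mean) face_pain hypos anosmia headache cough, ascategory ///
    ytitle("% of patients") blabel(bar, format(%9.0f))
```

Using (sum) on the original 0/1 variables, without the rescaling, would show counts of patients instead of percentages.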
Siri Osnes

#9

02 Oct 2024, 08:45

Hello,

I am new to Statalist and I have a question about graph bar with categorical variables. I want to make a graph of two categorical variables and include a total for the first one. The first variable (spm5) is a question from a survey and the second (no_standardgeo) is the region the respondent lives in. I have this code:

Code:

graph bar (percent), over(spm5) over(no_standardgeo, lab(angle(0) labsize(vsmall))) ///
stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ///
ytitle(Prosent) legend(size(vsmall)) ///
bar(1, color(ebg)) bar(2, color(ebblue)) bar(3, color(edkblue))

This is what my graph looks like:

I would like to add the totals for spm5 (across all regions) to the graph as a bar of its own. Is this possible in Stata? I cannot find an option for it.

Andrew Musau

#10

02 Oct 2024, 09:08

Duplicating the data is one way of adding a total category. Consider the following:

Code:

sysuse auto, clear
graph bar (percent), over(foreign) over(rep78, lab(angle(0) labsize(vsmall))) ///
stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ///
ytitle(Prosent)legend(size(vsmall)) bar(1, color(ebg)) bar(2, color(ebblue)) ///
bar(3, color(edkblue)) saving(gr1, replace)

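preserve
* duplicate every observation; new = 1 flags the copies
expand 2, gen(new)
* move the copies to an extra category that sorts after rep78's values 1-5
replace rep78=99 if new
* 99 is the 6th over() category, so label it "Total"
graph bar (percent), over(foreign) over(rep78, relabel(6 "Total") lab(angle(0) labsize(vsmall))) ///
stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ytitle(Prosent) legend(size(vsmall)) ///
bar(1, color(ebg)) bar(2, color(ebblue)) bar(3, color(edkblue)) saving(gr2, replace)
restore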

gr combine gr1.gph gr2.gph, col(1)
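One detail of the duplication trick is easy to miss: every copy goes to the Total category, including copies of observations whose rep78 is missing. graph bar leaves missing over() categories out by default, so those cars appear in no rep78 bar, yet they do contribute to Total. The sketch below, an addition on the same auto data, just inspects what the Total group contains. It omits the preserve/restore that real use would need.

```stata
* Sketch: inspect what the duplicated "Total" category contains (auto data)
sysuse auto, clear
* cars with missing rep78 are left out of the rep78 bars by default
count if missing(rep78)
* duplicate every observation; new = 1 flags the copies
expand 2, generate(new)
* every copy, including those with missing rep78, goes to category 99
replace rep78 = 99 if new
* the Total group holds one copy of each car in the original data
tabulate foreign if rep78 == 99
```

If Total should cover only the cars shown in the other bars, the duplication can be restricted, for instance with expand 2 if !missing(rep78), generate(new).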


Siri Osnes

#11

02 Oct 2024, 09:52

Thank you so much, Andrew Musau! That worked perfectly!
Nick Cox

#12

03 Oct 2024, 00:23

See https://journals.sagepub.com/doi/pdf...867X1401400117 for more on Andrew Musau's technique.
