Bar Graph for categorical variables

Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#1

23 Dec 2023, 03:39

Hello everyone!

I was wondering if any of you could help me make a bar graph for five categorical variables. Each of them has 600 observations, and the possible values are "correct", "incorrect", and "refused to answer".

I need to have vertical bars, one for each variable, that show the percentage of each category.

Any suggestions?

I have tried the following command:
. graph bar (percent), over(f_skill_el_?) stack
But it seems that I am including too many variables.

I have also tried commands such as catplot, but I think the maximum number of variables is 3.

Andrew Musau

Join Date: Oct 2014
Posts: 10089

23 Dec 2023, 05:01

You will need to reshape your data and calculate the percentages if using graph. For your future posts, please familiarize yourself with the dataex command for presenting data examples (see FAQ Advice #12 for details).

Code:

clear
*GENERATE DATASET
set obs 600
set seed 12232023
forval i=1/6{
    gen f_skill_el_`i'= runiformint(0,1)< 1-0.`i'
    replace f_skill_el_`i'=2 in 1/`i'0
}
lab define f_skill 2 "Don't know" 0 "Correct" 1 "Incorrect"
lab values f_skill* f_skill  

*START HERE
gen obs_no=_n
reshape long f_skill_el_, i(obs_no) j(which)
bys which: gen percent=_N
bys f_skill_el_ which: replace percent= (_N/percent)*100
graph bar percent, over(f_skill_el_) over(which ) asyvars stack ytitle("Percent")

[Attached image: Graph.png]

Nick Cox

Join Date: Mar 2014
Posts: 35438

23 Dec 2023, 05:04

catplot is from SSC (FAQ Advice #12).

Here is some technique:

Code:

* sandbox, given no data example
clear
set obs 600

set seed 2312

forval j = 1/5 {
    gen f_skill_el_`j' = cond(_n < 600 / (3 * `j'), 3, cond(_n < 600 / (2 * `j'), 2, 1))
}

* you start here -- but use sensible value labels

preserve

stack f_skill_el_?, into(toshow) clear
label def answer 1 "correct" 2 "incorrect" 3 "refused"
label val toshow answer

label def _stack 1 frog 2 toad 3 newt 4 dragon 5 gecko
label val _stack _stack

tab toshow _stack

catplot toshow , over(_stack) asyvars l1title("") ysc(alt)

restore


Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#4

03 Jan 2024, 10:10

Hi Andrew and Nick!

Thank you very much for your help! I have followed your recommendations, but I still cannot get what I need.

I cannot use dataex because the data I am working with are confidential, so I will provide more details about the data instead.

Context: we interviewed 600 people and divided them into 4 groups by marital status: single, married, divorced, and widowed. Each person had to respond to six questions about their skills.
This is where the variables come from: each group has six variables, one for each question.

Group 1: single
Question 1: f_skill_el_1
Question 2: f_skill_el_2
...

Group 2: married
Question 1: f_skill_ma_1
Question 2: f_skill_ma_2
...

Group 3: divorced
Question 1: f_skill_di_1
Question 2: f_skill_di_2
...

Group 4: widowed
Question 1: f_skill_wi_1
Question 2: f_skill_wi_2
...

Each variable f_skill_??_? can have three possible values: 1 "correct", 2 "incorrect", 3 "refused to answer".

I have already created the variables and labeled them correctly.

Now, my task is to produce one graph for each group. Each graph must have just six vertical bars, one for each question, showing the percentage of people who responded correctly, responded incorrectly, or refused to respond (as in Andrew's graph).

When I try to reshape as Andrew suggested, I get the error "no xij variables found".

Thank you again for your help, and please let me know if I should provide more information. This is my first time on this forum, so I am still getting used to it.

Nick Cox

Join Date: Mar 2014

Posts: 35438
#5

03 Jan 2024, 11:36

Section 12.2 of https://www.statalist.org/forums/help#stata says:

If your dataset is confidential, then provide a fake example instead.

A wild guess is that variables for 4 groups can be combined by something like

Code:

forval j = 1/6 {
    egen f_skill_`j' = rowmax(f_skill_??_`j')
}

as presumably the single people don't answer questions aimed at married, divorced, widowed; and more generally in each observation there are missing values for three out of each four variables.

So, continuing this wild guess, you can reduce your data to the group identifier and the six variables created by the loop above.

We just need to see the frequencies of those seven variables, as shown by, e.g., the groups command from the Stata Journal.

If this guess is wrong, or you cannot follow what I am saying, and no one else gives a better answer, then as said I think we need a realistic data example.
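
The rowmax() guess above can be tried on fake data. What follows is only a sketch under stated assumptions: the group identifier variable group, the 8 observations, and the use of two questions instead of six are all invented for illustration, and each person is assumed to have non-missing answers only in the variables for their own group.

```stata
* hypothetical fake data: 8 people, 4 groups, 2 questions (assumptions)
clear
set obs 8
set seed 2024
gen byte group = ceil(_n / 2)

* each person only has values in the variables for their own group
local g = 0
foreach s in el ma di wi {
    local ++g
    forval j = 1/2 {
        gen byte f_skill_`s'_`j' = runiformint(1, 3) if group == `g'
    }
}

* rowmax() ignores missings, so it picks up the one non-missing value per row
forval j = 1/2 {
    egen f_skill_`j' = rowmax(f_skill_??_`j')
}

list group f_skill_1 f_skill_2, sepby(group)
```

If more than one of the four variables could be non-missing in the same observation, rowmax() would silently keep the largest value, so this only makes sense if the guess about the data layout holds.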
Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#6

03 Jan 2024, 14:32

Hi Nick Cox,

Thank you for your quick answer.

I think I still need to be more specific with what I am looking for.

I made a very simple dataset with the following commands:

set obs 10

gen Group1_1 = floor(3 * runiform())
replace Group1_1 = 99 if Group1_1 == 2

gen Group1_2 = floor(3 * runiform())
replace Group1_2 = 99 if Group1_2 == 2

label define response_lbl 0 "Incorrect" ///
1 "Correct" ///
99 "Refuse"

label values Group1_? response_lbl

Now, I would like to get a graph like the one attached.

I hope this is clearer now. If not, please let me know.

Last edited by Lisa Alejandra Kobrich; 03 Jan 2024, 15:03.

Andrew Musau

Join Date: Oct 2014
Posts: 10089

03 Jan 2024, 15:23

The implementation closely follows #2, except that you have multiple groups.

Code:

clear
set obs 100
set seed 01032024
forval i=1/4{
    gen Group`i'_1 = floor(3 * runiform())
    replace Group`i'_1 = 99 if Group`i'_1 == 2
    gen Group`i'_2 = floor(3 * runiform())
    replace Group`i'_2 = 99 if Group`i'_2 == 2
}
label define response_lbl 0 "Incorrect" ///
1 "Correct" ///
99 "Refused to answer"
label values Group* response_lbl
gen long obs_no=_n
rename Group* response*
reshape long response, i(obs_no) j(which) string
gen Q= "Q" + substr(which, -1, 1)
gen group= "Group " + substr(which, 1, 1)
drop which
bys group Q: gen percent=_N
bys response group Q: replace percent= (_N/percent)*100
graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, note(""))
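
For variable names of the form f_skill_el_1, f_skill_ma_1, and so on, as described in #4, the same reshape idea might look like the sketch below. The fake data, the 20 observations, and the restriction to two questions are assumptions for illustration only. reshape reports "no xij variables found" when the stub matches no variable names, so the stub has to be the prefix that the variables actually share.

```stata
* hypothetical fake data named f_skill_<group>_<question> (assumption)
clear
set obs 20
set seed 2024
foreach s in el ma di wi {
    forval j = 1/2 {
        gen byte f_skill_`s'_`j' = runiformint(1, 3)
    }
}
gen long obs_no = _n

* stub f_skill_ is the shared prefix; with the string option,
* j holds the rest of each name: "el_1", "el_2", "ma_1", ...
reshape long f_skill_, i(obs_no) j(which) string

* split the suffix into group code and question number
gen group = substr(which, 1, 2)
gen Q = "Q" + substr(which, -1, 1)
tab group Q
```

From here the percent calculation and graph bar call would proceed as in the code above, with f_skill_ in place of response.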

[Attached image: Graph.png]

Last edited by Andrew Musau; 03 Jan 2024, 15:29.


Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#8

03 Jan 2024, 16:22

Hi Andrew Musau

Thank you so much! This is really helpful!!

I just have one more question, if you do not mind. There are some missing values in my dataset because not everyone is required to respond to all the questions. How can I compute each percentage out of the respondents who were required to answer that specific question, rather than out of all respondents?

Nick Cox

Join Date: Mar 2014
Posts: 35438

03 Jan 2024, 18:56

This takes Andrew Musau's helpful example one step further and simulates some missing values. A good way to ignore them is to drop them before the graphics.

I add a display using tabplot from the Stata Journal. This dispenses with a statement of the obvious -- that percent stacked bars add to 100% -- in favour of a layout that allows more direct comparison within the display, including display of percents themselves.

For a quick overview of tabplot skip and skim through https://www.statalist.org/forums/for...updated-on-ssc

Code:

clear
set obs 100
set seed 01032024
forval i=1/4{
    gen Group`i'_1 = floor(3 * runiform())
    replace Group`i'_1 = 99 if Group`i'_1 == 2
    gen Group`i'_2 = floor(3 * runiform())
    replace Group`i'_2 = 99 if Group`i'_2 == 2
}
label define response_lbl 0 "Incorrect" ///
1 "Correct" ///
99 "Refused to answer"
label values Group* response_lbl
gen long obs_no=_n
rename Group* response*
reshape long response, i(obs_no) j(which) string
replace response = . if runiform() < 0.07 
gen Q= "Q" + substr(which, -1, 1)
gen group= "Group " + substr(which, 1, 1)
drop which
drop if missing(response)


bys group Q: gen percent=_N
bys response group Q: replace percent= (_N/percent)*100
graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, note("")) name(Andrew, replace)


tabplot response Q, percent(Q group) separate(response) by(group, row(1) note("")) name(Nick, replace) ytitle("") xtitle("") showval(format(%2.1f))
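
The effect of dropping missing values on the denominator can be checked by hand on a tiny example. This is a sketch with made-up values (ten responses to a single question, two of them missing), not part of the simulated data above.

```stata
* hypothetical toy data: 3 incorrect, 4 correct, 1 refused, 2 missing
clear
input byte response
0
0
0
1
1
1
1
99
.
.
end

* dropping missings first makes the denominator 8, not 10
drop if missing(response)
gen percent = _N
bys response: replace percent = (_N / percent) * 100

tabdisp response, cell(percent)
```

With the missing values dropped first, the three percents are 3/8, 4/8 and 1/8 of 100 and add to 100; without the drop they would be computed out of 10.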

[Attached image: Andrew.png]

[Attached image: Nick.png]

Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#10

09 Jan 2024, 02:04

Nick Cox, thank you very much! It works just fine now.

Do you know if it would be possible to add a general title above all the graphs?

Nick Cox

Join Date: Mar 2014
Posts: 35438

#11

09 Jan 2024, 03:51

Code:

graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, title(Whatever) note("")) name(Andrew, replace)


tabplot response Q, percent(Q group) separate(response) by(group, title(Whatever) row(1) note("")) name(Nick, replace) ytitle("") xtitle("") showval(format(%2.1f))


Lisa Alejandra Kobrich

Join Date: Dec 2023

Posts: 17
#12

09 Jan 2024, 04:30

Oh great! Thanks!!