Creating bar chart with multiple binary variables

Valerie Scott

10 Mar 2023, 16:58

Hi, I am attempting to create a bar chart in Stata that shows proportions for multiple binary variables. Each is a different measure of whether a medical provider offered a certain service at a health visit (yes/no). I've tried to create a reproducible example similar to my dataset, along with the code I have tried so far. Is there any way I can show a bar for both yes and no for each indicator? I am also trying to relabel the variable names on the y axis, but this code yields an invalid 'ylabel' error. I'm a new Stata user and this is my first attempt at posting a reproducible example, so please let me know if anything isn't clear!

Code:

set obs 10
egen id = seq(), from(1) to(10)
gen fp = cond(mod(_n, 2), 0, 1)
gen cervical = cond(mod(_n, 4), 0, 1)
gen immunization = cond(mod(_n, 6), 0, 1)
lab define fp 0 "No" 1 "Yes"
lab define cervical 0 "No" 1 "Yes"
lab define immunization 0 "No" 1 "Yes"
lab values fp fp
lab values cervical cervical
lab values immunization immunization
statplot fp cervical immunization, varopts(sort(1)) blabel(bar), ylabel(fp "Provider discussed family planning with mother" cervical "Provider offered cervical cancer screening to mother" immunization "Provider immunized baby")
Andrew Musau

10 Mar 2023, 17:18

statplot is from SSC (FAQ Advice #12). As it is a wrapper for graph hbar, you need to look at that command's options.

Code:

help graph hbar

In this case, you need the -relabel()- option within -varopts()-:

Code:

relabel(1 "Provider discussed family planning with mother" 2 "Provider offered cervical cancer screening to mother" 3 "Provider immunized baby")

Be sure that the numbering matches the relevant bar.
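If an SSC package is not wanted, a similar display can be sketched with official graph hbar alone. The sketch below assumes that, with ascategory, each variable becomes its own bar and yvaroptions(relabel()) replaces the variable names with descriptive text (worth checking against help graph bar in your version). For a 0/1 variable, (mean) is the proportion answering Yes. The ylabel() attempt failed because ylabel() expects numeric positions on the value axis (horizontal in graph hbar), not variable names.

```stata
* Sketch: official graph hbar, one bar per 0/1 indicator
clear
set obs 10
gen fp = cond(mod(_n, 2), 0, 1)
gen cervical = cond(mod(_n, 4), 0, 1)
gen immunization = cond(mod(_n, 6), 0, 1)

* (mean) of a 0/1 variable is the proportion of "Yes"
* ascategory puts each variable on the category axis;
* relabel() numbers follow the order in which the variables are listed
graph hbar (mean) fp cervical immunization, ascategory                        ///
    yvaroptions(relabel(1 "Provider discussed family planning with mother"    ///
                        2 "Provider offered cervical cancer screening to mother" ///
                        3 "Provider immunized baby"))                         ///
    blabel(bar, format(%3.2f)) ytitle("Proportion answering Yes")
```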

Nick Cox


11 Mar 2023, 02:15

Another possibility here is designplot from the Stata Journal:

Code:

. search designplot, sj

Search of official help files, FAQs, Examples, and Stata Journals

SJ-19-3 gr0061_3  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q3/19   SJ 19(3):748--751
        any attempt to use the missing option of graph dot,
        graph hbar, or graph bar is now ignored and advice on
        what to do instead is shown

SJ-17-3 gr0061_2  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q3/17   SJ 17(3):779
        help file updated

SJ-15-2 gr0061_1  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q2/15   SJ 15(2):605--606
        bug fixed for Stata 14

SJ-14-4 gr0061  Design plots for graphical summary of a response given factors
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q4/14   SJ 14(4):975--990
        produces a graphical summary of a numeric response variable
        given one or more factors

designplot wants an outcome variable, but we just need to feed it a variable that is constant as if that were the outcome and ignore that constant otherwise. Then we get a display of the marginal distributions of each "predictor".

Code:

clear
set obs 10
egen id = seq(), from(1) to(10)
gen fp = cond(mod(_n, 2), 0, 1)
gen cervical = cond(mod(_n, 4), 0, 1)
gen immunization = cond(mod(_n, 6), 0, 1)
lab define fp 0 "No" 1 "Yes"
lab define cervical 0 "No" 1 "Yes"
lab define immunization 0 "No" 1 "Yes"
lab values fp fp
lab values cervical cervical
lab values immunization immunization

set scheme s1color

gen one = 1

label var one "interesting data"

designplot one fp cervical immunization, stat(count) min(1) max(1) recast(hbar) variablenames name(G1, replace)

[Image: designplot_G1.png]

In your real problem you likely have not 3 but more like 10 or 30 variables. As they're all binary, the means carry all the information (other than sample sizes). A little work to collapse to a dataset of means and counts allows a great deal by way of flexible plotting. For this concocted example, the sample size (# of non-missing values) is the same for all variables, but for real data that clearly isn't guaranteed.

Showing both Yes and No frequencies for many binary variables is in practice as likely to confuse as to clarify.

The code here isn't going to be much different for a dataset with many more variables.

The option choices naturally aren't the only choices possible, or even the best, but mostly given here to underline some possibilities. For a graph with table flavour, I like the table convention of putting explanatory stuff at the top, not the bottom. More on that at https://www.stata-journal.com/articl...article=gr0053

The more variables you have, the more likely it is that a vertical display will just be a mess.

Code:

foreach v of varlist fp-immunization {
    local call `call' (count) count`v'=`v' (mean) mean`v'=`v'
}

collapse `call'

gen id = 1

reshape long count mean, i(id) j(which) string

gen toshow = which + " ({it:n}=" + strofreal(count) + ")"

sort mean

graph dot (asis) mean, over(toshow, sort(1) descending) linetype(line) lines(lc(gs12) lw(vthin)) name(G2, replace) ysc(alt) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50") ytitle(% practising)

graph hbar (asis) mean, over(toshow, sort(1) descending) bar(1, fcolor(blue*0.2) lcolor(blue))  name(G3, replace) ysc(alt) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50") ytitle(% practising)

[Images: designplot_G2.png, designplot_G3.png]

Last edited by Nick Cox; 11 Mar 2023, 02:59.
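The point above that the means of binary variables carry all the information (apart from sample sizes) can be checked directly: for a 0/1 indicator the mean is the proportion of Yes, and the proportion of No is 1 minus that mean. This is a minimal sketch on the example data using only summarize and display; the column layout is an arbitrary choice.

```stata
* Sketch: Yes proportion = mean of a 0/1 variable; No proportion = 1 - mean
clear
set obs 10
gen fp = cond(mod(_n, 2), 0, 1)
gen cervical = cond(mod(_n, 4), 0, 1)
gen immunization = cond(mod(_n, 6), 0, 1)

foreach v of varlist fp cervical immunization {
    quietly summarize `v'
    * r(N) is the number of non-missing values, r(mean) the Yes proportion
    display "`v'" _col(15) "n = " r(N) _col(25) "Yes = " %4.2f r(mean) _col(40) "No = " %4.2f (1 - r(mean))
}
```

For this concocted example, fp is 0.50 Yes, cervical 0.20 and immunization 0.10, so the No bars would add nothing that 1 minus the Yes bars does not already say.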


Valerie Scott


21 Mar 2023, 18:32

Thank you so much Nick Cox and Andrew Musau, this is incredibly helpful! I really appreciate your comments and am thrilled to have much better bar graphs now. Somewhat related, I am also struggling with how to create a pie graph of "Pregnancy danger signs experienced" from multiple binary indicators, each of which is a yes/no for whether a danger sign (e.g., blurred vision, severe headache) was experienced. My dataset has many questions coded as multiple binary indicators, so I'm still learning how to work with this.

Below is a slightly adjusted example dataset similar to mine (although my data have some missing values in these variables, and I couldn't figure out how to add missing values to the example dataset). My thinking was to create a new categorical variable from the binary indicators, using code like that below, and then graph that variable. But I'm realizing this doesn't work, because it creates one value for each mother in the dataset. Mothers can report multiple danger signs, so what I want is a pie chart of all danger signs reported (a different n from the number of individuals), showing what % of the reported signs were blurred vision, what % were severe headache, and so on.

Code:

clear
set obs 10
egen id = seq(), from(1) to(10)
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)
lab define bleeding 0 "No" 1 "Yes"
lab define breathingdiff 0 "No" 1 "Yes"
lab define chestpain 0 "No" 1 "Yes"
lab define fever 0 "No" 1 "Yes"
lab define abdominalpain 0 "No" 1 "Yes"
lab values bleeding bleeding
lab values breathingdiff breathingdiff
lab values chestpain chestpain
lab values fever fever
lab values abdominalpain abdominalpain
gen mom_dangersign_cat = .
replace mom_dangersign_cat = 1 if bleeding == 1
replace mom_dangersign_cat = 2 if breathingdiff == 1
replace mom_dangersign_cat = 3 if chestpain == 1
replace mom_dangersign_cat = 4 if fever == 1
replace mom_dangersign_cat = 5 if abdominalpain == 1
lab define mom_dangersign_cat 1 "Vaginal bleeding (heavy or sudden increase)" 2 "Breathing difficulty" 3 "Chest pain" 4 "Fever" 5 "Severe abdominal pain"
lab values mom_dangersign_cat mom_dangersign_cat

This seems like it shouldn't be hard to do and Nick's explanation above is making me think I need to create a new dataset for the graphing, but I've gotten completely stuck about how to do that and get the right denominator using these binary indicators. I'd be grateful for any ideas on how to go about this!

Nick Cox


22 Mar 2023, 01:37

You can use tabm from tab_chi on SSC to get the reduced dataset you need, at least temporarily. No pie chart is going to be easier to read or write about than a bar chart, or so I suggest. As you mention, you need to explain that it is the number of symptoms, not the number of women, being shown, and a pie chart's implication of subdividing a total is unfortunate here, apart from its other problems.

Code:

clear
set obs 10
egen id = seq(), from(1) to(10)
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)
lab define bleeding 0 "No" 1 "Yes"
lab define breathingdiff 0 "No" 1 "Yes"
lab define chestpain 0 "No" 1 "Yes"
lab define fever 0 "No" 1 "Yes"
lab define abdominalpain 0 "No" 1 "Yes"
lab values bleeding bleeding
lab values breathingdiff breathingdiff
lab values chestpain chestpain
lab values fever fever
lab values abdominalpain abdominalpain

preserve
* ssc install tab_chi 
tabm bl-ab, replace
label li _stack 
label def _stack 2 "breathing difficulties" 3 "chest pain" 5 "abdominal pain", modify 
set scheme s1color 
graph hbar (count) if _values, over(_stack, sort(1) descending) blabel(total)
save wantedforgraph 
restore

[Image: difficulties.png]

Incidentally, although I'm pleased to see egen, seq() in action -- I wrote the first version --

Code:

gen id = _n

would serve your purpose fine.
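If installing tab_chi is not an option, official stack can build a similar long dataset: all the answers go into one new variable, and _stack records which original variable each answer came from. This is a sketch, not a reproduction of what tabm does internally; the names value and stacklab are hypothetical choices. Because stack replaces the data in memory (hence clear), wrap it in preserve and restore in real work, as in the tabm code above.

```stata
* Sketch: official -stack- as a stand-in for tabm's reduced dataset
clear
set obs 10
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)

* 5 variables x 10 observations -> 50 observations;
* _stack = 1 to 5 in the order the variables are listed
stack bleeding breathingdiff chestpain fever abdominalpain, into(value) clear
label define stacklab 1 "bleeding" 2 "breathing difficulty" 3 "chest pain" 4 "fever" 5 "abdominal pain"
label values _stack stacklab

* Count only "Yes" answers; testing == 1 also leaves out any missing values
graph hbar (count) value if value == 1, over(_stack, sort(1) descending) blabel(bar)
```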


Nick Cox


22 Mar 2023, 03:23

The same information is shown by

Code:

graph hbar (sum) bl-ab, blabel(total)

but I prefer the version above.
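Since graph hbar (sum) of 0/1 variables gives counts of Yes, the denominator asked about for the pie chart is simply the sum of those counts: the total number of signs reported, not the number of women. In the example data the 15 signs reported split as 5 bleeding, 2 breathing difficulty, 1 chest pain, 5 fever and 2 abdominal pain, so bleeding is 5/15, about 33.3%, of all signs reported. The sketch below computes this with official commands; nsigns is a hypothetical variable name, and the percentages option is assumed to express each yvar as a percentage of the total across all yvars, as described in help graph bar.

```stata
* Sketch: each danger sign as a percentage of all signs reported
clear
set obs 10
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)

* Number of signs each woman reported (rowtotal treats missing as 0)
egen nsigns = rowtotal(bleeding-abdominalpain)
quietly summarize nsigns
local total = r(sum)
display "Danger signs reported in total: " `total'

* Count and percentage of all signs reported, per sign
foreach v of varlist bleeding-abdominalpain {
    quietly summarize `v'
    display "`v'" _col(18) r(sum) _col(24) %5.1f (100 * r(sum) / `total') "%"
}

* Same percentages as a bar chart rather than a pie chart
graph hbar (sum) bleeding-abdominalpain, percentages blabel(bar, format(%4.1f))
```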

Nick Cox


22 Mar 2023, 07:57

Wanting the single variable counts is one thing, but as we all know symptoms can occur together.

Consider upsetplot and vennbar from SSC.

https://www.statalist.org/forums/for...lable-from-ssc

https://www.statalist.org/forums/for...lable-from-ssc

Code:

clear
set obs 10
egen id = seq(), from(1) to(10)
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)

label var ab "abdominal pain"
label var br "breathing difficulty"
label var ch "chest pain"

upsetplot bl-ab, labelopts(mlabel(_count) mlabpos(12)) ysc(r(. 6)) name(UP, replace)

vennbar bl-ab , varlabels blabel(bar) name(VB, replace)

The graphs for the full dataset would be more complicated (32 subsets possible) but presumably more interesting.

[Images: UP.png, VB.png]

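The combinations that upsetplot and vennbar display can also be tabulated with official egen functions, which is a quick check before graphing. This sketch is not how those SSC commands work internally; nsigns and pattern are hypothetical names. Each pattern is a string of 0s and 1s in the order bleeding, breathingdiff, chestpain, fever, abdominalpain. In the example data, 5 women report no signs, and the other 5 fall into just 3 of the 32 possible patterns.

```stata
* Sketch: how many signs each woman reported, and which combination
clear
set obs 10
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)

egen nsigns = rowtotal(bleeding-abdominalpain)
* concat() converts each 0/1 value to "0" or "1" and joins them
egen pattern = concat(bleeding-abdominalpain)

tabulate nsigns
tabulate pattern, sort
```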

Valerie Scott


01 Apr 2023, 15:09

Thank you so much Nick Cox for your time and very helpful responses! I ended up using tabm and completely agree this is more clear than a pie graph. In case it's helpful to anyone with a similar need in the future, I made one small tweak below to exclude missing values from the horizontal bar graph as I did not want those included in the counts for the variable being graphed.

Code:

clear
set obs 10
egen id = seq(), from(1) to(10)
gen bleeding = cond(mod(_n, 2), 0, 1)
gen breathingdiff = cond(mod(_n, 4), 0, 1)
gen chestpain = cond(mod(_n, 6), 0, 1)
gen fever = cond(mod(_n, 2), 0, 1)
gen abdominalpain = cond(mod(_n, 4), 0, 1)

lab define bleeding 0 "No" 1 "Yes"
lab define breathingdiff 0 "No" 1 "Yes"
lab define chestpain 0 "No" 1 "Yes"
lab define fever 0 "No" 1 "Yes"
lab define abdominalpain 0 "No" 1 "Yes"
lab values bleeding bleeding
lab values breathingdiff breathingdiff
lab values chestpain chestpain
lab values fever fever
lab values abdominalpain abdominalpain  

preserve
*ssc install tab_chi
tabm bleeding-abdominalpain, replace
label li _stack
label def _stack 1 "Bleeding" 2 "Breathing difficulty" 3 "Chest pain" 4 "Fever" 5 "Severe abdominal pain", modify
set scheme s1color
graph hbar (count) if _values & !missing(_values), over(_stack, sort(1) descending) blabel(total)
save wantedforgraph
restore
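The tweak above is needed because in Stata an expression used in if counts as true whenever it is not zero, and missing is not zero. So if _values on its own keeps missing values. The sketch below also shows one way to put missing values into example data (replace ... in), which was asked about earlier. The observations chosen to be missing are arbitrary.

```stata
* Sketch: missing values are "true" in -if varname-
clear
set obs 10
gen fever = cond(mod(_n, 2), 0, 1)
* Make observations 3 and 4 missing in the example data
replace fever = . in 3/4

* Counts the four 1s plus the two missing values
count if fever
* Either form below counts only the Yes answers
count if fever & !missing(fever)
count if fever == 1
```

Here count if fever gives 6, while the other two give 4. Testing == 1 is a shorter equivalent of the tweak for 0/1 variables.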

Last edited by Valerie Scott; 01 Apr 2023, 15:13.
