Problems Introducing CI in Bar Chart

Patricia Schafer

Join Date: Aug 2019

Posts: 4
#1


26 Aug 2019, 03:00

Hey there, I am having a problem with -catplot- in making descriptive statistics with two categorical variables (using Stata 15 for Windows).
My data: one binary variable, var1 (0, 1), and one categorical variable, var2 (1, 2, 3, 4).
I want to make something like the following bar chart, which I made with -catplot-:

command: catplot var1, over(var2) percent(var1) asyvar recast(bar)

Now I would like to add 90% confidence intervals to the bars (the percentages of the categories of var2 within the two groups of var1). I was able to get the 90% confidence intervals for each category of var2 within the categories of var1 with -proportion-:
command: proportion var2, over(var1) level(90)

Over        Proportion   Std. Error   90% CI lower   90% CI upper
_prop_1
  var1.1    0.46666017   0.0110185    0.4485274      0.484764
  var1.2    0.5220399    0.0070236    0.5104759      0.53335803
...         ...          ...          ...            ...

The table above shows the lower and upper 90% confidence limits for the first two bars of the chart.
Can someone help me add these 90% confidence intervals to the graph? Or, if this is not possible with the catplot command, is there another way to do it?

Your help is very much appreciated - thanks in advance!
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

26 Aug 2019, 05:21

I gather this text will be helpful to you.

Best regards,

Marcos
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

26 Aug 2019, 05:22

catplot is from SSC, as you are asked to explain (FAQ Advice #12). As used here it's a wrapper for graph bar, but there isn't any way to combine that with twoway rcap or twoway rspike, which is what you need for adding confidence intervals.

For what you want, you probably need twoway bar and twoway rcap combined.

https://journals.sagepub.com/doi/pdf...867X1001000112 explains one way to approach confidence interval display.

I would strongly recommend using evocative variable names, not the utterly colourless var1 and so forth.
Patricia Schafer

Join Date: Aug 2019

Posts: 4
#4

26 Aug 2019, 05:46

Dear Marcos and dear Nick,
Thank you both very much for your replies. This is indeed extremely helpful!

Best regards
Patricia
Nick Cox

Join Date: Mar 2014

Posts: 35432
#5

26 Aug 2019, 06:19

I don't quite share Marcos Almeida's enthusiasm for https://stats.idre.ucla.edu/stata/fa...th-error-bars/

It's perhaps obvious to discerning readers, but as shown there the confidence intervals do not show up well. Better to have stronger colours for the intervals and lighter colours for the bars (e.g., none at all). In fact, the much-deprecated "dynamite plot" format (Google for discussions) can be avoided altogether, as can the distraction of arbitrary colours and the indirection of a legend.

Also, as #3 already stated, you don't need such a "do-it-yourself" approach to confidence intervals.

Here's an alternative.

Code:

use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

statsby, by(race ses) : ci mean write , level(90)

twoway scatter race mean, yla(1/4, valuelabel tlength(0) ang(h)) scheme(s1color) ///
|| rcap ub lb race, horizontal by(ses, subtitle(Mean writing scores by SES and race with 90% confidence intervals) note("") col(1) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("")
Patricia Schafer

Join Date: Aug 2019

Posts: 4
#6

26 Aug 2019, 06:21

Hey there, I have one more question about plotting the percentages and CIs.

On my y-axis I do not want the mean of each category; I want the percentage in each category within each group of the binary variable (see the graph in the first post).
Example: with a binary variable for men (0) and women (1), what percentage of men, and of women, fall into the first, second, third, and fourth categories? (So that the percentages for men sum to 100% and those for women sum to 100%.)

Is there any way to do this with twoway bar and twoway rcap? Unfortunately, everything I checked referred to means, not to percentages within a group.

Thanks again in advance for your help!

Best regards,
Patricia
Nick Cox

Join Date: Mar 2014

Posts: 35432
#7

26 Aug 2019, 06:47

Indeed. But you're still plotting means, as each percent is the mean of an indicator variable (scaled by 100, which is cosmetic here). And it's key not to mix two different problems: twoway bar etc. is just for plotting, and you still need to calculate the means and confidence intervals first.
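A small check of this point, as a sketch using the auto data shipped with Stata (the indicator rep78_is3 is a name introduced here for illustration): the mean of a 0/1 indicator for rep78 == 3 is the proportion of cars in that category, and 100 times it is the percent that tabulate reports.

```stata
* Percent = 100 x mean of a 0/1 indicator (auto data shipped with Stata)
sysuse auto, clear

* hypothetical indicator: 1 if rep78 is 3, 0 otherwise, missing if rep78 is missing
generate byte rep78_is3 = (rep78 == 3) if !missing(rep78)

* the mean of the indicator is the proportion in category 3
summarize rep78_is3, meanonly
display "proportion = " r(mean) "   percent = " 100 * r(mean)

* tabulate reports the same percent for category 3
tabulate rep78
```

The same reasoning is why ci applied to an indicator, or to each indicator created by tab, gen(), gives intervals for percents.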

There may well be a simpler or better formulation, but this indicates some technique. (Using Jeffreys's method is partly a personal choice.)

Code:

sysuse auto, clear
tab rep78, gen(rep78_)
ci prop rep78_*

gen mean = .
gen ub = .
gen lb = .
gen work = .

quietly forval j = 1/5 {
    replace work = rep78 == `j' if rep78 < .
    ci prop work , jeffreys level(90)
    replace mean = r(mean) if rep78 == `j'
    replace ub = r(ub) if rep78 == `j'
    replace lb = r(lb) if rep78 == `j'
}

collapse mean ub lb , by(rep78)

set scheme s1color
scatter mean rep78 || rcap ub lb rep78 , legend(off) ytitle(Percents and 90% confidence intervals) yla(0 0.1 "10" 0.2 "20" 0.3 "30" 0.4 "40" 0.5 "50", ang(h))

Your problem is a little different, but the mean of an indicator for being female gives you the proportion male by subtraction from 1 (given your two categories).

As you haven't given a data example (FAQ Advice #12), we can't use your data.
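For the layout asked about in #6 (percentages of a categorical variable within each group of a binary variable), here is one possible sketch using the auto data, with foreign standing in for var1 and rep78 for var2. It computes the proportions directly and takes 90% Jeffreys limits as beta quantiles with invibeta(), which is intended to match what ci prop, jeffreys gives cell by cell. The variable names k, ngroup, prop, lb and ub are illustrative choices, not anything catplot or proportion produce.

```stata
* Sketch: percent of each rep78 category within each level of foreign,
* with 90% Jeffreys intervals. foreign stands in for var1, rep78 for var2.
sysuse auto, clear
drop if missing(rep78)

* one row per foreign x rep78 cell; k = count, zero keeps empty cells
contract foreign rep78, freq(k) zero

* group size within each level of foreign
bysort foreign: egen ngroup = total(k)

* proportion within group; Jeffreys limits from Beta(k + 0.5, ngroup - k + 0.5),
* with the usual convention lb = 0 when k = 0 and ub = 1 when k = ngroup
generate double prop = k / ngroup
generate double lb = cond(k == 0, 0, invibeta(k + 0.5, ngroup - k + 0.5, 0.05))
generate double ub = cond(k == ngroup, 1, invibeta(k + 0.5, ngroup - k + 0.5, 0.95))

list foreign rep78 k ngroup prop lb ub, sepby(foreign)

* light bars, dark intervals, one panel per level of foreign
set scheme s1color
twoway bar prop rep78, barw(0.8) base(0) fcolor(gs14) lcolor(gs10) ///
    || rcap ub lb rep78, lcolor(black) ///
    by(foreign, note("") legend(off)) ///
    xla(1/5) ytitle(Proportion within group and 90% interval)
```

Within each level of foreign the values of prop sum to 1, as with percent(var1) in the catplot graph; relabel the y axis as in the code above to show percents.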

Last edited by Nick Cox; 26 Aug 2019, 06:51.

Nick Cox

Join Date: Mar 2014
Posts: 35432

26 Aug 2019, 07:32

This may be closer to your problem.

Code:

sysuse auto, clear 

statsby, by(rep78): ci prop foreign, level(90) 

list 

set scheme s1color 

scatter mean rep78 || rcap ub lb rep78 , legend(off) ytitle(Percent foreign and 90% confidence intervals) yla(0 1 "100" 0.5 "50" 0.25 "25" 0.75 "75")


Danae Arroyos

Join Date: Feb 2023

Posts: 8
#9

22 Sep 2023, 04:15

Hello,

I just came across this thread and found it helpful to see that what I want to do can be done, but I need a bit more hand-holding, so I would appreciate some help.

I have two binary independent variables ("role" = first, second; "informed" = yes, no) and a continuous dependent variable "dep".

I have used "graph bar" to plot the average of "dep" over(informed) over(role) to get something that looks like the first two column clusters in the graph from the first post.

I need to add confidence intervals, and I understand I can do twoway (graph bar) (rcap), but I do not know how to compute the intervals for rcap over two variables (informed and role) rather than one (as in the examples).

Could you please help me? Thanks a lot in advance!
Nick Cox

Join Date: Mar 2014

Posts: 35432
#10

22 Sep 2023, 06:28

#9 isn't really a new question as all the key points are already explained in the thread. Your use of a different dataset without a reference or data example doesn't allow illustrations with your data.

graph bar is useless for your purposes as you can't add confidence intervals, as already said.

#5 is already a worked example with one outcome and two predictors, exactly equivalent to what you want. If you strongly prefer bars to scatters, which is hard to understand, you need to call up twoway bar.

This code shows bar + error bar plots. It is #5 reworked to that effect.

Code:

use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

statsby, by(race ses) : ci mean write , level(90)

twoway bar mean race, horizontal base(0) barw(0.8) yla(1/4, valuelabel tlength(0) ang(h)) ///
|| rcap ub lb race, horizontal by(ses, subtitle(Mean writing scores by SES and race with 90% confidence intervals) note("") col(1) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("")

It is a lousy plot compared with #5, as a lot of space and ink is expended on showing that the scores are not zero. Sometimes that is a point that needs to be made. dep is perfectly anonymous in #9, so speculation is futile on whether you need to do that. But I'd say that in my experience more than 90% of confidence interval graphs are better without bars starting at zero. The point is that usually the comparison of interest is of scores with other scores, not with zero.

If you need more hand-holding than this, please give a data example. https://www.statalist.org/forums/help#stata explains how to do that.
Danae Arroyos

Join Date: Feb 2023

Posts: 8
#11

27 Sep 2023, 03:55

Hello,

Thank you very much for your help.

I have made progress (and I take the point you made about including more information such as my dataset and information regarding the nature of the variables).

I changed my variable names to PREF (No, Yes) and INFO (informed dict, informed recip). My dependent variable is a binary variable stating whether the participant was generous or not. This graph is standard for these types of analyses.

The graph on the left is what I need, but with confidence intervals.
The graph on the right is where I've got so far.

The INFO labels (informed dict, informed recip) used to be on the left-hand side of the graphs, so when I move them manually, the two 'blocks' are separated.

For the left-hand figure, I used:

Code:

graph bar generous, over(pref) over(info) asyvars showyvars legend(off)

For the right-hand figure, I used the following (but made some edits by hand to the x ticks and to move the "role" labels):

Code:

twoway bar mean pref, yla(0(0.2)0.6, valuelabel tlength(0) ang(h)) ///
|| rcap ub lb pref, by(info, subtitle(TITLE) note("") col(2) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("")

I was wondering whether I could have the INFO labels below the graphs by default, so that the space disappears, or whether there's another way of joining the two panels in the right-hand figure so that it looks closer to the one on the left.
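One way to get a single panel, sketched under assumptions: instead of by(), put all four bars on one shared x axis with a gap between the groups, and use minor axis labels (xmlabel()) pushed down with labgap() for the group names. The sketch uses the nlsw88 data shipped with Stata as a stand-in for the data posted later in #15 (union for generous, married for pref, collgrad for info); with the real data you would substitute generous, pref and info and their labels. Light bars with dark intervals follow the advice in #5, and the labgap() multiplier is a guess to adjust by eye.

```stata
* Sketch: one panel, two groups of bars, group names below the axis.
* Stand-in data: union ~ generous, married ~ pref, collgrad ~ info.
sysuse nlsw88, clear
drop if missing(union)

* one row per collgrad x married cell: k = number in union, n = cell size
collapse (sum) k = union (count) n = union, by(collgrad married)

* proportion and 90% Jeffreys interval from Beta(k + 0.5, n - k + 0.5)
generate double prop = k / n
generate double lb = cond(k == 0, 0, invibeta(k + 0.5, n - k + 0.5, 0.05))
generate double ub = cond(k == n, 1, invibeta(k + 0.5, n - k + 0.5, 0.95))

* shared x axis: positions 1, 2 (not college grad) and 4, 5 (college grad);
* the unused position 3 leaves a gap between the two groups
generate x = 1 + married + 3 * collgrad

set scheme s1color
twoway bar prop x, barw(0.8) base(0) fcolor(gs14) lcolor(gs10) ///
    || rcap ub lb x, lcolor(black) legend(off) ///
    xlabel(1 "Single" 2 "Married" 4 "Single" 5 "Married", tlength(0)) ///
    xmlabel(1.5 "Not college grad" 4.5 "College grad", labgap(*6) tlength(0)) ///
    xtitle("") ytitle(Proportion in union and 90% interval)
```

Because there is only one panel, no by() subtitles are drawn, so nothing has to be moved by hand in the Graph Editor.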

Thank you
Nick Cox

Join Date: Mar 2014

Posts: 35432
#12

27 Sep 2023, 08:03

I take the point you made about including more information such as my dataset

Good, so please give your dataset, or an example.

Danae Arroyos

Join Date: Feb 2023
Posts: 8

#13

28 Sep 2023, 03:04

This is my partial dataset:

Code:

info    pref    generous
Informed Dictator    Yes    0
Informed Recipient    No    0
Informed Dictator    No    0
Informed Recipient    No    0
Informed Dictator        1
Informed Recipient        1
Informed Dictator    Yes    0
Informed Recipient    No    0
Informed Dictator    Yes    0
Informed Recipient    No    0
Informed Dictator        0
Informed Recipient        0
Informed Dictator    No    1
Informed Recipient    No    1
Informed Dictator    Yes    0
Informed Recipient    No    0
Informed Dictator        0
Informed Recipient        0
Informed Dictator    Yes    0
Informed Recipient    Yes    0
Informed Dictator    Yes    1
Informed Recipient    Yes    1
Informed Dictator        0
Informed Recipient        0
Informed Dictator    Yes    0
Informed Recipient    No    0
Informed Dictator    No    0
Informed Recipient    No    0
Informed Dictator    No    0
Informed Recipient    No    0
Informed Dictator    Yes    0
Informed Recipient    No    0

And dataex says
input byte(info pref) float generous

I hope this helps!


Nick Cox

Join Date: Mar 2014

Posts: 35432
#14

28 Sep 2023, 08:17

Sorry, but I can't yet follow what is going on. dataex wouldn't produce output like that: string values would be delimited by double quotes (" "). It seems that you have four values in some observations (rows) and three in the others, but only three variables (columns) are named.
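To illustrate the point about quotes, here is roughly how rows 1, 2 and 5 of #13 would look as string variables read with input. This is typed by hand, not actual dataex output, and the storage types str18 and str3 are assumptions: each string value sits in double quotes, and a missing string is an empty pair "", so every row keeps exactly three values.

```stata
* Hand-written illustration (not dataex output): rows 1, 2 and 5 of #13
clear
input str18 info str3 pref byte generous
"Informed Dictator"  "Yes" 0
"Informed Recipient" "No"  0
"Informed Dictator"  ""    1
end

* the empty string "" keeps the third row aligned with the others
list
```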

Danae Arroyos

Join Date: Feb 2023
Posts: 8

#15

09 Oct 2023, 06:42

Hello,

Here's another attempt. Thanks for your patience!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(info pref) float generous
1 1 0
2 0 0
1 0 1
2 0 1
1 1 0
2 1 0
1 0 0
2 0 0
1 1 0
2 0 0
1 1 1
2 0 1
1 1 0
2 1 0
1 1 0
2 0 0
1 0 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 1
2 0 1
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 1 0
1 0 0
2 1 0
1 1 0
2 0 0
1 1 0
2 1 0
1 0 0
2 0 0
1 1 0
2 0 0
1 1 0
2 1 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 0 1
2 0 1
1 1 0
2 1 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 0 0
2 0 0
1 1 1
2 1 1
1 1 0
2 0 0
1 1 0
2 1 0
1 1 0
2 0 0
1 1 0
2 0 0
1 1 0
2 0 0
1 0 0
2 0 0
1 1 0
2 0 0
1 1 1
2 0 1
1 1 0
2 0 0
1 0 1
end
label values info info_label
label def info_label 1 "Informed Dictator", modify
label def info_label 2 "Informed Recipient", modify
label values pref pref_label
label def pref_label 0 "No", modify
label def pref_label 1 "Yes", modify
