Bar chart with data (x-axis) separated by percentiles

Frieder Neunhoeffer

Join Date: Mar 2022

Posts: 11
#1

08 Mar 2022, 09:59

Hi all,

My dataset consists of a binary variable (risky_choice) and a continuous variable (estimated_probability). I would like to create a bar chart that shows the share of risky choices (the mean of risky_choice) on the y-axis, depending on the level of estimated_probability, which I would like to organize in deciles. That is, I would like 10 bars in one graph, each corresponding to 10% of the data. I'd be grateful for any ideas. Thanks.

Nick Cox

Join Date: Mar 2014
Posts: 35213

08 Mar 2022, 10:31

No data example here but sounds something like

Code:

gen bin = floor(estimated_probability * 10) 

forval p = 0/9 { 
      label def bin `j' "0.`p'- ", modify 
} 
label val bin bin 


graph bar (mean) risky_choice, over(bin)

Frieder Neunhoeffer

Join Date: Mar 2022

Posts: 11
#3

08 Mar 2022, 11:34

Code:

. forval p = 0/9 {
  2.       label def bin `j' "0.`p'- ", modify
  3. }
invalid syntax
r(198);

Unfortunately, I get an error message. estimated_probability is actually a discrete variable that takes integer values from 1 to 1000 inclusive. Could that be the cause?

Last edited by Frieder Neunhoeffer; 08 Mar 2022, 11:42.
Nick Cox

Join Date: Mar 2014

Posts: 35213
#4

08 Mar 2022, 11:41

No; it’s not down to your using Stata 14.2. It’s my error: I mixed j and p. The loop runs over p, but the label define line refers to `j', which is never defined. Use one name or the other throughout, not both.
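
For reference, here is a minimal sketch of the loop from #2 with p used throughout. The ten-observation dataset is only a hypothetical stand-in so that the snippet runs on its own; with real data, bin would be created as in #2.

```stata
* Hypothetical stand-in data: one observation per bin value 0..9
clear
set obs 10
gen bin = _n - 1

* Loop macro and label value now use the same name, p
forval p = 0/9 {
    label def bin `p' "0.`p'- ", modify
}

* Attach the value label named bin to the variable bin
label val bin bin
label list bin
```

With `j' in place of `p', the undefined macro expands to nothing, so label define receives label text without a numeric value, which is consistent with the invalid syntax error reported in #3.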
Frieder Neunhoeffer

Join Date: Mar 2022

Posts: 11
#5

08 Mar 2022, 11:52

Thanks, Nick, that solved the error message. However, the code does not seem to affect my graph. I still get bars for all existing levels of the x-value (estimated_probability).
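
A possible explanation, sketched under the assumption stated in #3 that estimated_probability holds integers from 1 to 1000: multiplying by 10 maps those values to 10, 20, ..., 10000, so every distinct level still gets its own bin and none falls in 0 to 9. The data below are hypothetical, and the rescaling shown is just one way to get ten equal-width bins, not something proposed in the thread.

```stata
* Hypothetical data: every integer from 1 to 1000 once
clear
set obs 1000
gen estimated_probability = _n

* Formula from #2 applied to this scale: one bin per distinct value
gen naive = floor(estimated_probability * 10)

* Assumed rescaling: 1..100 -> 0, 101..200 -> 1, ..., 901..1000 -> 9
gen bin = floor((estimated_probability - 1) / 100)

tabulate bin
```

These are bins of equal width, which need not contain equal shares of the data.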
Nick Cox

Join Date: Mar 2014

Posts: 35213
#6

08 Mar 2022, 12:15

Back to #2: Please give a data example.
Frieder Neunhoeffer

Join Date: Mar 2022
Posts: 11

09 Mar 2022, 02:44

Let's say I would like to use the following example:

Code:

sysuse auto

gen bin = floor(price * 10)

forval p = 0/9 {
     label def bin `p' "0.`p'- ", modify
     }

label val bin bin

graph bar (mean) mpg, over(bin)

Nick Cox

Join Date: Mar 2014
Posts: 35213

09 Mar 2022, 04:14

That example doesn't make much sense to me, as

1. No values of 10 times price round down to 0(1)9.

2. You want the mean of a binary variable in bins of probability.

In the absence of real data, let's lean on foreign in the auto data being binary and simulate our own probability variable.

This is a self-contained script you can run. Although the first graph is more like my first idea, the second graph may be more helpful.

Code:

sysuse auto, clear
set seed 12345

gen prob = runiform()

gen bin = floor(prob * 10)

forval p = 0/9 {
     label def bin `p' "0.`p'- ", modify
     }

label val bin bin

set scheme s1color

graph bar (mean) foreign, over(bin) name(G1, replace)

egen mean = mean(foreign), by(bin)

egen count = count(foreign), by(bin)

gen toshow = bin/10 + 0.05

tabdisp bin, c(count mean)

twoway bar mean toshow, xla(0 "0" 1 "1" 0.1(0.1)0.9, format(%02.1f)) barw(0.1) xtitle(probability) bfcolor(none) name(G2, replace)

Nick Cox

Join Date: Mar 2014

Posts: 35213
#9

09 Mar 2022, 09:07

[Sorry; posted twice]
Frieder Neunhoeffer

Join Date: Mar 2022

Posts: 11
#10

09 Mar 2022, 12:21

Thanks, Nick, for your help. I see our misunderstanding. What I am looking for are "percentiles" in the sense of bins of equal size. In this example, that would mean deciles. That is to say, when executing the

Code:

tabdisp bin, c(count mean)

command, all bins would have the same count: in this example 7 or 8, because the sample size of the auto dataset (74) is not a multiple of 10.
Nick Cox

Join Date: Mar 2014
Posts: 35213

#11

09 Mar 2022, 12:38

That's fine by me. Use xtile to assign observations to decile bins.

Code:

sysuse auto, clear
set seed 12345

gen prob = runiform()

xtile bin=prob, nq(10) 

set scheme s1color

egen mean = mean(foreign), by(bin)
egen min = min(prob), by(bin)
egen max = max(prob), by(bin)
egen count = count(foreign), by(bin)

tabdisp bin, c(count min max mean)

twoway bar mean bin, base(0) xla(1/10) barw(0.9) xtitle(probability decile bins) bfcolor(none) name(G2, replace)
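
Carried over to the variable names from #1, the same approach might look like the sketch below. The simulated data are purely hypothetical stand-ins (integers from 1 to 1000, as described in #3); with a real dataset, only the last three commands would be needed.

```stata
* Hypothetical stand-in data; replace with your own dataset
clear
set seed 12345
set obs 500
gen estimated_probability = runiformint(1, 1000)
gen risky_choice = runiform() < estimated_probability / 1000

* Decile bins of estimated_probability, numbered 1 to 10
xtile bin = estimated_probability, nq(10)

* Check how many observations fall in each bin
tabulate bin

* Share of risky choices (mean of the 0/1 variable) within each decile bin
graph bar (mean) risky_choice, over(bin) ytitle("share of risky choices") b1title("decile bin of estimated_probability")
```

Because xtile puts tied values in the same bin, a discrete variable with many repeated values may give bins whose counts are not exactly equal.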

Frieder Neunhoeffer

Join Date: Mar 2022

Posts: 11
#12

10 Mar 2022, 02:59

Thank you, Nick. That solved it.