  • Bar chart with data (x-axis) separated by percentiles

    Hi all,

    My dataset consists of a binary variable (risky_choice) and a continuous variable (estimated_probability). I would like to create a bar chart that shows the share of risky choices (the mean of risky_choice) on the y-axis, depending on the level of estimated_probability, which I would like to group into tenths; i.e., I would like one graph with 10 bars, each corresponding to 10% of the data. I'd be grateful for any ideas. Thanks

    No data example here, but it sounds like something along these lines:

    gen bin = floor(estimated_probability * 10)
    forval p = 0/9 {
          label def bin `j' "0.`p'- ", modify
    }
    label val bin bin 
    graph bar (mean) risky_choice, over(bin)


      . forval p = 0/9 {
      .       label def bin `j' "0.`p'- ", modify
      . }
      invalid syntax
      Unfortunately, I get an error message. The variable estimated_probability is actually discrete and can take values from 1 to 1000 inclusive. Could that be the cause?
        No, it's not down to your using Stata 14.2. It's my error: an inconsistency in mixing j and p. The loop should use one of those, not both.
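For reference, this is the label loop with the local macro named p throughout. It is a minimal sketch that only defines the value labels; the capture label drop line is an addition, not part of the original post, so the snippet can be rerun cleanly.

```stata
* start from a clean value label (added so the snippet can be rerun)
capture label drop bin
* label values 0 to 9 as "0.0- " to "0.9- ", using the local p throughout
forvalues p = 0/9 {
    label define bin `p' "0.`p'- ", modify
}
label list bin
```

Attach the labels afterwards with label values bin bin, as in the original code.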


          Thanks, Nick, that fixed the error message. However, the code does not seem to change my graph: I still get a bar for every existing level of the x variable (estimated_probability).
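A likely reason, given the range stated above: with values from 1 to 1000, floor(estimated_probability * 10) runs from 10 to 10000, so none of the labels 0 to 9 ever applies and graph bar gets one category per distinct value. The sketch below assumes, without confirmation from the thread, that the variable is recorded per mille; it uses made-up stand-in data (one observation per value) and divides by 100 instead, which gives fixed-width bins 0 to 9 rather than equal-count bins.

```stata
* hypothetical stand-in data: one observation per value 1 to 1000
clear
set obs 1000
gen estimated_probability = _n
* assumption: values are per mille, so dividing by 100 gives bins 0 to 9;
* min(..., 9) folds the single value 1000 into the top bin
gen bin = min(floor(estimated_probability / 100), 9)
tabulate bin
```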


            Back to #2: Please give a data example.


              Let's say I would like to use the following example:

              sysuse auto
              gen bin = floor(price * 10)
              forval p = 0/9 {
                   label def bin `p' "0.`p'- ", modify
              }
              label val bin bin
              graph bar (mean) mpg, over(bin)


                That example doesn't make much sense to me, as

                1. No values of 10 times price round down to 0(1)9.

                2. You want the mean of a binary variable in bins of probability.
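Point 1 is easy to confirm. A quick check with the auto data:

```stata
sysuse auto, clear
gen bin = floor(price * 10)
* price runs from 3,291 to 15,906, so bin starts at 32,910
summarize price bin
* no observation lands in the bins 0 to 9 that the labels cover
count if inrange(bin, 0, 9)
```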

                In the absence of real data, let's lean on foreign in the auto data being binary, and simulate our own probability variable.

                This is a self-contained script you can run. Although the first graph is more like my first idea, the second graph may be more helpful.

                sysuse auto, clear
                set seed 12345
                gen prob = runiform()
                gen bin = floor(prob * 10)
                forval p = 0/9 {
                     label def bin `p' "0.`p'- ", modify
                }
                label val bin bin
                set scheme s1color
                graph bar (mean) foreign, over(bin) name(G1, replace)
                egen mean = mean(foreign), by(bin)
                egen count = count(foreign), by(bin)
                gen toshow = bin/10 + 0.05
                tabdisp bin, c(count mean)
                twoway bar mean toshow, xla(0 "0" 1 "1" 0.1(0.1)0.9, format(%02.1f)) barw(0.1) xtitle(probability) bfcolor(none) name(G2, replace)
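A possible variation on the second graph, offered as a sketch rather than as part of the reply: twoway bar draws one bar per observation, so the script above draws the same bar once for every car in a bin. Collapsing to one observation per bin first avoids that and gives a small table of counts and means at the same time. Note that collapse replaces the data in memory.

```stata
sysuse auto, clear
set seed 12345
gen prob = runiform()
gen bin = floor(prob * 10)
* one observation per bin: share of foreign cars and number of cars
collapse (mean) mean = foreign (count) count = foreign, by(bin)
* bar midpoints: bin 0 covers [0, 0.1), so its bar is centred at 0.05, etc.
gen toshow = bin/10 + 0.05
list bin count mean, noobs
twoway bar mean toshow, barwidth(0.1) xlabel(0(0.1)1, format(%02.1f)) xtitle(probability) ytitle(share foreign)
```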


                    Thanks, Nick, for your help. I see our misunderstanding. What I am looking for are "percentiles" in the sense of equal numbers of observations per bin; in this example, that means deciles. That is to say, after running the
                    tabdisp bin, c(count mean)
                    command, all bins would have the same count: in this example, 7 or 8, because the sample size of the auto dataset (74) is not a multiple of 10.


                      That's fine by me. Use xtile to put the observations into decile bins.

                      sysuse auto, clear
                      set seed 12345
                      gen prob = runiform()
                      xtile bin=prob, nq(10) 
                      set scheme s1color
                      egen mean = mean(foreign), by(bin)
                      egen min = min(prob), by(bin)
                      egen max = max(prob), by(bin)
                      egen count = count(foreign), by(bin)
                      tabdisp bin, c(count min max mean)
                      twoway bar mean bin, base(0) xla(1/10) barw(0.9) xtitle(probability decile bins) bfcolor(none) name(G2, replace)
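If you would rather keep graph bar, as in the original question, it works directly with the xtile bins. The sketch below again uses a simulated probability (not real data) and checks that every bin holds 7 or 8 cars. One caveat, since estimated_probability takes at most 1000 distinct values: xtile puts tied values into the same bin, so with many ties the bins can end up unequal in size.

```stata
sysuse auto, clear
set seed 12345
* simulated stand-in for a probability variable
gen double prob = runiform()
xtile bin = prob, nq(10)
* number of cars in each decile bin (7 or 8 when all 74 values differ)
bysort bin: gen n_in_bin = _N
tabulate bin
* the chart from the original question, drawn with graph bar over the decile bins
graph bar (mean) foreign, over(bin) ytitle(share foreign) b1title(probability decile bins)
```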


                        Thank you, Nick. That solved it.

