  • Bar chart with data (x-axis) separated by percentiles

    Hi all,

    My dataset consists of a binary variable (risky_choice) and a continuous variable (estimated_probability). I would like to create a bar chart that shows the share of risky choices (the mean of risky_choice) on the y-axis against the level of estimated_probability, which I would like to group into deciles. That is, I would like 10 bars in one graph, each corresponding to 10% of the data. I'd be grateful for any ideas. Thanks

  • #2
    No data example here but sounds something like

    Code:
    gen bin = floor(estimated_probability * 10) 
    
    forval p = 0/9 { 
          label def bin `j' "0.`p'- ", modify 
    } 
    label val bin bin 
    
    
    graph bar (mean) risky_choice, over(bin)


    • #3
      Code:
      . forval p = 0/9 {
        2.
      .       label def bin `j' "0.`p'- ", modify
        3.
      . }
      invalid syntax
      r(198);
      Unfortunately, I get an error message. estimated_probability is actually a discrete variable that can take values from 1 to 1000 inclusive. Could that be the cause?
      Last edited by Frieder Neunhoeffer; 08 Mar 2022, 11:42.


      • #4
        No; it’s not down to your using Stata 14.2. It’s my error: the loop mixes the macro names j and p. It should use one of those throughout, not both.
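
        For readers following along, here is a minimal sketch of the loop with one macro name used throughout. In #2 the local macro j is never defined, so `j' expands to nothing and label define is left without a numeric value, which is presumably what triggers r(198). The 10-observation dataset here is only a stand-in made up for illustration.

        ```stata
        clear
        set obs 10
        gen bin = _n - 1                 // stand-in bin codes 0, 1, ..., 9

        * the loop index p is used both as the value and inside the label text
        forval p = 0/9 {
            label def bin `p' "0.`p'-", modify
        }
        label val bin bin

        label list bin                   // shows 0 "0.0-" through 9 "0.9-"
        ```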


        • #5
          Thanks, Nick, that solved the error message. However, the code does not seem to affect my graph. I still get bars for all existing levels of the x-value (estimated_probability).
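
          A likely explanation, given the 1 to 1000 range mentioned in #3: floor(estimated_probability * 10) then yields 10, 20, ..., 10000, so every level keeps its own bin and the labels for 0 to 9 are never used. The sketch below rescales before flooring. The data are simulated stand-ins, and integer values are an assumption, not something confirmed in the thread.

          ```stata
          clear
          set seed 12345
          set obs 500

          * simulated stand-ins for the variables described in #1 and #3 (assumed integers)
          gen estimated_probability = ceil(1000 * runiform())     // 1, 2, ..., 1000
          gen risky_choice = runiform() < estimated_probability / 1000

          * rescale first, so that 1-100 -> 0, 101-200 -> 1, ..., 901-1000 -> 9
          gen bin = floor((estimated_probability - 1) / 100)

          graph bar (mean) risky_choice, over(bin)
          ```

          These bins have equal width on the 1 to 1000 scale, not equal counts.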


          • #6
            Back to #2: Please give a data example.


            • #7
              Let's say I would like to use the following example:

              Code:
              sysuse auto
              
              gen bin = floor(price * 10)
              
              forval p = 0/9 {
                   label def bin `p' "0.`p'- ", modify
                   }
              
              label val bin bin
              
              graph bar (mean) mpg, over(bin)


              • #8
                That example doesn't make much sense to me, as

                1. No values of 10 times price round down to 0(1)9.

                2. You want the mean of a binary variable in bins of probability.
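
                Point 1 can be checked directly on the auto data. This is only a quick sketch: price is in the thousands, so 10 times price never falls in 0 to 9.

                ```stata
                sysuse auto, clear
                gen bin = floor(price * 10)

                * compare the range of price with the range of the would-be bins
                summarize price bin

                * no observation lands in 0, 1, ..., 9, so the value labels never apply
                count if inrange(bin, 0, 9)
                ```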

                In the absence of real data, let's lean on foreign in the auto data being binary, and simulate our own probability variable.

                This is a self-contained script you can run. Although the first graph is more like my first idea, the second graph may be more helpful.


                Code:
                sysuse auto, clear
                set seed 12345
                
                gen prob = runiform()
                
                gen bin = floor(prob * 10)
                
                forval p = 0/9 {
                     label def bin `p' "0.`p'- ", modify
                     }
                
                label val bin bin
                
                set scheme s1color
                
                graph bar (mean) foreign, over(bin) name(G1, replace)
                
                egen mean = mean(foreign), by(bin)
                
                egen count = count(foreign), by(bin)
                
                gen toshow = bin/10 + 0.05
                
                tabdisp bin, c(count mean)
                
                twoway bar mean toshow, xla(0 "0" 1 "1" 0.1(0.1)0.9, format(%02.1f)) barw(0.1) xtitle(probability) bfcolor(none) name(G2, replace)


                • #9
                  [Sorry; posted twice]


                  • #10
                    Thanks, Nick, for your help. I see our misunderstanding. What I am looking for are "percentiles" in the sense of equal bin size. In this example, that would mean deciles. That is to say, when executing the
                    Code:
                    tabdisp bin, c(count mean)
                    command, all bins would have the same count: in this example 7 or 8, because the sample size of the auto data set (74) is not a multiple of 10.


                    • #11
                      That's fine by me. Use xtile to create decile bins.


                      Code:
                      sysuse auto, clear
                      set seed 12345
                      
                      * simulated probability variable, as in #8
                      gen prob = runiform()
                      
                      * decile bins coded 1, ..., 10
                      xtile bin=prob, nq(10) 
                      
                      set scheme s1color
                      
                      * per-bin summaries: mean of the binary variable, range of prob, count
                      egen mean = mean(foreign), by(bin)
                      egen min = min(prob), by(bin)
                      egen max = max(prob), by(bin)
                      egen count = count(foreign), by(bin)
                      
                      tabdisp bin, c(count min max mean)
                      
                      twoway bar mean bin, base(0) xla(1/10) barw(0.9) xtitle(probability decile bins) bfcolor(none) name(G2, replace)
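
                      The same idea might carry over to the variables named in #1 roughly as follows. This is a sketch on simulated stand-in data (the integer 1 to 1000 scale is an assumption based on #3), not the original poster's dataset. With a discrete variable, tied values can make the decile counts unequal, so it is worth tabulating the bins.

                      ```stata
                      clear
                      set seed 12345
                      set obs 500

                      * simulated stand-ins for the variables in #1 (assumed integer scale 1-1000)
                      gen estimated_probability = ceil(1000 * runiform())
                      gen risky_choice = runiform() < estimated_probability / 1000

                      * decile bins coded 1, ..., 10
                      xtile bin = estimated_probability, nq(10)

                      * check the bin counts; ties can make them differ
                      tabulate bin

                      graph bar (mean) risky_choice, over(bin) ///
                          ytitle("share of risky choice")      ///
                          b1title("decile bin of estimated_probability")
                      ```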


                      • #12
                        Thank you, Nick. That solved it.
