
  • Conditional coloring on a histogram

    Hi,

    I would like to draw a histogram with conditional coloring, for example with bars colored red where the values are higher than a constant C. I found an old answer to the same question with a solution using frequencies:

    https://www.stata.com/statalist/arch.../msg00922.html

    Any pointers on how to do it using fraction instead of freq?

    Thanks.

  • #2
    Fractions calculated with respect to which total(s) is a fair question, I think.

    Please give a simple data example to show what you want.



    • #3
      Thanks for your reply.

      I mean fraction in the usual sense of the fraction option of histogram, that is, each bar's height as a fraction of the total number of values of the variable used for the histogram.

      Here is a sample of the dataset:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float var
        .11685824
       -.13346104
       -.06194125
        .08109834
        .08109834
        .15261814
       .009578544
       -.09770115
       -.06194125
       -.13346104
       .009578544
        .04533844
        .08109834
       -.13346104
       -.06194125
        .15261814
       -.20498085
       .009578544
        .08109834
       -.16922094
      -.026181353
        .08109834
        .08109834
       -.13346104
       -.09770115
       -.16922094
       -.09770115
       -.06194125
      -.026181353
      -.026181353
       -.24074075
        .04533844
      -.026181353
       -.13346104
        .04533844
       -.09770115
       .009578544
      -.026181353
       -.09770115
       .009578544
       .009578544
        .04533844
       .009578544
       -.06194125
       .009578544
      -.026181353
       -.06194125
        .18837804
       -.06194125
        .08109834
      -.026181353
       -.06194125
        .08109834
      -.026181353
       .009578544
       -.09770115
      -.026181353
       -.09770115
        .08109834
       -.06194125
      -.026181353
       -.09770115
       .009578544
        .08109834
        .18837804
       .009578544
        .04533844
        .04533844
       .009578544
       .009578544
       .009578544
       .009578544
       .009578544
       .009578544
        .15261814
       -.24074075
       .009578544
        .15261814
        .11685824
        .18837804
       .009578544
        .11685824
       -.24074075
        .08109834
       -.06194125
        .08109834
        .04533844
      -.026181353
       -.06194125
      -.026181353
       .009578544
        .04533844
      -.026181353
       .009578544
        .11685824
      -.026181353
      -.026181353
      -.026181353
       -.20498085
       -.09770115
      end
      I would like to take this histogram:

      Code:
      graph twoway (hist var, fraction discrete fcolor(none) lcolor(black))
      [Attached image: histExample.png]

      But add conditional coloring, for example, if the value of var is higher than 0.05, color the bars red. I can do that if I use the option freq instead of fraction:

      Code:
      graph twoway (hist var, freq discrete fcolor(none) lcolor(black)) (hist var if var > 0.05, freq discrete fcolor(red) lcolor(black))
      [Attached image: histExample2.png]

      But in my application this histogram is harder to compare with others when it uses frequencies instead of fractions. When I use fraction in the second histogram, the fractions are computed with respect to the number of values of var for which the if condition is true, not the total number of values of var.
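
      The behavior described above can be reproduced with a minimal sketch that simply swaps freq for fraction in the overlay command shown earlier. It assumes the example data are in memory; no new options are introduced. Each histogram then computes its fractions from its own subset, so the red bars are drawn relative to the observations with var > 0.05 only and no longer line up with the uncolored bars.

      ```stata
      * Same overlay as above, but with fraction instead of freq.
      * The second histogram's fractions use only observations with var > 0.05
      * as the denominator, so the red bars come out too tall.
      twoway (histogram var, fraction discrete fcolor(none) lcolor(black)) ///
             (histogram var if var > 0.05, fraction discrete fcolor(red) lcolor(black))
      ```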


      Would appreciate any pointer on how to do it! Thank you.



      • #4
        Thanks for that detail.

        You know how to superimpose two (disjoint) histograms based on frequencies. But if you change the syntax to fractions, the fractions will be determined separately for each subset, which is not what you want.

        The trick is therefore just to change the y axis title and axis labels so that they show what you want, because apart from those the graph you want is exactly the same.

        Your data example has 100 observations with non-missing values. With reasonable bins for that sample size, a frequency histogram shows 0 10 20 30 as labels on the vertical axis, so we want to see 0 0.1 0.2 0.3 instead: each label is the frequency divided by 100.

        For messier cases there is a helper command on SSC called mylabels. In this example I don't use the local macro directly but copy and paste the result and edit to get the leading zeros I like.


        Code:
        set scheme s1color
        
        * get the frequency histogram
        twoway histogram var if var <= 0.05, freq color(none) start(-0.25) width(0.05) ///
        || histogram var if var > 0.05, freq start(-0.25) width(0.05) blcolor(red) bfcolor(red*0.5) ///
        legend(order(2 ">0.05") ring(0) pos(1))
        
        count if !missing(var)
        
        * r(N) is available for use immediately
         
        * ssc install mylabels
        mylabels 0(0.1)0.3, myscale(`r(N)'*@) local(yla)
        0 "0" 10 ".1" 20 ".2" 30 ".3"
        
        twoway histogram var if var <= 0.05, freq color(none) start(-0.25) width(0.05) ///
        || histogram var if var > 0.05, freq start(-0.25) width(0.05) blcolor(red) bfcolor(red*0.5) ///
        legend(order(2 ">0.05") ring(0) pos(1)) yla(0 "0" 10 "0.1" 20 "0.2" 30 "0.3", ang(h)) ytitle(Fraction)
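
        As a variation on the code above, here is a sketch that passes the local macro created by mylabels straight to the yla() option instead of copying and pasting its output. It assumes mylabels is installed from SSC and that all lines are run together in one do-file, so that the local macro yla still exists when twoway is called. The labels then appear as .1 .2 .3, without the leading zeros.

        ```stata
        * ssc install mylabels
        count if !missing(var)

        * mylabels puts label positions (fraction * N) and label text in local yla
        mylabels 0(0.1)0.3, myscale(`r(N)'*@) local(yla)

        * same graph as above, with the local macro used directly in yla()
        twoway histogram var if var <= 0.05, freq color(none) start(-0.25) width(0.05) ///
        || histogram var if var > 0.05, freq start(-0.25) width(0.05) blcolor(red) bfcolor(red*0.5) ///
        legend(order(2 ">0.05") ring(0) pos(1)) yla(`yla', ang(h)) ytitle(Fraction)
        ```

        Because the positions are computed from r(N), the same lines should carry over to a different sample size without editing the labels by hand.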
        [Attached image: histogram2.png]



        For your purposes I would have extra x axis labels, especially at 0.05
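
        One possible way to do that, as a sketch only: the label positions in xla() and the dashed reference line from xline() are illustrative choices, not part of the code above. The positions -0.25(0.1)0.15 fall on bin limits and include the cutoff 0.05.

        ```stata
        * same graph as above, plus x axis labels that include 0.05
        * and a dashed vertical line marking the cutoff
        twoway histogram var if var <= 0.05, freq color(none) start(-0.25) width(0.05) ///
        || histogram var if var > 0.05, freq start(-0.25) width(0.05) blcolor(red) bfcolor(red*0.5) ///
        legend(order(2 ">0.05") ring(0) pos(1)) yla(0 "0" 10 "0.1" 20 "0.2" 30 "0.3", ang(h)) ytitle(Fraction) ///
        xla(-0.25(0.1)0.15, format(%5.2f)) xline(0.05, lpattern(dash))
        ```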

        Note: You don't have to use scheme s2color, the default.
        Last edited by Nick Cox; 01 Mar 2019, 00:51.



        • #5
          Dear Nick, I ran your code and got the following error message:
          Code:
          . set scheme s1color
          
          . 
          . * get the frequency histogram
          . twoway histogram var if var <= 0.05, freq color(none) start(-0.25) width(0.05) ///
          > || histogram var if var > 0.05, freq start(-0.25) width(0.05) blcolor(red) bfcolor(red*0.5) ///
          > legend(order(2 ">0.05") ring(0) pos(1))
          
          . 
          . count if !missing(var)
            100
          
          . 
          . * r(N) is available for use immediately
          .  
          . * ssc install mylabels
          . mylabels 0(0.1)0.3, myscale(`r(N)'*@) local(yla) 
          0 "0" 10 ".1" 20 ".2" 30 ".3"
          
          . 0 "0" 10 ".1" 20 ".2" 30 ".3"
          0 is not a valid command name
          r(199);
          
          end of do-file
          
          r(199);
          Any suggestions?
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)



          • #6
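            That's output sprinkled among the code. The line 0 "0" 10 ".1" 20 ".2" 30 ".3" is what mylabels displays, not a command, so leave it out when you run the code.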



            • #7
              That's a clever solution, thanks a lot Nick!



              • #8
                Dear Nick, Thanks.
                Ho-Chuan (River) Huang
                Stata 19.0, MP(4)
