  • Create histogram of continuous data with two bins to catch outliers on the tails

    I would like to create a histogram of the continuous values of a particular variable. Most of the observations range from 2 to 5. I would like the main bins to have a width of 1 and to span 0 to 7. However, I would also like two bookend bins that cover 0 to min(variable) and 7 to max(variable). When I try to do this, I always get errors, either from the way I specify the bins or when I try to use the discrete() or cut() options with addplot().

    Here is a data example of the variable:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float logA
       3.595926
      1.4829882
       2.418526
       2.232657
      2.2168064
       9.851233
      1.9928473
      2.0715392
       2.878051
       3.741612
      2.5099146
       3.546768
       2.815486
       1.762309
        2.36012
          30.12
        1.93512
      3.0619204
    end

    How can I achieve this in an effective manner?

    Thank you in advance!

  • #2
    Maybe you are looking for eqprhistogram by Nick Cox? To install it, type ssc install eqprhistogram in Stata. Applying it to your example, I got:

    Code:
    eqprhistogram logA, bin(5)

    [Image: Graph.png]
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      This is an intriguing example, although with an anonymous name, no context, and quite possibly only a tiny subsample it would be easy to guess wrong.

      Something called logA is quite possibly the logarithm of A, and if there are high outliers on a log scale, they must be massively high outliers on the original scale.

      I note further only that if you use not logarithms but reciprocals the transformed data are close to a uniform distribution.
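
      A minimal sketch of how that observation could be checked, assuming logA is the natural logarithm of A (the thread does not confirm the base). The variable name recipA is hypothetical. The quantile command plots the ordered values against the quantiles of a uniform distribution, so points lying near the diagonal reference line would be consistent with an approximately uniform distribution.

      Code:
      * Assumption: logA = ln(A), so that 1/A = exp(-logA)
      generate double recipA = exp(-logA)
      * Ordered values of recipA against uniform quantiles
      quantile recipA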


      • #4
        Thank you Maarten Buis! I will take a look at this.

        Nick, this is a great point. I actually do have the reciprocal as well, but I wanted to look at both. 95% of the observations would lie within the seven bins with a width of 1, and only 5% above that. eqprhistogram is a great option, but I am also curious about how to make this upper bin. For my preliminary idea of this histogram, I just want to show how many observations fall in each bin, including the 5% above 7, and then dig further into the range of the outliers elsewhere if necessary.
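
        One possible way to get such bookend bins, offered only as a sketch: recode the variable into integer bin indices, collapse everything below 0 into one bin and everything from 7 upward into another, and then draw a discrete histogram of the indices. The names bin and binlbl are hypothetical. Bins are taken as closed on the left, so a value of exactly 7 lands in the upper bookend.

        Code:
        * Bin index: -1 for values below 0, 0 to 6 for the unit-width bins, 7 for values of 7 or more
        * The if qualifier matters because min() and max() ignore missing arguments
        generate byte bin = max(-1, min(floor(logA), 7)) if !missing(logA)
        label define binlbl -1 "<0" 0 "0-1" 1 "1-2" 2 "2-3" 3 "3-4" 4 "4-5" 5 "5-6" 6 "6-7" 7 "7+"
        label values bin binlbl
        * One bar per bin, with frequencies on the y axis and bin labels on the x axis
        histogram bin, discrete frequency xlabel(-1(1)7, valuelabel) xtitle("logA")

        The bookend bars then show counts only, since each covers a range of unknown width. The actual spread of the outliers could be inspected separately, for example with summarize logA if bin == 7, detail.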
