Create histogram of continuous data with two bins to catch outliers on the tails

josh scott

Join Date: Apr 2016

Posts: 30
#1

20 Apr 2023, 07:29

I would like to create a histogram of the continuous values of a particular variable. Most of the observations range from 2 to 5. I would like the main bins to have a width of 1 and to cover 0 to 7. However, I would also like two bookend bins that cover 0 to min(variable) and 7 to max(variable). When I try to do this, I always get errors, either from the way I specify the bins or when I try to use the discrete() or cut() options with addplot().

Here is a data example of the variable:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float logA
3.595926
1.4829882
2.418526
2.232657
2.2168064
9.851233
1.9928473
2.0715392
2.878051
3.741612
2.5099146
3.546768
2.815486
1.762309
2.36012
30.12
1.93512
3.0619204
end

How can I achieve this in an effective manner?

Thank you in advance!
Tags: bins, continuous, histogram
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

20 Apr 2023, 08:19

Maybe you are looking for eqprhistogram by Nick Cox? To install it, type ssc install eqprhistogram in Stata. Applying it to your example, I got:

Code:

eqprhistogram logA, bin(5)

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

20 Apr 2023, 09:25

This is an intriguing example, although with an anonymous name, no context, and quite possibly only a tiny subsample it would be easy to guess wrong.

Something called logA is quite possibly the logarithm of A, and if there are high outliers on a log scale they must be massively high outliers on the original scale.

I note further only that if you use not logarithms but reciprocals the transformed data are close to a uniform distribution.
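
One way to look at the reciprocal claim on the posted example is sketched below. It assumes the dataex data above are in memory; the variable name recip is hypothetical and not from the thread. The official quantile command plots ordered values against the quantiles of a uniform distribution, so points lying near the reference line would be consistent with an approximately uniform distribution.

```stata
* Sketch only: recip is a hypothetical variable name, not from the thread.
* All example values of logA are positive, so the reciprocal is defined.
generate recip = 1/logA

* Summary statistics of the transformed values.
summarize recip, detail

* Ordered values of recip against quantiles of a uniform distribution.
quantile recip
```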
josh scott

Join Date: Apr 2016

Posts: 30
#4

20 Apr 2023, 10:25

Thank you Maarten Buis! I will take a look at this.

Nick, this is a great point. I actually do have the reciprocal as well, but I wanted to look at both. 95% of the observations would lie within the seven bins of width 1, and only 5% above that. eqprhistogram is a great option, but I am also curious about how to make this upper bin. For my preliminary version of this histogram, I just want to show how many observations fall in each bin, including the 5% above 7, and then dig further into the range of the outliers elsewhere if necessary.
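
A minimal sketch of one possible way to get such an upper bin, assuming it is acceptable to draw the catch-all bar with the same width as the others: copy the variable, move every value above 7 to 7.5, and let histogram use unit-width bins starting at 0. The name logA_capped is hypothetical, and the approach is an assumption rather than anything proposed in the thread. Because start(0) already begins the first bin at 0, values between 0 and the minimum need no separate treatment in this example.

```stata
* Sketch only: fold every value above 7 into a single catch-all bin [7, 8).
* logA_capped is a hypothetical helper name; run this from a do-file.
clonevar logA_capped = logA

* Missing values count as larger than any number in Stata, so exclude them.
replace logA_capped = 7.5 if logA > 7 & !missing(logA)

* Bins of width 1 starting at 0; the last bar counts all observations above 7.
histogram logA_capped, start(0) width(1) frequency ///
    xlabel(0(1)8) xtitle("logA (last bar: all values above 7)")
```

The last bar then shows a count, not a range, so the actual spread of the outliers (up to 30.12 in the example) would still need to be reported separately, for instance with summarize logA if logA > 7 & !missing(logA).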