Limiting the range of data displayed in a graph

Oscar Weinzettl

Join Date: Nov 2018
Posts: 70

04 Jan 2020, 05:24

Good day,

I want to graph inheritances received in a histogram. Because some observations are very large, I want to limit the x-axis of my graph to the range from 0 to 1,000,000 euros. However, I am having trouble achieving this.

Hist 1 is how I would like the graph to look. To achieve this I used the code

Code:

histogram gift_total if gift_total < 1000000 & gift_total > 0, bin(20) percent addlabel ylabel(, angle(horizontal)) xtitle(Gifts) title(Histogram of Gifts Received)

(gift_total is just the sum of all inheritances and gifts received. I excluded all the zeros to look only at individuals who actually received some sort of bequest.)

However this is wrong as it excludes 43 larger observations.

Hist 2 includes all the observations, however the x-axis gets funky. For this I used the code

Code:

histogram gift_total if gift_total > 0 , xscale(range(0 1000000)) width(50000) percent addlabel ylabel(, angle(horizontal)) xtitle(Inheritance)

But the xscale doesn't seem to work very well.

What could I do to get the scale correct and include all observations correctly?

id is the identification number, and _mi_m indexes the multiple imputations of the data, with _mi_m = 0 marking the original set.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id _mi_m) float gift_total
 27 0      0
 27 1      0
 27 2      0
 27 3      0
 27 4      0
 27 5      0
 36 0 570000
 36 0      0
 36 1 570000
 36 1      0
 36 2 570000
 36 2      0
 36 3 570000
 36 3      0
 36 4 570000
 36 4      0
 36 5 570000
 36 5      0
 67 0  40000
 67 0  20000
 67 1  40000
 67 1  20000
 67 2  40000
 67 2  20000
 67 3  40000
 67 3  20000
 67 4  40000
 67 4  20000
 67 5  40000
 67 5  20000
 86 0 100000
 86 0      0
 86 1 100000
 86 1      0
 86 2 100000
 86 2      0
 86 3 100000
 86 3      0
 86 4 100000
 86 4      0
 86 5 100000
 86 5      0
 92 0      0
 92 0      0
 92 1      0
 92 1      0
 92 2      0
 92 2      0
 92 3      0
 92 3      0
 92 4      0
 92 4      0
 92 5      0
 92 5      0
128 0      0
128 0      0
128 1      0
128 1      0
128 2      0
128 2      0
128 3      0
128 3      0
128 4      0
128 4      0
128 5      0
128 5      0
130 0 160000
130 1 160000
130 2 160000
130 3 160000
130 4 160000
130 5 160000
178 0  30000
178 1  30000
178 2  30000
178 3  30000
178 4  30000
178 5  30000
303 0      0
303 1      0
303 2      0
303 3      0
303 4      0
303 5      0
455 0      0
455 1      0
455 2      0
455 3      0
455 4      0
455 5      0
484 0  75000
484 0      0
484 1  75000
484 1      0
484 2  75000
484 2      0
484 3  75000
484 3      0
484 4  75000
484 4      0
end

Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#2

04 Jan 2020, 08:12

Also, with the same data I would like to create a boxplot that follows the same criteria (just to show the large number of outliers). However, limiting the range of a boxplot also rather ruins the graph. What can I do there?
Nick Cox

Join Date: Mar 2014

Posts: 35444
#3

05 Jan 2020, 04:27

I don't understand this at all.

Note that xscale(, range()) or yscale(, range()) will do nothing to omit data. This is clearly documented in the help for axis scale options:

range() never narrows the scale of an axis or causes data to be omitted from the plot. If you wanted to graph yvar versus xvar for the subset of xvar values between 10 and 50, typing

. scatter yvar xvar, xsc(r(10 50))

would not suffice. You need to type

. scatter yvar xvar if xvar>=10 & xvar<=50

Conversely, when you do use an if qualifier, the complaint is just that

However this is wrong as it excludes 43 larger observations.

Stata did what you asked, it seems, so what in that is wrong?
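
To make the difference concrete, here is a minimal sketch using the auto dataset shipped with Stata (not the poster's data). It only illustrates the behaviour described in the help text quoted above: range() can widen an axis but never drops data, whereas an if qualifier does.

```stata
* Illustration with the auto data shipped with Stata (not the poster's data)
sysuse auto, clear

* range() wider than the data: the x-axis is extended, every car is still drawn
scatter mpg weight, xscale(range(0 6000))

* range() narrower than the data: nothing is omitted and the axis does not shrink
scatter mpg weight, xscale(range(3000 4000))

* An if qualifier is what actually restricts the observations plotted
count if weight >= 3000 & weight <= 4000
scatter mpg weight if weight >= 3000 & weight <= 4000
```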

Last edited by Nick Cox; 05 Jan 2020, 04:33.
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#4

06 Jan 2020, 08:20

Sorry, I guess I worded this poorly.

I want to know how I can get a graph that looks like the first histogram, which is clearly presentable. However, it is not right, as it excludes 43 observations. The second graph shows the true distribution, but it is not in any way presentable. So I am wondering how I can get a graph that uses all observations when calculating the distribution, but only shows the axis between 0 and 1,000,000. That is, observations larger than 1,000,000 would be excluded graphically but not from the calculation of the distribution.

Last edited by Oscar Weinzettl; 06 Jan 2020, 08:22.
Nick Cox

Join Date: Mar 2014

Posts: 35444
#5

06 Jan 2020, 09:02

First off, I think the ideal is misguided. Logarithmic scale is natural for data like yours and insisting on a linear scale will produce a fairly horrible graph however you do it. Logarithmic scale will also pull in those outliers, so you will solve two problems at once. It is possible, however, that you are aiming at a naive readership that doesn't understand logarithmic scale.
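
A rough sketch of the logarithmic-scale suggestion, for a do-file. One common way to get a histogram on a log scale is to bin the base-10 logarithm and label the axis in the original units. The simulated gift_total, the seed, the bin width of 0.25 and the choice of labels are all assumptions made only so that the example runs on its own; with real data, skip the simulation step and keep the last two commands.

```stata
* Simulated stand-in for gift_total (hypothetical values, not the poster's data)
clear
set seed 2020
set obs 2000
generate double gift_total = exp(rnormal(11, 1.2))

* Bin log10(gift_total); zeros and negative values are left out by the if qualifier
generate double log10_gift = log10(gift_total) if gift_total > 0

* Label the axis in the original units so readers need not decode logarithms
histogram log10_gift, width(0.25) percent ///
    xlabel(3 "1,000" 4 "10,000" 5 "100,000" 6 "1,000,000" 7 "10,000,000") ///
    ylabel(, angle(horizontal)) xtitle("Gifts (euros, logarithmic scale)")
```

All positive values stay in the graph and in the percentages; the large gifts simply sit in the right-hand bins instead of stretching the axis.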

Given your desire, there are at least two ways to proceed. One is to clone the variable but with values >= 1 million recoded to, say, 1025000, so that they fall into a single higher bin, which naturally should be explained clearly. The other is to use twoway__histogram_gen as described in detail in https://www.stata-journal.com/articl...article=gr0014
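
The two approaches could be sketched roughly as follows, for a do-file. This is an illustration under stated assumptions, not a definitive recipe: the simulated gift_total is a stand-in for the real variable, and the names gift_capped, bar_pct and bar_mid, the bin width of 50000 and the axis labels are arbitrary choices. The gen(h x) syntax of twoway__histogram_gen is as I understand it from the Stata Journal tip linked above; check help twoway__histogram_gen before relying on it.

```stata
* Simulated stand-in for gift_total (hypothetical values, not the poster's data)
clear
set seed 2020
set obs 2000
generate double gift_total = exp(rnormal(11, 1.2))

* Approach 1: recode values of 1 million or more so they share one extra bin
clonevar gift_capped = gift_total
replace gift_capped = 1025000 if gift_total >= 1000000 & !missing(gift_total)
histogram gift_capped if gift_capped > 0, start(0) width(50000) percent ///
    ylabel(, angle(horizontal)) xtitle("Gifts (euros)") ///
    xlabel(0 "0" 250000 "250,000" 500000 "500,000" 750000 "750,000" 1025000 "1,000,000+") ///
    note("Rightmost bar pools all values of 1,000,000 or more")

* Approach 2: compute percentages from all positive values,
* then draw only the bars whose bin midpoint lies below 1 million
twoway__histogram_gen gift_total if gift_total > 0 & !missing(gift_total), ///
    start(0) width(50000) percent gen(bar_pct bar_mid)
twoway bar bar_pct bar_mid if bar_mid < 1000000, barwidth(50000) ///
    ylabel(, angle(horizontal)) ytitle("Percent") xtitle("Gifts (euros)") ///
    note("Percentages are based on all positive values; bars at 1,000,000 or more are not drawn")
```

In both graphs the percentages are based on every positive observation. The first shows the large values as one pooled bar; the second leaves them out of the picture, so the bars drawn sum to less than 100 percent, which the note should make explicit.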