  • Limiting the range of data displayed in a graph

    Good day,

    I want to graph inheritances received in a histogram. Because some observations are very large, I want to limit the x-axis of my graph to between 0 and 1,000,000 Euros. However, I am having trouble achieving this.

    Hist 1 is how I would like the graph to look. To achieve this I used the code




    Code:
    histogram gift_total if gift_total < 1000000 & gift_total > 0, bin(20) percent addlabel ylabel(, angle(horizontal)) xtitle(Gifts) title(Histogram of Gifts Received)
    (Gift_total is just the sum of all inheritances and gifts received. I excluded all the 0's to look at individuals who actually received some sort of bequest.)

    However this is wrong as it excludes 43 larger observations.

    Hist 2 includes all the observations, however the x-axis gets funky. For this I used the code

    Code:
    histogram gift_total if gift_total > 0 , xscale(range(0 1000000)) width(50000) percent addlabel ylabel(, angle(horizontal)) xtitle(Inheritance)
    But the xscale doesn't seem to work very well.

    What could I do to get the scale correct and include all observations correctly?

    id is the identification number, and _mi_m indexes the multiple imputations of the original data set (the original data have _mi_m = 0).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id _mi_m) float gift_total
     27 0      0
     27 1      0
     27 2      0
     27 3      0
     27 4      0
     27 5      0
     36 0 570000
     36 0      0
     36 1 570000
     36 1      0
     36 2 570000
     36 2      0
     36 3 570000
     36 3      0
     36 4 570000
     36 4      0
     36 5 570000
     36 5      0
     67 0  40000
     67 0  20000
     67 1  40000
     67 1  20000
     67 2  40000
     67 2  20000
     67 3  40000
     67 3  20000
     67 4  40000
     67 4  20000
     67 5  40000
     67 5  20000
     86 0 100000
     86 0      0
     86 1 100000
     86 1      0
     86 2 100000
     86 2      0
     86 3 100000
     86 3      0
     86 4 100000
     86 4      0
     86 5 100000
     86 5      0
     92 0      0
     92 0      0
     92 1      0
     92 1      0
     92 2      0
     92 2      0
     92 3      0
     92 3      0
     92 4      0
     92 4      0
     92 5      0
     92 5      0
    128 0      0
    128 0      0
    128 1      0
    128 1      0
    128 2      0
    128 2      0
    128 3      0
    128 3      0
    128 4      0
    128 4      0
    128 5      0
    128 5      0
    130 0 160000
    130 1 160000
    130 2 160000
    130 3 160000
    130 4 160000
    130 5 160000
    178 0  30000
    178 1  30000
    178 2  30000
    178 3  30000
    178 4  30000
    178 5  30000
    303 0      0
    303 1      0
    303 2      0
    303 3      0
    303 4      0
    303 5      0
    455 0      0
    455 1      0
    455 2      0
    455 3      0
    455 4      0
    455 5      0
    484 0  75000
    484 0      0
    484 1  75000
    484 1      0
    484 2  75000
    484 2      0
    484 3  75000
    484 3      0
    484 4  75000
    484 4      0
    end

  • #2
    Also, with the same data I would like to create a boxplot that follows the same criteria (just to show the large number of outliers). However, limiting the range of a boxplot also spoils the graph somewhat. What can I do there?



    • #3
      I don't understand this at all.

      Note that xscale(, range()) or yscale(, range()) will do nothing to omit data. This is clearly documented in the help for axis scale options:


      range() never narrows the scale of an axis or causes data to be omitted from the plot. If you wanted to graph yvar versus xvar for the subset of xvar values between 10 and 50, typing

      . scatter yvar xvar, xsc(r(10 50))

      would not suffice. You need to type

      . scatter yvar xvar if xvar>=10 & xvar<=50
      Conversely, when you do use if, the complaint is just that

      However this is wrong as it excludes 43 larger observations.
      Stata did what you asked, it seems, so what in that is wrong?
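
      The difference between the two can be seen in a minimal sketch. It assumes Stata's bundled auto data (not the poster's data), and the graph names g_range and g_if are arbitrary:

```stata
* Sketch using Stata's bundled auto data (an assumption; not the poster's data)
sysuse auto, clear

* range() can widen an axis but never drops observations:
* all 74 cars are still plotted
scatter mpg weight, xscale(range(2000 3000)) name(g_range, replace)

* an if qualifier is what actually restricts the plotted sample
scatter mpg weight if inrange(weight, 2000, 3000), name(g_if, replace)

* count how many observations the if qualifier keeps
count if inrange(weight, 2000, 3000)
```

      Only the second graph omits the cars outside the stated weight range.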

      Last edited by Nick Cox; 05 Jan 2020, 04:33.



      • #4
        Sorry, I guess I worded this poorly.

        I want to know how I can get a graph that looks like the first histogram. This is clearly presentable. However, it is not right, as it excludes 43 observations. The second graph is the true distribution, but it is not in any way presentable. So I am wondering how I can get a graph that includes all observations when calculating the distribution, but only shows an axis between 0 and 1,000,000. That is, observations larger than 1,000,000 are excluded graphically but not in the calculation of the distribution.
        Last edited by Oscar Weinzettl; 06 Jan 2020, 08:22.



        • #5
          First off, I think the ideal is misguided. Logarithmic scale is natural for data like yours and insisting on a linear scale will produce a fairly horrible graph however you do it. Logarithmic scale will also pull in those outliers, so you will solve two problems at once. It is possible, however, that you are aiming at a naive readership that doesn't understand logarithmic scale.
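
          A minimal sketch of the logarithmic-scale idea, to be run from a do-file. The toy values, the variable name log_gift, and the choice of bins of width 0.5 on the log10 scale are all assumptions for illustration:

```stata
* Hypothetical toy data standing in for gift_total (values are made up)
clear
input float gift_total
0
20000
30000
40000
75000
100000
160000
570000
2500000
end

* log10 is undefined at zero, so the zeros are left missing and drop out
generate log_gift = log10(gift_total) if gift_total > 0

* bins of width 0.5 on the log10 scale, starting at 10^4;
* the axis is labelled in the original units
histogram log_gift, start(4) width(0.5) percent ///
    xlabel(4 "10,000" 5 "100,000" 6 "1,000,000" 7 "10,000,000") ///
    xtitle("Gifts (Euros, logarithmic scale)") ylabel(, angle(horizontal))
```

          All positive values, including those above 1 million, enter the percentages, and the large values no longer stretch the axis.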

          Given your desire, there are at least two ways to proceed. One is to clone the variable but with values >= 1 million set to, say, 1025000, so that they fall into a single higher bin, which naturally should be explained clearly. The other is to use twoway__histogram_gen as described in detail in https://www.stata-journal.com/articl...article=gr0014
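
          A minimal sketch of the first approach (cloning and capping), to be run from a do-file. The toy values, the name gift_capped, and the bin width of 50000 (taken from the command in #1) are assumptions for illustration:

```stata
* Hypothetical toy data standing in for gift_total (values are made up)
clear
input float gift_total
0
20000
30000
40000
75000
100000
160000
570000
1800000
2500000
end

* clone the variable, then move every value >= 1 million to 1,025,000
clonevar gift_capped = gift_total
replace gift_capped = 1025000 if gift_total >= 1000000 & !missing(gift_total)

* with start(0) and width(50000) the capped values all land in the
* bin from 1,000,000 to 1,050,000; percentages use all positive values
histogram gift_capped if gift_total > 0, start(0) width(50000) percent ///
    xlabel(0 "0" 250000 "250,000" 500000 "500,000" ///
           750000 "750,000" 1025000 "1 million+") ///
    xtitle("Gifts (Euros)") ylabel(, angle(horizontal)) ///
    note("Gifts of 1 million Euros or more are shown in the last bin.")
```

          The note() text is one way to give the clear explanation of the final bin that this approach calls for.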
