
  • Graph histogram by specific variable with twoway__histogram_gen

    Dear Statalist users,

    I want to analyze the distribution of my variable y by using a histogram. However, as you can see in the attached file "histogram_by_year", it is hardly readable because of a few very strong outliers. First, I have tried to use the following specification to zoom in.

    Code:
    histogram y if y < 10000, bin(40) by (year)
    However, this changes the distribution and is, consequently, not an ideal solution. Next, I found out about the twoway__histogram_gen command. I've used the following code:

    Code:
    twoway__histogram_gen y, width(250) gen(h x)
    twoway bar h x if inrange(x,0,10000), barwidth(250) bstyle(histogram) by(year)
    I specified the width to have 40 bins in each of the graphs, but as you can see in the attached file "histogram_by_year_zoom", the bins get equally divided. How should I adjust my code to get 40 bins in each of these graphs while keeping the bar width the same?


    Thank you very much for any help.

    Best regards,
    Ali

  • #2
    Maybe you wish to experiment with some options of the -histogram- command, such as bin(#), width(#), and binrescale (the last one when using -by-).

    That being said, among the criticisms of histograms, one deserves mention: their appearance may change, sometimes strikingly and even misleadingly, under a series of manipulations.
    Best regards,

    Marcos
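A minimal sketch of two ways to keep 250-wide bins in every year panel while showing only values below 10,000. It assumes y is strictly positive and that a frequency scale is acceptable; the variable names bin_mid, bin_freq, and first are hypothetical. A likely reason the original attempt misbehaves is that twoway__histogram_gen stores its results in the first few observations of the dataset, so h and x are not matched to year.

```stata
* Option 1: frequency scale. Bar heights are counts per bin, so
* excluding y >= 10000 does not rescale the remaining bars.
histogram y if y < 10000, width(250) start(0) frequency by(year)

* Option 2: bin by hand, count within each year, plot bins below 10000.
* bin_mid is the midpoint of the 250-wide bin containing y.
generate bin_mid = 250 * floor(y / 250) + 125
bysort year bin_mid: generate bin_freq = _N
* Tag one observation per year-bin combination so each bar is drawn once.
egen byte first = tag(year bin_mid)
twoway bar bin_freq bin_mid if first & bin_mid < 10000, ///
    barwidth(250) bstyle(histogram) by(year)
```

Both versions use the same bin boundaries (0, 250, 500, ...) in every panel, so the panels stay comparable across years.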



    • #3
      As you give no data example or summaries it's hard to know whether your minima are ever zero or negative. But I sense that your variable is something like income.

      In general, a logarithmic scale is the way forward. Almost nothing else will really help. The same applies to density estimation.
      twoway__histogram_gen is really just using the same concepts as histogram. (If you do have zeros or negative values, there are tricks to help.) Lengthy discussion at https://www.stata-journal.com/article.html?article=gr0072
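As an illustration of the logarithmic-scale suggestion, here is a minimal sketch. It assumes y is strictly positive; the variable name log10_y and the tick positions are hypothetical and should be adapted to the actual range of the data.

```stata
* Histogram of y on a base-10 logarithmic scale, one panel per year.
* Assumes y > 0, so log10(y) is defined for every nonmissing value.
generate double log10_y = log10(y)

* Label the axis in original units: position k is labelled as 10^k.
histogram log10_y, bin(40) by(year) ///
    xlabel(1 "10" 2 "100" 3 "1000" 4 "10000" 5 "100000") ///
    xtitle("y (log scale)")
```

On this scale the outliers are pulled in toward the bulk of the data, so all observations can stay in the graph.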



      • #4
        Thank you, Marcos and Nick, for your answers and suggestions. I will try them out later today. Unfortunately, I think that I am not allowed to share the data. However, I can say that no value is negative or zero.

        Best regards,
        Ali



        • #5
          The FAQ Advice already addresses not being allowed to share the data:

          If your dataset is confidential, then provide a fake example instead.
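A fake example could look something like the sketch below. Everything in it is an assumption made for illustration: the number of observations, the years, the seed, and the lognormal shape, chosen only to mimic a strictly positive variable with a few very large values.

```stata
* Fake data: positive, right-skewed y observed in four years.
clear
set seed 12345
set obs 2000

* Cycle through the years 2015 to 2018.
generate year = 2015 + mod(_n, 4)

* Lognormal draws are always positive and produce a long right tail.
generate double y = exp(rnormal(6, 1.5))

summarize y, detail
```

Posting code like this lets others reproduce the graphs without access to the confidential data.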

