
  • Bin widths are not constant in histogram

    Dear StataList,

    I have a question regarding bin width in histograms.

    I cannot manage to create bins of equal width. It seems that Stata automatically adjusts bin widths, even when I explicitly set the width.

    This can be seen in the code below (which uses the classic auto data in case one of you is kind enough to try):


    clear all
    sysuse auto
    summarize price
    twoway__histogram_gen price, ///
        percent width(300) start(3000) ///
        gen(frequency bound)

    keep bound frequency
    order bound frequency

    The "bound" variable holds the bin midpoints, but its values do not increase by a constant increment of 300.

    Would you know what I am missing here?

    My goal is to export the equal-width bins and their corresponding percentages to Excel, which is why I use "twoway__histogram_gen".

    Many thanks for your consideration, and stay healthy.

    Best regards,

    Marc


  • #2
    The bin widths are correct. The command produces a row only for each non-empty bin. As you mention, the variable you call 'bound' holds the bin centers, but only for the non-empty bins. You'll notice that the gaps between consecutive values of 'bound' are all multiples of your bin width.
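
    A minimal sketch of one way to check this, reusing the settings from the first post. The variable name gap_in_widths is introduced here for illustration only; it is not part of the original code. Under the explanation above, every gap should come out as a whole number of bin widths, with values above 1 marking one or more skipped empty bins.

    ```stata
    sysuse auto, clear
    twoway__histogram_gen price, percent width(300) start(3000) gen(frequency bound)

    * Keep only the rows that the command actually filled in
    keep bound frequency
    keep if !missing(bound)
    sort bound

    * Gap between consecutive bin centers, measured in bin widths (300).
    * gap_in_widths is a hypothetical helper variable, not from the thread.
    generate gap_in_widths = (bound - bound[_n-1]) / 300
    list bound frequency gap_in_widths, noobs
    ```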



    • #3
      Dear Mr Morris,
      Thank you very much for your accurate answer!
      This is very helpful. I did not think about that.
      I now fill in the empty bins by merging the generated non-empty bins onto a linear range of bin bounds. That works well.
      Thank you for your time!



      • #4
        Here is the solution I ended up with, which also lists the empty bins, in case it is useful to someone else:

        clear all
        sysuse auto
        summarize price
        twoway__histogram_gen price, ///
            percent width(500) start(3000) ///
            gen(frequency bound)

        keep bound frequency
        order bound frequency

        * Convert bin centers to lower bounds (center minus half the width, 500/2 = 250)
        replace bound = bound - 250
        drop if frequency == .
        save hist_temp, replace
        clear all

        * Create the full grid of 27 lower bounds: (16000 - 3000)/500 + 1 = 27
        range bound 3000 16000 27

        merge m:1 bound using hist_temp
        replace frequency = 0 if frequency == .
        drop _merge
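
        Since the stated goal is an Excel export, here is a self-contained sketch of the same steps with the export added at the end. It makes a few assumptions that are not in the original posts: the output file name hist_bins.xlsx is hypothetical, a tempfile (named nonempty here) replaces the saved hist_temp file, and merge 1:1 replaces m:1 because "bound" should be unique in both datasets.

        ```stata
        sysuse auto, clear
        twoway__histogram_gen price, percent width(500) start(3000) gen(frequency bound)

        * Keep the non-empty bins and convert centers to lower bounds
        keep bound frequency
        drop if missing(frequency)
        replace bound = bound - 250

        * Assumption: a tempfile instead of hist_temp.dta, so nothing is left on disk
        tempfile nonempty
        save `nonempty'

        * Full grid of 27 lower bounds: 3000, 3500, ..., 16000
        clear
        range bound 3000 16000 27

        * Assumption: 1:1 merge, since each bound appears once on both sides
        merge 1:1 bound using `nonempty', nogenerate
        replace frequency = 0 if missing(frequency)
        sort bound

        * Hypothetical file name; the first row holds the variable names
        export excel bound frequency using "hist_bins.xlsx", firstrow(variables) replace
        ```

        If "bound" were ever duplicated on either side, the 1:1 merge would stop with an error rather than silently matching, which is the reason for preferring it in this sketch.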

