  • Accurate binning-scatter

    Hi all,

    I have some data on sales (in a variable called log_sales2) categorized in 10 categories according to:

    Code:
    egen log_salescat = cut(log_sales2), group(10) label
    I would like to draw a graph showing whether there is a relationship between these sales categories, in sorted order, and another variable called mean_own. First I tried a simple graph as follows: binning.pdf. But of course it looks naive. I would like to construct equally sized bins for each category of sales, labelled like 0-3, 3-5, ... Further, I would like to plot a single point showing the average of mean_own for each category, possibly connected.
    I guess I am thinking of something like:

    Code:
    twoway (histogram,...) (connected,...)
    but the problem is that I don't know how to force equally sized bins for histogram. Moreover, I know that this is a very hard and general question if you don't know the data, but I am just looking for some general ideas: do you think there is a more efficient way to show the relationship between the variables?
    Basically log_salescat are sales categories and mean_own is an own elasticity of demand.
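
The plot described above (one connected point per sales category, showing the average of mean_own) could be sketched roughly as follows. This is only a sketch under assumptions: the first lines simulate placeholder data so that the code runs on its own, and mean_own_cat and tag_cat are hypothetical variable names that do not appear in the post.

```stata
* Simulated placeholder data so the sketch runs on its own; replace with the real dataset
clear
set seed 2718
set obs 500
generate log_sales2 = rnormal(5, 2)
generate mean_own = -2 + 0.1*log_sales2 + rnormal(0, 0.5)

* Ten groups of roughly equal frequency, as in the post
egen log_salescat = cut(log_sales2), group(10) label

* Mean of mean_own within each category (hypothetical variable names)
egen mean_own_cat = mean(mean_own), by(log_salescat)
egen tag_cat = tag(log_salescat)

* One connected point per category; value labels show the lower limit of each bin
twoway connected mean_own_cat log_salescat if tag_cat, sort ///
    xlabel(0(1)9, valuelabel angle(45)) ///
    xtitle("Category of log sales") ytitle("Mean of mean_own")
```

Because the categories are coded 0 to 9, they are equally spaced on the x axis whatever their width on the log sales scale. The `///` continuation lines need to be run from a do-file rather than typed interactively.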

    Any help is appreciated,

    Federico

  • #2
    You want, I think, bins of equal frequency, but that is precisely what xtile and its many alternatives purport to produce. Bins labelled 0-3, 3-5, .... immediately raise a question about what exactly happens with values of 3.
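
As a minimal sketch of the xtile route, assuming decile bins are wanted: the data below are simulated placeholders so that the code runs on its own, and sales_decile is a hypothetical variable name. Listing the minimum and maximum within each bin is one way to see where boundary values actually end up.

```stata
* Simulated placeholder data so the sketch runs on its own; replace with the real dataset
clear
set seed 2718
set obs 500
generate log_sales2 = rnormal(5, 2)

* Decile bins of log_sales2 (sales_decile is a hypothetical name)
xtile sales_decile = log_sales2, nquantiles(10)

* Check how equal the bin frequencies are; ties in real data can make them unequal
tabulate sales_decile

* Range of log_sales2 covered by each bin, with the number of observations
tabstat log_sales2, by(sales_decile) statistics(min max n)
```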

    eqprhistogram (SSC) draws histograms with equal frequencies, to the extent possible, so in general unequal width. There are very good reasons, in my view, why they aren't seen more frequently:

    0. They struggle with ties, just like any binning method.

    1. They are often disappointing or weird-looking or both. This is a Catch-22 in that people see them rarely and so feel uncomfortable with them anyway.

    2. They don't improve on other ways of showing distributions, such as kernel density estimates.

    Why do you need or want to categorise sales anyway?





    • #3
      I agree with you, actually. It also seems to me that forcing equal frequency could lead to biases or misreadings.
      The point is that while log_sales is measured by firm, the other variable, mean_own, is organized by molecule. Hence, since I have to show the relation between the two, I need to work either at molecule level or at firm level. My group is working at firm level. There are, however, 500 firms, so I cannot put all of them in xlabel without confusing the reader (I also tried to randomly sample by groups according to the percentiles of the sales variable, but the chief of the department wants every firm to be displayed). Thus, I had the idea of categorizing firms' sales, even though I do not personally like to discretize continuous variables.



      • #4
        Nick Cox, by the way, I ended up with this: relation_own_firmsize.pdf. I guess it's a sort of compromise between clarity and graphical beauty. I hope it shows the relation between the two variables clearly enough.



        • #5
          Please post images as .png not .pdf. Please see 12.4 at https://www.statalist.org/forums/help#stata

          I don't know what is standard in your field and I don't fully understand what the graph shows, but

          1. The bins appear to be an attempt at decile bins, so their approximately equal frequency should be 0.1 as a fraction, which does not appear to be shown as such.

          2. I am always queasy when bars for bins that touch don't touch too.

          3. The basic idea seems to be average outcome given log sales, for which kind of problem in my field I would always prefer a scatter plot with some smooth summary -- either a regression of some kind or a scatter plot smoother.
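
A scatter plot with a smooth summary, as in point 3, might look something like the sketch below. It assumes one observation per firm with both variables present; the data are simulated placeholders so that the code runs on its own, the graph name own_vs_sales is hypothetical, and lowess is just one possible choice of smoother.

```stata
* Simulated placeholder data so the sketch runs on its own; replace with the real dataset
clear
set seed 2718
set obs 500
generate log_sales2 = rnormal(5, 2)
generate mean_own = -2 + 0.1*log_sales2 + rnormal(0, 0.5)

* Raw data as hollow circles, with a lowess smooth superimposed
twoway (scatter mean_own log_sales2, msymbol(oh))        ///
       (lowess mean_own log_sales2, bwidth(0.8)),        ///
       ytitle("mean_own") xtitle("log_sales2")           ///
       legend(order(1 "Observations" 2 "Lowess smooth")) ///
       name(own_vs_sales, replace)
```

This shows every firm without any binning, so no decision about bin limits is needed. Replacing the lowess call with lfit or lpoly would give a linear fit or a local polynomial smooth instead.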



          • #6
            Thank you for all the suggestions. I will change the graph accordingly.
