  • Stata histogram Help

    Hi,
    I'm attempting to find the 80th percentile of sales transaction totals in Stata and graph it. I'd like the top of the first bar to end at the 80% mark. Is there a way to do this in Stata?

    As an aside: would this even be the best way to represent sales transaction data? Sales range from pennies to 10K. I'm way too far over my skis with this one, so any suggestions would be appreciated.


    Current Command:
    histogram nrr_subtotal, bin(100) percent
    (bin=100, start=0, width=105.381)



    [Attached image: gragre.JPG]

  • #2
    Cross-posted at https://www.reddit.com/r/stata/comme...ta_histograph/ where there is already good advice (which doesn't include the detail that histogram is a standard term, while histograph isn't).

    Please note our request at https://www.statalist.org/forums/help#crossposting that you tell us about cross-posting, so that people interested in answering -- and people interested in reading an answer -- can all see what has been said already.

    What matters here is your own level and -- perhaps even more important -- what lies downstream: anything from an assignment to a presentation to clients, and what technical level your audience expects, or can stretch to.

    The histogram is essentially a propaganda graph: Look, most sales are very small, but there is a tail of much higher sales! If that is your main message, well and good.

    I would move immediately to logarithmic scale, and a scale that shows percentiles directly. The pattern here is very simple because I set it up that way. Your own data would be messier, and perhaps more interesting.

    Code:
    clear
    set scheme s1color
    
    * fake data
    set obs 1000
    range y  -2 4
    gen sandbox = 10^y
    
    * histogram: clearly options are possible
    histogram sandbox , name(G1, replace)
    
    
    * quantile plot on log scale, with the 80th percentile marked
    _pctile sandbox, p(80)
    local toshow : display %4.0f r(r1)
    quantile sandbox , ysc(log) yla(0.01 0.1 1 10 100 1000 10000 `toshow', ang(h)) yline(`toshow') xline(.8) rlopts(lc(none)) xla(0 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80" 1 "100") xtitle(cumulative percent) name(G2, replace)
    
    
    graph combine G1 G2
    [Attached image: sales_log.png]
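    If a histogram is still wanted, one possible sketch is to bin the values on a log10 scale and mark the 80th percentile with a vertical line, so that 80% of transactions lie to its left. This reuses the fake sandbox data from above; with real data you would substitute your own variable (nrr_subtotal in #1). The names log10_sales, p80, lp80 and labels are just for illustration, and any zero or negative totals would become missing under log10() and drop out of the graph.

```stata
clear
set obs 1000
range y -2 4
generate sandbox = 10^y

* 80th percentile of the raw values
_pctile sandbox, p(80)
local p80 = r(r1)

* work on log10 scale; non-positive values would become missing here
generate log10_sales = log10(sandbox)
local lp80 = log10(`p80')

* label the log10 axis in the original units
local labels -2 "0.01" -1 "0.1" 0 "1" 1 "10" 2 "100" 3 "1000" 4 "10000"

* histogram of log10 values with the 80th percentile marked
histogram log10_sales, percent xline(`lp80') xlabel(`labels') xtitle("sales (log scale)") name(G3, replace)
```

    The bars to the left of the line then add up to about 80%, which may be closer to the original request than forcing a single bar to stop at 80.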


    Last edited by Nick Cox; 27 Feb 2022, 03:43.



    • #3
      Just possibly, you're indirectly echoing rules of thumb such as that 20% of the people contribute (consume, produce) 80% of the sales (beer, work, whatever), often associated more or less loosely with names such as Pareto or Lorenz. If so, "Lorenz curve" is a useful search term.
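      For what it's worth, a Lorenz curve can be sketched with official commands only. What follows is a minimal illustration on fake data like the sandbox variable in #2, not a polished implementation; the names cumshare_n, cumshare_sales and top20 are made up for the example, and community-contributed commands may well do this better.

```stata
clear
set obs 1000
range y -2 4
generate sandbox = 10^y

* cumulative shares of transactions and of total sales, smallest sale first
sort sandbox
generate double cumshare_n = _n / _N
generate double cumshare_sales = sum(sandbox)
replace cumshare_sales = cumshare_sales / cumshare_sales[_N]

* Lorenz curve, with the line of equality for reference
line cumshare_sales cumshare_n cumshare_n, xtitle("cumulative share of transactions") ytitle("cumulative share of sales") legend(order(1 "Lorenz curve" 2 "equality"))

* share of total sales contributed by the largest 20% of transactions
summarize cumshare_sales if cumshare_n <= 0.8, meanonly
scalar top20 = 1 - r(max)
display "share of sales from top 20% of transactions: " %5.3f top20
```

      The further the curve sags below the line of equality, the more the total is concentrated in a few large transactions.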
