Plot (kernel) density estimates as areas

Nick Cox

Join Date: Mar 2014

Posts: 35375
#1

17 Apr 2020, 10:02

This is a brief puff for an idea that has become standard in some quarters, but seems to deserve a bigger push until everyone who might care knows about it. Here is a reproducible example, which as always is indicative, not definitive.

Code:

sysuse auto, clear

* grid of evaluation points 5, 6, ..., 49, wider than the observed range of mpg
gen where = _n + 4 in 1/45

* same kernel, bandwidth and grid for both groups
local choices kernel(biweight) bw(5) at(where)
kdensity mpg if foreign, `choices' gen(x1 d1)
kdensity mpg if !foreign, `choices' gen(x0 d0)

* constant negative heights at which to draw rug marks below the axis
gen rug1 = -0.004
gen rug0 = -0.008

twoway area d1 d0 where, xtitle("`: var label mpg'") color(orange%40 blue%40) ///
    || scatter rug1 mpg if foreign, ms(|) mc(orange) msize(medlarge) ///
    || scatter rug0 mpg if !foreign, ms(|) mc(blue) msize(medlarge) ///
    legend(order(1 "Foreign" 2 "Domestic") pos(1) ring(0) col(1)) ///
    ytitle(Probability density) yla(, ang(h)) xla(10(10)40)

Kernel density estimates are plotted by default in Stata as lines, meaning curves. It is elementary (meaning, fundamental) that the area under the curve between any two values has an interpretation as a probability.
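
As a rough check on that interpretation, here is a minimal sketch (an editorial addition, not part of the original post) that integrates the estimated density for foreign cars numerically. It assumes the same grid, kernel and bandwidth as in the example; the interval from 20 to 30 mpg and the scalar names area_all and area_20_30 are arbitrary choices made purely for illustration.

```stata
* Sketch only: check numerically that the estimated density integrates to about 1
sysuse auto, clear
gen where = _n + 4 in 1/45

* density for foreign cars evaluated on the grid; nograph suppresses the plot
kdensity mpg if foreign, kernel(biweight) bwidth(5) at(where) generate(d1) nograph

* trapezoidal rule over the whole grid: the result should be close to 1
integ d1 where, trapezoid
scalar area_all = r(integral)
display "total area = " %6.4f area_all

* area between 20 and 30 mpg (an arbitrary interval, for illustration only)
integ d1 where if inrange(where, 20, 30), trapezoid
scalar area_20_30 = r(integral)
display "estimated Pr(20 <= mpg <= 30 | foreign) = " %6.4f area_20_30
```

The total is only approximately 1 because the integral is evaluated on a grid with unit spacing. The grid needs to extend at least one bandwidth beyond the observed extremes, as it does here, for the whole area to be captured.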

Often area-based graphs say in a complicated way what could be said much more simply. Bad examples include bars with arbitrary bases that could just be replaced by point symbols for the values in question, or bars that start at zero, when not being zero is banal or irrelevant.

However, area graphs can be helpful when comparing two or more distributions. (Histograms work that way.) But then transparency becomes vital to see overlap clearly.

You can do something like this directly with kdensity or twoway kdensity with the option recast(area). There is no special rationale for coding as above, although the default of truncating the density at the observed extremes can be unfortunate, so I typically work a little harder at setting up a wider grid on which to calculate estimates.
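
A minimal sketch of that more direct route follows (again an editorial addition, not code from the original post). It assumes that twoway kdensity accepts color() together with recast(area), and it uses the range() option to evaluate each estimate from 5 to 49 rather than stopping at the observed extremes. The graph name density_areas is hypothetical.

```stata
* Sketch only: overlaid density areas using recast(area) directly
sysuse auto, clear

* range(5 49) widens the evaluation grid beyond the observed minimum and maximum;
* kernel and bandwidth match the example earlier in this thread
local choices kernel(biweight) bwidth(5) range(5 49) recast(area)

twoway kdensity mpg if foreign, `choices' color(orange%40) ///
    || kdensity mpg if !foreign, `choices' color(blue%40) ///
    legend(order(1 "Foreign" 2 "Domestic") pos(1) ring(0) col(1)) ///
    xtitle("`: var label mpg'") ytitle(Probability density) ///
    yla(, ang(h)) name(density_areas, replace)
```

This is shorter than generating variables first, but the generated variables remain handy if you want to reuse the estimates, say for adding rugs or calculating areas.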

The immediate inspiration for this came from an excellent book by Claus Wilke. This is a link to a review I wrote with several detailed comments: https://www.amazon.com/gp/customer-reviews/R22MWD7RJ6QAFP

Tiago Pereira

Join Date: Jan 2016

Posts: 374
#2

27 Aug 2021, 08:22

Thank you once more, Nick, for your incredible contributions. That code has been remarkably useful for our academic projects.
Nick Cox

Join Date: Mar 2014

Posts: 35375
#3

27 Aug 2021, 09:38

Thanks very much for #2. Anyone interested in this thread might find https://www.statalist.org/forums/for...lable-from-ssc interesting or even useful.
Phyu Zin

Join Date: Aug 2021

Posts: 26
#4

11 Oct 2021, 19:44

Hi Nick

Can you advise how to subset the matched cohort after kernel matching?

Thanks
Phyu
Nick Cox

Join Date: Mar 2014

Posts: 35375
#5

12 Oct 2021, 02:57

Kernel matching is not something I know anything about. I'd advise a separate question.
oliver wei

Join Date: Dec 2022

Posts: 16
#6

11 Jun 2023, 08:03

Extremely useful. Nice and easy to use. Will implement it in my paper. Thanks a lot!
Adam Dynes

Join Date: Sep 2014

Posts: 12
#7

18 Aug 2023, 20:13

Thank you!!!!!!