This may be widely known, but in case not I thought I would share...
Stata has several commands that compute percentiles:
centile
sum, d
_pctile
egen pctile
and perhaps others.
It turns out that these do not always yield the same results, apart from the median (50th percentile), as the example code and output further down in this post show.
There is nothing surprising about this if one reads carefully the respective "Methods and Formulas" sections in each command's documentation, as centile uses a different formula than do the others.
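For anyone who does not have the manuals to hand, here is my reading of the two default definitions for unweighted data, with $x_{(1)} \le \dots \le x_{(n)}$ the sorted values. Treat this as a summary sketch rather than a quotation from the documentation; the symbols $P$, $r$ and $f$ are my own notation.

```latex
% centile (default): linear interpolation at position (n+1)p/100
P = \frac{(n+1)\,p}{100}, \qquad r = \lfloor P \rfloor, \qquad f = P - r, \qquad
c_p = x_{(r)} + f\,\bigl(x_{(r+1)} - x_{(r)}\bigr) \quad (0 < r < n)

% summarize, detail and _pctile (default): position np/100, averaging on ties
P = \frac{n\,p}{100}, \qquad
c_p =
\begin{cases}
  \dfrac{x_{(P)} + x_{(P+1)}}{2} & \text{if } P \text{ is an integer} \\[1.5ex]
  x_{(\lceil P \rceil)}          & \text{otherwise}
\end{cases}

% Example: n = 20, p = 25
% centile:   P = 5.25  =>  c_{25} = x_{(5)} + 0.25\,(x_{(6)} - x_{(5)})
% summarize: P = 5     =>  c_{25} = (x_{(5)} + x_{(6)}) / 2
```

With n = 20 the two rules therefore land a quarter versus halfway between the 5th and 6th sorted values for the 25th percentile, while for p = 50 both give the midpoint of the 10th and 11th values, which is why the medians agree.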
Yet the differences may be nontrivial in some contexts (e.g. computation of IQRs), so it is perhaps worth considering which of the competing formulae squares most closely with how the researcher conceives of percentiles.
Code:
preserve
cap drop _all
set obs 20
set seed 23
tempvar y
gen `y'=exp(rnormal(0,1))
qui centile `y', c(10 25 50 75 90)
di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
qui sum `y',d
di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
qui _pctile `y', p(10 25 50 75 90)
di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
drop _all
restore
Code:
. preserve
. cap drop _all
. set obs 20
number of observations (_N) was 0, now 20
. set seed 23
. tempvar y
. gen `y'=exp(rnormal(0,1))
. qui centile `y', c(10 25 50 75 90)
. di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
.29993572
.38304436
1.6890243
2.8531529
5.1466236
. qui sum `y',d
. di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
.31345257
.40814352
1.6890243
2.7669318
5.0989532
. qui _pctile `y', p(10 25 50 75 90)
. di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
.31345257
.40814352
1.6890243
2.7669318
5.0989532
. drop _all
. restore
.
end of do-file
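To put a number on the IQR point: from the output above, the 75th minus the 25th percentile is about 2.47 using centile and about 2.36 using sum, d or _pctile. The sketch below computes the IQR each way and also tries _pctile with its altdef option, which I understand from the documentation to switch _pctile to the same interpolation formula that centile uses by default. I have not shown output for it, and the variable name y is just for illustration, so please treat it as something to run and check rather than a verified result.

```stata
* Sketch only: compare the IQR under each percentile definition
preserve
clear
set obs 20
set seed 23
generate y = exp(rnormal(0,1))

* centile: interpolates at position (n+1)*p/100 by default
quietly centile y, centile(25 75)
display "IQR, centile:          " r(c_2) - r(c_1)

* _pctile with its default formula (matched sum, d in the output above)
quietly _pctile y, percentiles(25 75)
display "IQR, _pctile:          " r(r2) - r(r1)

* _pctile with altdef: assumed here to reproduce the centile values
quietly _pctile y, percentiles(25 75) altdef
display "IQR, _pctile (altdef): " r(r2) - r(r1)
restore
```

If altdef behaves as I expect, the first and third lines displayed should agree, which gives a way to get centile-style percentiles into r() results without calling centile itself.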