Percentile ranking

Eugene Lacoste

#1

19 Aug 2020, 08:23

Dear all,

I put together the following code because I need to calculate time-series percentile rankings of my variable ("var") for each id. I find the term confusing, as I could not find a clear definition, and I am unsure whether the code is correct.

To explain in detail what I did:

* I calculated the rank of my variable within each year, so each id receives a ranking based on the number of non-missing observations in that year.
* Since I have panel data from 1995 to 2000, I then calculated the average of the yearly rankings for each id. For example, if id_3000 had rankings 1, 10, 15, 18, 19 and 20, the average would be 13.83.
* Each firm then has an average ranking over the 6 years, and I calculate the median of these average rankings.
* If the average ranking is above the computed median, I mark the id as "1"; if not, as "0".

Code:

sort id
bysort year: egen rank = rank(var), track
bysort year: egen n = count(var)
gen pcrank = ((rank-1)/(n-1))
egen mean = mean(pcrank), by(id)
egen median_var = median(mean)
gen var_pcrank = 1 if mean > median_var
replace var_pcrank = 0 if mean <= median_var
replace var_pcrank = . if mean == .

Thank you very much for any help,
Best regards,
Eugene
Nick Cox

#2

19 Aug 2020, 09:29

I don't understand what the question is here.

The idea of a percentile rank is that some percent of values are smaller than a given value (say, your test score or a child's height or weight), and so the complementary percent are larger. That seems clear enough to me, except that it leaves a lot of small print about exactly how it's done: is that smaller than, or smaller than or equal to? What about ties? The goal of https://www.stata.com/support/faqs/s...ing-positions/ was to cover the most common cases, and I am always open to comments about important detail that may have been omitted. It seems that you are drawing upon that FAQ, or unwittingly on some source that does.

I will add some minor comments on your code, although so far as I can see it should do more or less what you want.

Despite the name "percentile", your ranks are scaled to lie in [0, 1], not [0, 100].

Indeed the definition [rank - 1] / [count - 1] is one I dislike for reasons epitomised by the results of invnormal(0) or invnormal(1).
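
For readers who want to see the point, here is a minimal sketch. It assumes the auto dataset shipped with Stata as stand-in data; the variable names pcrank_old and pcrank_alt, and the alternative (rank - 0.5)/n, are illustrative choices rather than anything prescribed in this thread. With (rank - 1)/(n - 1), the lowest and highest values map to exactly 0 and 1, where invnormal() is undefined.

```stata
* invnormal() is undefined at exactly 0 and 1, so both return missing
display invnormal(0)
display invnormal(1)

* Stand-in data: the auto dataset shipped with Stata
sysuse auto, clear
egen rank = rank(mpg)
egen n = count(mpg)

* The definition used in this thread: reaches 0 and 1 at the extremes
generate pcrank_old = (rank - 1)/(n - 1)

* One possible alternative (an assumption): stays strictly inside (0, 1)
generate pcrank_alt = (rank - 0.5)/n

summarize pcrank_old pcrank_alt
```

If the scaled ranks are only ever compared with their median, as in the question, the choice between the two probably matters little; it matters when the values are fed to functions such as invnormal().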

I wouldn't use the track option myself. The default of rank() is more nearly standard and treats ties symmetrically.
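
A small made-up example may help show the difference. The data and the variable names x, r_default and r_track are purely illustrative.

```stata
clear
input x
10
20
20
30
end

* Default: tied values share the average of the ranks they span
egen r_default = rank(x)

* track: each value gets 1 + the number of values that are lower
egen r_track = rank(x), track

list
```

In this sketch the two tied values of 20 should get 2.5 under the default and 2 under track, so the default keeps the sum of ranks the same as if there were no ties.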

Your code can be slimmed down to

Code:

bysort year: egen rank = rank(var)
by year: egen n = count(var)
gen pcrank = (rank-1)/(n-1)
egen mean = mean(pcrank), by(id)
su mean, detail
scalar median = r(p50)
gen var_pcrank = mean > median if mean < .
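
One detail the code leaves open: the median is taken over all observations, so an id observed in more years counts more often. If the median is meant to be over ids, as the original description suggests, a sketch along the following lines might be closer. It is only a sketch under that assumption; the toy data and the names tag and median_id are hypothetical.

```stata
* Toy panel, made up for illustration only
clear
input id year var
1 1995 5
1 1996 7
2 1995 3
2 1996 .
3 1995 9
3 1996 8
end

bysort year: egen rank = rank(var)
by year: egen n = count(var)
generate pcrank = (rank - 1)/(n - 1)
egen mean = mean(pcrank), by(id)

* Tag one observation per id so that each id counts once in the median
egen tag = tag(id)
summarize mean if tag, detail
scalar median_id = r(p50)

* The logical expression yields 1 or 0; if mean < . keeps missings missing
generate var_pcrank = mean > median_id if mean < .

sort id year
list, sepby(id)
```

In a balanced panel with no missing values, the two medians should coincide, so the distinction only matters when ids contribute different numbers of observations.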
Stephen Jenkins

#3

19 Aug 2020, 14:41

Have a look at the "fractional rank" program -fracrank- by Philippe Van Kerm: it's part of his -sgini- module on SSC.