Finding variable percentile position / rank in row

Henry Cust

02 Aug 2022, 00:23

Hi,

I wish to find the percentile position of a variable in a row of other variables.

E.g. I have 30 unordered vars that make up my distribution called var1, var2 etc. Then another variable not part of the distribution, let's say var_star, that I want the percentile rank or position of within var1-var30.

Any help would be greatly appreciated, I am sure it is straightforward, and I simply cannot think of the right question to ask.

Henry
Maarten Buis


02 Aug 2022, 01:57

It would have been easier for us if you had included example data. Now we have to interpret what you said about what your data look like. Interpreting is more work than necessary, and then implementing our interpretation by making an example dataset is more work than necessary. Instead you could have saved us that by including an example dataset with the dataex command, as is recommended in the Statalist FAQ. Moreover, if we interpret your text differently from how you intended it, then our answer will not fit your problem.

If I interpret your data correctly, then the answer is that everything is easier in long format:

Code:

// create some example data
clear
set obs 10
gen id = _n
forvalues i = 1/30 {
    gen x`i' = rnormal()
}

// the solution
reshape long x, i(id) j(t)
egen xstar = rank(x), by(id)
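
The original question also involves a separate variable to be placed among the others. A minimal sketch of that step in long layout follows, under stated assumptions: `star` is a hypothetical stand-in for var_star, the other new variable names are made up for illustration, and the percentile rank is taken as the share of values below `star` plus half the share tied with it, which is only one of several defensible definitions.

```stata
// example data: 30 values per row plus one separate variable
clear
set seed 12345
set obs 10
gen id = _n
forvalues i = 1/30 {
    gen x`i' = rnormal()
}
gen star = rnormal()   // hypothetical stand-in for var_star

// long layout: star is carried along as a constant within id
reshape long x, i(id) j(t)

// per id: how many x lie below star, tie with it, or are nonmissing
egen n_below = total(x < star), by(id)
egen n_equal = total(x == star), by(id)
egen n_valid = count(x), by(id)

// assumed definition: percent below plus half the percent tied
gen pcrank_star = 100 * (n_below + 0.5 * n_equal) / n_valid if !missing(star)

// one line per id
egen tag = tag(id)
list id star pcrank_star if tag
```

The result is constant within each id, so tagging one observation per id is enough for listing.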

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox


02 Aug 2022, 02:10

Ranking across rows was treated in the Stata Journal in 2009. A sequel might as well be mentioned here.

SJ-20-2 pr0046_1 . . . . . . . . . . . Speaking Stata: More ways for rowwise
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q2/20 SJ 20(2):481--488 (no commands)
focuses on returning which variable or variables are equal
to the maximum or minimum in a row

SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
(help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
Q1/09 SJ 9(1):137--157
shows how to exploit functions, egen functions, and Mata
for working rowwise; rowsort and rowranks are introduced

Then it's a matter of applying your recipe for percentile rank, which is an FAQ:

FAQ . . . . . . . . . . Calculating percentile ranks or plotting positions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
2/14 How can I calculate percentile ranks?
How can I calculate plotting positions?
http://www.stata.com/support/faqs/st...centile-ranks-
and-plotting-positions/

Absent a data example, 5 variables will serve as well as 30 to show technique:

Code:

clear
set obs 10
set seed 2803
forval j = 1/5 {
    gen var`j' = runiformint(0, 100)
}

list

rowranks var? , gen(rank1-rank5)

forval j = 1/5 {
    gen pcrank`j' = 100 * (rank`j' - 0.5) / 5
}

list var* pcrank*

     +------------------------------------------------------------------------------------+
     | var1   var2   var3   var4   var5   pcrank1   pcrank2   pcrank3   pcrank4   pcrank5 |
     |------------------------------------------------------------------------------------|
  1. |   42     11     58     14     29        70        10        90        30        50 |
  2. |   99      4     48     32     29        90        10        70        50        30 |
  3. |   13     12     43     84     32        30        10        70        90        50 |
  4. |   43     32     48      6    100        50        30        70        10        90 |
  5. |    2      9     63     23     34        10        30        90        50        70 |
     |------------------------------------------------------------------------------------|
  6. |   22     13     20     23     63        50        10        30        70        90 |
  7. |   80     74     34     93     78        70        30        10        90        50 |
  8. |   50     30     99     24     59        50        30        90        10        70 |
  9. |   49     85     13     47      1        70        90        30        50        10 |
 10. |   80     22     59     31     89        70        10        50        30        90 |
     +------------------------------------------------------------------------------------+

What's to discuss?

1. Although there are other defensible rules, starting with (rank - 0.5) / sample size treats tails symmetrically, so that for example the median of an odd number of values is assigned cumulative probability 0.5. It's also, to the best of my knowledge, the oldest rule that is statistically informed. If it's important to you to use another rule, say the perverse rank / sample size, then the code is easy.

2. If there are ties, rowranks uses the usual convention of preserving the sum of the ranks.

3. If you want to rank in the opposite direction, rowranks has an option for that; or, with the rule used here, take the complement in 100.

4. It could be that you'd be better off with a long layout, but that depends on what else you want to do.
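
For a variable such as var_star that sits outside var1-var5, one possible sketch is to pool it with the row and apply the same (rank - 0.5) / sample size rule to its rank among the n + 1 values. Pooling is an assumption here, not something settled in this thread, and the names n_below, n_equal, n_valid, rank_star and pcrank_star are made up for illustration. Only official commands are used, so rowranks is not needed for this step.

```stata
clear
set obs 10
set seed 2803
forval j = 1/5 {
    gen var`j' = runiformint(0, 100)
}
gen var_star = runiformint(0, 100)   // the variable to be placed

* count, row by row, the values below var_star, tied with it, and nonmissing
gen n_below = 0
gen n_equal = 0
gen n_valid = 0
foreach v of varlist var1-var5 {
    replace n_below = n_below + (`v' < var_star)  if !missing(`v', var_star)
    replace n_equal = n_equal + (`v' == var_star) if !missing(`v', var_star)
    replace n_valid = n_valid + !missing(`v')
}

* rank of var_star in the pooled row of n_valid + 1 values;
* ties get the mean of the ranks they occupy, which preserves the sum of ranks
gen rank_star = n_below + 1 + n_equal / 2 if !missing(var_star)

* assumed rule: (rank - 0.5) / sample size, with var_star counted in the sample
gen pcrank_star = 100 * (rank_star - 0.5) / (n_valid + 1)

list var1-var5 var_star rank_star pcrank_star
```

Under this rule the result always lies strictly between 0 and 100, and 100 - pcrank_star gives the position when ranking in the opposite direction.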
Henry Cust


08 Aug 2022, 07:33

Thanks both for your inputs - I'll be sure to include sample data next time. These answer my query and I hope provide code for others too.