Create a rank variable

Nick Cox

Join Date: Mar 2014
Posts: 35451

#16

12 Dec 2022, 15:51

Thanks. I think I get it now. Here is a demonstration using rangerun from SSC.

There are lots of details, some simple, some subtle.

Your variable is evidently not invest but data. Or -- if it is something else -- that is your call.

Your time window appears to be -30 0, so the previous 30 days and this day. But you need a proper Stata daily date variable. Your string date variable is no use for this purpose.

There are several definitions of rank and percentile rank. See also https://www.stata.com/support/faqs/s...ing-positions/ for one perspective.
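To make the point about competing definitions concrete, here is a minimal sketch on a made-up four-value variable (the data and the variable names r_default, r_track, r_field and r_unique are invented for illustration). It only contrasts the tie-handling options of egen's rank() function; the program in this post uses yet another convention, counting values less than or equal to the current one.

```stata
* Toy data with one tie (hypothetical values)
clear
input x
10
20
20
30
end

egen r_default = rank(x)          // ties share the mean rank: 1 2.5 2.5 4
egen r_track   = rank(x), track   // lowest value is 1, ties share: 1 2 2 4
egen r_field   = rank(x), field   // highest value is 1, ties share: 4 2 2 1
egen r_unique  = rank(x), unique  // 1 to 4, ties broken arbitrarily

list
```

Which of these is appropriate depends on how ties should count in your application.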

You may not have an analogue of the panel variable company. If not, there is no by() option call.

Code:

webuse grunfeld, clear

capture program drop pcrank 

program pcrank 
    count if invest < invest[_N] & invest < . 
    scalar i1 = r(N)
    count if invest == invest[_N] & invest < . 
    scalar i2 = r(N)
    count if invest < . 
    gen n = r(N)
    gen rank = i1 + i2 
    gen pcrank = (i1 + 0.5 * i2) / n 
end 

rangerun pcrank, use(invest) int(year -9 0) by(company)

list invest n rank pcrank if company == 1 

     +-------------------------------+
     | invest    n   rank     pcrank |
     |-------------------------------|
  1. |  317.6    1      1         .5 |
  2. |  391.8    2      2        .75 |
  3. |  410.6    3      3   .8333333 |
  4. |  257.7    4      1       .125 |
  5. |  330.8    5      3         .5 |
     |-------------------------------|
  6. |  461.2    6      6   .9166667 |
  7. |    512    7      7   .9285714 |
  8. |    448    8      6      .6875 |
  9. |  499.6    9      8   .8333333 |
 10. |  547.5   10     10        .95 |
     |-------------------------------|
 11. |  561.2   10     10        .95 |
 12. |  688.1   10     10        .95 |
 13. |  568.9   10      9        .85 |
 14. |  529.2   10      6        .55 |
 15. |  555.1   10      7        .65 |
     |-------------------------------|
 16. |  642.9   10      9        .85 |
 17. |  755.9   10     10        .95 |
 18. |  891.2   10     10        .95 |
 19. | 1304.4   10     10        .95 |
 20. | 1486.7   10     10        .95 |
     +-------------------------------+
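As a check on what the program computes, the sketch below reproduces row 4 of the listing by hand. It assumes the window for that row holds only the first four invest values for company 1, typed in from the listing above; the scalar n is introduced here purely for illustration.

```stata
* The four values that fall in the window ending at row 4 (copied from the listing)
clear
input float invest
317.6
391.8
410.6
257.7
end

* Same counts as in pcrank: the current observation is the last one, [_N]
count if invest < invest[_N] & invest < .
scalar i1 = r(N)
count if invest == invest[_N] & invest < .
scalar i2 = r(N)
count if invest < .
scalar n = r(N)

display "rank   = " (i1 + i2)
display "pcrank = " ((i1 + 0.5 * i2) / n)
```

No value in the window is smaller than 257.7 and one value equals it, so rank is 0 + 1 = 1 and pcrank is (0 + 0.5) / 4 = .125, matching the listing.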

Comment

Cristiano Bellavitis

Join Date: Jul 2018

Posts: 31
#17

14 Dec 2022, 10:47

Thanks a lot for your continuous help Nick.

I have used the following code, which seems to run, but all the observations get the same value (0.71xx).
total_buys is the variable I want to rank, id is a unique identifier for each observation, and Date is a date variable that I believe is in proper Stata format. Am I making a mistake? Thanks again!

Code:

program pcrank
    count if total_buys < total_buys[_N] & total_buys < .
    scalar i1 = r(N)
    count if total_buys == total_buys[_N] & total_buys < .
    scalar i2 = r(N)
    count if total_buys < .
    gen n = r(N)
    gen rank = i1 + i2
    gen pcrank = (i1 + 0.5 * i2) / n
end

rangerun pcrank, use(total_buys) int(Date -30 -1) by(id)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35451
#18

14 Dec 2022, 11:48

It is likely that you don't need by(id), which enforces ranking within id. If id is unique to each observation, no observation is ever compared with any other. As said in #16:

"You may not have an analogue of the panel variable company. If not, there is no by() option call."
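A minimal sketch of the call without by(), on simulated data. Everything about the data is an assumption: one row per day, a numeric daily date named Date, and random counts in total_buys. The window -30 0 follows #16; it includes the current day, so that total_buys[_N] is the current observation. rangerun must be installed from SSC first, and it may also need rangestat from SSC.

```stata
* Simulated daily data (hypothetical): 60 consecutive days, no panel identifier
clear
set obs 60
set seed 2022
gen Date = mdy(1, 1, 2022) + _n - 1
format Date %td
gen total_buys = runiformint(0, 20)

capture program drop pcrank
program pcrank
    count if total_buys < total_buys[_N] & total_buys < .
    scalar i1 = r(N)
    count if total_buys == total_buys[_N] & total_buys < .
    scalar i2 = r(N)
    count if total_buys < .
    gen n = r(N)
    gen rank = i1 + i2
    gen pcrank = (i1 + 0.5 * i2) / n
end

* No by() option: every day is ranked against the previous 30 days and itself
rangerun pcrank, use(total_buys) interval(Date -30 0)

list Date total_buys n rank pcrank in 1/10
```

With this setup n should grow from 1 to 31 and then stay there, which is a quick way to confirm that the windows contain more than one observation.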
Comment
Sharren Zheng

Join Date: Mar 2025

Posts: 1
#19

12 Mar 2025, 21:05

I observed that the sequence 1, 2, ..., N generated by egen's group() function runs from the smallest value to the largest. How can I change it so that 1, 2, ..., N runs from the largest value to the smallest?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29963
#20

12 Mar 2025, 21:12

You can't do that with -egen-. But you can do it from first principles, as illustrated in the following code that uses the auto.dta:

Code:

sysuse auto, clear
gsort -mpg
gen `c(obs_t)' wanted = sum(mpg != mpg[_n-1] & !missing(mpg))
Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1326

#21

12 Mar 2025, 21:15

Something like this:

Code:

egen unwanted = group(original_var) // or whatever egen ... group() statement you want
qui su unwanted
local num_groups = r(max)
gen wanted = `num_groups' - unwanted + 1
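For a version that can be run as is, here is the same idea applied to mpg in auto.dta, cross-checked against the gsort approach in #20. The variable name check is invented for this sketch; mpg has no missing values in this dataset, so the two methods should agree exactly.

```stata
sysuse auto, clear

* Ascending group codes, then flip them: 1 becomes the largest mpg
egen unwanted = group(mpg)
quietly summarize unwanted
gen wanted = r(max) - unwanted + 1

* Cross-check with the first-principles approach of #20
gsort -mpg
gen long check = sum(mpg != mpg[_n-1] & !missing(mpg))
assert wanted == check
```

The assert command is silent when the two variables agree and stops with an error otherwise.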

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35451

#22

13 Mar 2025, 01:45

You could just negate a numeric variable first. If you wanted value labels in terms of the original, that can be done. Here I use labmask from the Stata Journal for the latter.

Code:

. sysuse auto, clear
(1978 automobile data)

. gen negmpg = -mpg

. egen wanted = group(negmpg)

. labmask wanted, values(mpg)

. tab wanted

group(negmp |
         g) |      Freq.     Percent        Cum.
------------+-----------------------------------
         41 |          1        1.35        1.35
         35 |          2        2.70        4.05
         34 |          1        1.35        5.41
         31 |          1        1.35        6.76
         30 |          2        2.70        9.46
         29 |          1        1.35       10.81
         28 |          3        4.05       14.86
         26 |          3        4.05       18.92
         25 |          5        6.76       25.68
         24 |          4        5.41       31.08
         23 |          3        4.05       35.14
         22 |          5        6.76       41.89
         21 |          5        6.76       48.65
         20 |          3        4.05       52.70
         19 |          8       10.81       63.51
         18 |          9       12.16       75.68
         17 |          4        5.41       81.08
         16 |          4        5.41       86.49
         15 |          2        2.70       89.19
         14 |          6        8.11       97.30
         12 |          2        2.70      100.00
------------+-----------------------------------
      Total |         74      100.00

. tab wanted, nolabel

group(negmp |
         g) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1        1.35        1.35
          2 |          2        2.70        4.05
          3 |          1        1.35        5.41
          4 |          1        1.35        6.76
          5 |          2        2.70        9.46
          6 |          1        1.35       10.81
          7 |          3        4.05       14.86
          8 |          3        4.05       18.92
          9 |          5        6.76       25.68
         10 |          4        5.41       31.08
         11 |          3        4.05       35.14
         12 |          5        6.76       41.89
         13 |          5        6.76       48.65
         14 |          3        4.05       52.70
         15 |          8       10.81       63.51
         16 |          9       12.16       75.68
         17 |          4        5.41       81.08
         18 |          4        5.41       86.49
         19 |          2        2.70       89.19
         20 |          6        8.11       97.30
         21 |          2        2.70      100.00
------------+-----------------------------------
      Total |         74      100.00

The terms small and large suggest that numeric variables are what you're thinking about.

Comment

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#23

13 Mar 2025, 05:58

And more succinctly following #21 and #22, to reverse ranking by negating the original values:

Code:

egen unwanted = group(-original_var)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35451
#24

13 Mar 2025, 07:15

Leonardo Guizzetti No, unfortunately. That trick works with the rank() function of egen but not with group(). The argument for group() must be a varlist, a list of one or more variable names, and a negative sign is not allowed under that rule. The reason it is allowed with rank() is that the latter feeds on an expression, in which negative signs are perfectly legal.
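The difference can be seen directly in a short sketch using auto.dta. The variable names revrank and bad are invented for illustration, and the track option is just one choice of tie handling.

```stata
sysuse auto, clear

* rank() takes an expression, so negating inside the call is legal
egen revrank = rank(-mpg), track

* group() takes a varlist, so negate into a new variable first
gen negmpg = -mpg
egen wanted = group(negmpg)

* This call is rejected; capture lets the do-file continue past the error
capture noisily egen bad = group(-mpg)
display "return code from group(-mpg): " _rc
```

Both revrank and wanted are 1 for the car with the highest mpg, but they differ further down: rank() with track skips ranks after ties, whereas group() numbers the distinct values consecutively.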
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#25

13 Mar 2025, 07:29

Ah you’re right. I was definitely thinking of rank. Thanks for the correction.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35451
#26

14 Mar 2025, 04:05

This sub-thread goes back only to #19.

FWIW, here is another way to do it. Value labels are assigned automatically.

myaxis is from the Stata Journal. It was previously announced as added to SSC at https://www.statalist.org/forums/for...e-or-graph-use

SJ-21-3 st0654 . . Speaking Stata: Ordering or ranking groups of observations
(help myaxis if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/21 SJ 21(3):818--837
discusses procedures for datasets based on aggregate
frequencies and for datasets based on individuals and
introduces a new convenience command, myaxis, that handles
many cases directly

Code:

sysuse auto, clear
myaxis wanted=mpg, sort(mean mpg) descending
Comment
