Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate percentile for each value of a variable with different frequencies

    For the dataset shown below, the variable "score" shows the scores of students. The variable "frequency" specifies the frequency of each value of the "score" variable. For instance, the first value of the "score" variable is 710 and the corresponding frequency is 13. This means that 13 students got a score 710. Similarly, the second value of the "score" variable is 709 and the corresponding frequency is 3. This means that 3 students got a score 709.

    I want to find the corresponding percentile for each value of the "score" variable with 2 decimal digits (e.g., 99.23 percentile, 81.23 percentile). The standard -xtile and -pctile syntax only create variables that give the value of the "score" variable corresponding to specific percentiles (10%, 20%, etc), not the opposite (the value of percentiles corresponding to each value of the "score" variable", which is the variable I try to create).

    Any help with this will be much appreciated.

    Thanks a lot!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(score frequency year)
    710  13 2019
    709   3 2019
    708   1 2019
    707   1 2019
    706   3 2019
    704   2 2019
    703   2 2019
    702   3 2019
    701   4 2019
    700   4 2019
    699   6 2019
    698   4 2019
    697   9 2019
    696   3 2019
    695   5 2019
    694  13 2019
    693   3 2019
    692   8 2019
    691   5 2019
    690  12 2019
    689   6 2019
    688  16 2019
    687   9 2019
    686   9 2019
    685  19 2019
    684  19 2019
    683  22 2019
    682  18 2019
    681  17 2019
    680  16 2019
    679   9 2019
    678  21 2019
    677  18 2019
    676  22 2019
    675  21 2019
    674  25 2019
    673  29 2019
    672  25 2019
    671  32 2019
    670  33 2019
    669  39 2019
    668  29 2019
    667  37 2019
    666  33 2019
    665  28 2019
    664  36 2019
    663  35 2019
    662  37 2019
    661  46 2019
    660  63 2019
    659  46 2019
    658  43 2019
    657  51 2019
    656  60 2019
    655  64 2019
    654  71 2019
    653  67 2019
    652  56 2019
    651  62 2019
    650  69 2019
    649  68 2019
    648  73 2019
    647  91 2019
    646  75 2019
    645  83 2019
    644  71 2019
    643  67 2019
    642  83 2019
    641  93 2019
    640  67 2019
    639  85 2019
    638  85 2019
    637 107 2019
    636 111 2019
    635  87 2019
    634  89 2019
    633 106 2019
    632  95 2019
    631 117 2019
    630 108 2019
    629 102 2019
    628 121 2019
    627 138 2019
    626 128 2019
    625 104 2019
    624 122 2019
    623 134 2019
    622 134 2019
    621 141 2019
    620 122 2019
    619 127 2019
    618 133 2019
    617 153 2019
    616 154 2019
    615 133 2019
    614 156 2019
    613 146 2019
    612 171 2019
    611 143 2019
    610 159 2019
    609 172 2019
    608 182 2019
    607 168 2019
    606 179 2019
    605 179 2019
    604 199 2019
    603 176 2019
    602 196 2019
    601 169 2019
    600 201 2019
    599 196 2019
    598 214 2019
    597 198 2019
    596 196 2019
    595 199 2019
    594 197 2019
    593 219 2019
    592 197 2019
    591 214 2019
    590 211 2019
    589 208 2019
    588 218 2019
    587 236 2019
    586 242 2019
    585 239 2019
    584 254 2019
    583 246 2019
    582 241 2019
    581 264 2019
    580 264 2019
    579 229 2019
    578 254 2019
    577 261 2019
    576 259 2019
    575 266 2019
    574 264 2019
    573 286 2019
    572 254 2019
    571 296 2019
    570 289 2019
    569 281 2019
    568 263 2019
    567 291 2019
    566 313 2019
    565 287 2019
    564 310 2019
    563 342 2019
    562 323 2019
    561 327 2019
    560 307 2019
    559 314 2019
    558 305 2019
    557 327 2019
    556 341 2019
    555 349 2019
    554 377 2019
    553 318 2019
    552 333 2019
    551 329 2019
    550 343 2019
    549 312 2019
    548 357 2019
    547 378 2019
    546 351 2019
    545 346 2019
    544 375 2019
    543 393 2019
    542 395 2019
    541 375 2019
    540 385 2019
    539 391 2019
    538 365 2019
    537 360 2019
    536 401 2019
    535 376 2019
    534 419 2019
    533 399 2019
    532 381 2019
    531 396 2019
    530 395 2019
    529 391 2019
    528 380 2019
    527 411 2019
    526 367 2019
    525 421 2019
    524 419 2019
    523 423 2019
    522 417 2019
    521 403 2019
    520 458 2019
    519 461 2019
    518 443 2019
    517 444 2019
    516 439 2019
    515 443 2019
    514 436 2019
    513 376 2019
    512 435 2019
    511 422 2019
    510 441 2019
    509 463 2019
    end

  • #2
    Thanks for your very clear question.

    What you want is often called the percentile rank or (in a graphical context) the plotting position. Here is one recipe. See also https://www.stata.com/support/faqs/s...ting-positions

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(score frequency year)
    710  13 2019
    709   3 2019
    708   1 2019
    707   1 2019
    706   3 2019
    704   2 2019
    703   2 2019
    702   3 2019
    701   4 2019
    700   4 2019
    699   6 2019
    698   4 2019
    697   9 2019
    696   3 2019
    695   5 2019
    694  13 2019
    693   3 2019
    692   8 2019
    691   5 2019
    690  12 2019
    689   6 2019
    688  16 2019
    687   9 2019
    686   9 2019
    685  19 2019
    684  19 2019
    683  22 2019
    682  18 2019
    681  17 2019
    680  16 2019
    679   9 2019
    678  21 2019
    677  18 2019
    676  22 2019
    675  21 2019
    674  25 2019
    673  29 2019
    672  25 2019
    671  32 2019
    670  33 2019
    669  39 2019
    668  29 2019
    667  37 2019
    666  33 2019
    665  28 2019
    664  36 2019
    663  35 2019
    662  37 2019
    661  46 2019
    660  63 2019
    659  46 2019
    658  43 2019
    657  51 2019
    656  60 2019
    655  64 2019
    654  71 2019
    653  67 2019
    652  56 2019
    651  62 2019
    650  69 2019
    649  68 2019
    648  73 2019
    647  91 2019
    646  75 2019
    645  83 2019
    644  71 2019
    643  67 2019
    642  83 2019
    641  93 2019
    640  67 2019
    639  85 2019
    638  85 2019
    637 107 2019
    636 111 2019
    635  87 2019
    634  89 2019
    633 106 2019
    632  95 2019
    631 117 2019
    630 108 2019
    629 102 2019
    628 121 2019
    627 138 2019
    626 128 2019
    625 104 2019
    624 122 2019
    623 134 2019
    622 134 2019
    621 141 2019
    620 122 2019
    619 127 2019
    618 133 2019
    617 153 2019
    616 154 2019
    615 133 2019
    614 156 2019
    613 146 2019
    612 171 2019
    611 143 2019
    610 159 2019
    609 172 2019
    608 182 2019
    607 168 2019
    606 179 2019
    605 179 2019
    604 199 2019
    603 176 2019
    602 196 2019
    601 169 2019
    600 201 2019
    599 196 2019
    598 214 2019
    597 198 2019
    596 196 2019
    595 199 2019
    594 197 2019
    593 219 2019
    592 197 2019
    591 214 2019
    590 211 2019
    589 208 2019
    588 218 2019
    587 236 2019
    586 242 2019
    585 239 2019
    584 254 2019
    583 246 2019
    582 241 2019
    581 264 2019
    580 264 2019
    579 229 2019
    578 254 2019
    577 261 2019
    576 259 2019
    575 266 2019
    574 264 2019
    573 286 2019
    572 254 2019
    571 296 2019
    570 289 2019
    569 281 2019
    568 263 2019
    567 291 2019
    566 313 2019
    565 287 2019
    564 310 2019
    563 342 2019
    562 323 2019
    561 327 2019
    560 307 2019
    559 314 2019
    558 305 2019
    557 327 2019
    556 341 2019
    555 349 2019
    554 377 2019
    553 318 2019
    552 333 2019
    551 329 2019
    550 343 2019
    549 312 2019
    548 357 2019
    547 378 2019
    546 351 2019
    545 346 2019
    544 375 2019
    543 393 2019
    542 395 2019
    541 375 2019
    540 385 2019
    539 391 2019
    538 365 2019
    537 360 2019
    536 401 2019
    535 376 2019
    534 419 2019
    533 399 2019
    532 381 2019
    531 396 2019
    530 395 2019
    529 391 2019
    528 380 2019
    527 411 2019
    526 367 2019
    525 421 2019
    524 419 2019
    523 423 2019
    522 417 2019
    521 403 2019
    520 458 2019
    519 461 2019
    518 443 2019
    517 444 2019
    516 439 2019
    515 443 2019
    514 436 2019
    513 376 2019
    512 435 2019
    511 422 2019
    510 441 2019
    509 463 2019
    end
    
    bysort year (score): egen total = total(frequency)
    by year : gen pc_rank = 100 * (sum(frequency) - 0.5 * frequency) / total 
    
    list if inrange(_n, 1, 10) | inrange(_n, _N - 9, _N)
    
         +--------------------------------------------+
         | score   freque~y   year   total    pc_rank |
         |--------------------------------------------|
      1. |   509        463   2019   37699   .6140746 |
      2. |   510        441   2019   37699   1.813045 |
      3. |   511        422   2019   37699   2.957638 |
      4. |   512        435   2019   37699   4.094273 |
      5. |   513        376   2019   37699   5.169899 |
         |--------------------------------------------|
      6. |   514        436   2019   37699    6.24685 |
      7. |   515        443   2019   37699   7.412663 |
      8. |   516        439   2019   37699   8.582456 |
      9. |   517        444   2019   37699   9.753574 |
     10. |   518        443   2019   37699      10.93 |
         |--------------------------------------------|
    192. |   700          4   2019   37699   99.90981 |
    193. |   701          4   2019   37699   99.92043 |
    194. |   702          3   2019   37699    99.9297 |
    195. |   703          2   2019   37699   99.93634 |
    196. |   704          2   2019   37699   99.94164 |
         |--------------------------------------------|
    197. |   706          3   2019   37699   99.94827 |
    198. |   707          1   2019   37699   99.95358 |
    199. |   708          1   2019   37699   99.95623 |
    200. |   709          3   2019   37699   99.96154 |
    201. |   710         13   2019   37699   99.98276 |
         +--------------------------------------------+

    Comment


    • #3
      Originally posted by Nick Cox View Post
      Thanks for your very clear question.

      What you want is often called the percentile rank or (in a graphical context) the plotting position. Here is one recipe. See also https://www.stata.com/support/faqs/s...ting-positions

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(score frequency year)
      710 13 2019
      709 3 2019
      708 1 2019
      707 1 2019
      706 3 2019
      704 2 2019
      703 2 2019
      702 3 2019
      701 4 2019
      700 4 2019
      699 6 2019
      698 4 2019
      697 9 2019
      696 3 2019
      695 5 2019
      694 13 2019
      693 3 2019
      692 8 2019
      691 5 2019
      690 12 2019
      689 6 2019
      688 16 2019
      687 9 2019
      686 9 2019
      685 19 2019
      684 19 2019
      683 22 2019
      682 18 2019
      681 17 2019
      680 16 2019
      679 9 2019
      678 21 2019
      677 18 2019
      676 22 2019
      675 21 2019
      674 25 2019
      673 29 2019
      672 25 2019
      671 32 2019
      670 33 2019
      669 39 2019
      668 29 2019
      667 37 2019
      666 33 2019
      665 28 2019
      664 36 2019
      663 35 2019
      662 37 2019
      661 46 2019
      660 63 2019
      659 46 2019
      658 43 2019
      657 51 2019
      656 60 2019
      655 64 2019
      654 71 2019
      653 67 2019
      652 56 2019
      651 62 2019
      650 69 2019
      649 68 2019
      648 73 2019
      647 91 2019
      646 75 2019
      645 83 2019
      644 71 2019
      643 67 2019
      642 83 2019
      641 93 2019
      640 67 2019
      639 85 2019
      638 85 2019
      637 107 2019
      636 111 2019
      635 87 2019
      634 89 2019
      633 106 2019
      632 95 2019
      631 117 2019
      630 108 2019
      629 102 2019
      628 121 2019
      627 138 2019
      626 128 2019
      625 104 2019
      624 122 2019
      623 134 2019
      622 134 2019
      621 141 2019
      620 122 2019
      619 127 2019
      618 133 2019
      617 153 2019
      616 154 2019
      615 133 2019
      614 156 2019
      613 146 2019
      612 171 2019
      611 143 2019
      610 159 2019
      609 172 2019
      608 182 2019
      607 168 2019
      606 179 2019
      605 179 2019
      604 199 2019
      603 176 2019
      602 196 2019
      601 169 2019
      600 201 2019
      599 196 2019
      598 214 2019
      597 198 2019
      596 196 2019
      595 199 2019
      594 197 2019
      593 219 2019
      592 197 2019
      591 214 2019
      590 211 2019
      589 208 2019
      588 218 2019
      587 236 2019
      586 242 2019
      585 239 2019
      584 254 2019
      583 246 2019
      582 241 2019
      581 264 2019
      580 264 2019
      579 229 2019
      578 254 2019
      577 261 2019
      576 259 2019
      575 266 2019
      574 264 2019
      573 286 2019
      572 254 2019
      571 296 2019
      570 289 2019
      569 281 2019
      568 263 2019
      567 291 2019
      566 313 2019
      565 287 2019
      564 310 2019
      563 342 2019
      562 323 2019
      561 327 2019
      560 307 2019
      559 314 2019
      558 305 2019
      557 327 2019
      556 341 2019
      555 349 2019
      554 377 2019
      553 318 2019
      552 333 2019
      551 329 2019
      550 343 2019
      549 312 2019
      548 357 2019
      547 378 2019
      546 351 2019
      545 346 2019
      544 375 2019
      543 393 2019
      542 395 2019
      541 375 2019
      540 385 2019
      539 391 2019
      538 365 2019
      537 360 2019
      536 401 2019
      535 376 2019
      534 419 2019
      533 399 2019
      532 381 2019
      531 396 2019
      530 395 2019
      529 391 2019
      528 380 2019
      527 411 2019
      526 367 2019
      525 421 2019
      524 419 2019
      523 423 2019
      522 417 2019
      521 403 2019
      520 458 2019
      519 461 2019
      518 443 2019
      517 444 2019
      516 439 2019
      515 443 2019
      514 436 2019
      513 376 2019
      512 435 2019
      511 422 2019
      510 441 2019
      509 463 2019
      end
      
      bysort year (score): egen total = total(frequency)
      by year : gen pc_rank = 100 * (sum(frequency) - 0.5 * frequency) / total
      
      list if inrange(_n, 1, 10) | inrange(_n, _N - 9, _N)
      
      +--------------------------------------------+
      | score freque~y year total pc_rank |
      |--------------------------------------------|
      1. | 509 463 2019 37699 .6140746 |
      2. | 510 441 2019 37699 1.813045 |
      3. | 511 422 2019 37699 2.957638 |
      4. | 512 435 2019 37699 4.094273 |
      5. | 513 376 2019 37699 5.169899 |
      |--------------------------------------------|
      6. | 514 436 2019 37699 6.24685 |
      7. | 515 443 2019 37699 7.412663 |
      8. | 516 439 2019 37699 8.582456 |
      9. | 517 444 2019 37699 9.753574 |
      10. | 518 443 2019 37699 10.93 |
      |--------------------------------------------|
      192. | 700 4 2019 37699 99.90981 |
      193. | 701 4 2019 37699 99.92043 |
      194. | 702 3 2019 37699 99.9297 |
      195. | 703 2 2019 37699 99.93634 |
      196. | 704 2 2019 37699 99.94164 |
      |--------------------------------------------|
      197. | 706 3 2019 37699 99.94827 |
      198. | 707 1 2019 37699 99.95358 |
      199. | 708 1 2019 37699 99.95623 |
      200. | 709 3 2019 37699 99.96154 |
      201. | 710 13 2019 37699 99.98276 |
      +--------------------------------------------+
      Hi Nick,

      Thank you so much for the explanation and sample code. I much appreciate it !!

      Comment

      Working...
      X