
  • Finding variable percentile position / rank in row

    Hi,

    I wish to find the percentile position of a variable in a row of other variables.

    For example, I have 30 unordered variables, var1, var2, ..., var30, that make up my distribution. I also have another variable that is not part of the distribution, say var_star, and I want its percentile rank or position within var1-var30.

    Any help would be greatly appreciated. I am sure it is straightforward; I simply cannot think of the right question to ask.

    Henry

  • #2
    It would have been easier for us if you had included example data. As it is, we have to interpret what you said about what your data look like, and then implement our interpretation by making an example dataset; both steps are more work than necessary. You could have saved us that by including an example dataset created with the dataex command, as recommended in the Statalist FAQ. Moreover, if we interpret your text differently from how you intended it, then our answer will not fit your problem.

    If I interpret your data correctly, then the answer is that everything is easier in long format:

    Code:
    // create some example data
    clear
    set obs 10
    gen id = _n
    forvalues i = 1/30 {
        gen x`i' = rnormal()
    }
    
    // the solution
    reshape long x, i(id) j(t)
    egen xstar = rank(x), by(id)
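The code above stops at within-id ranks. Turning them into percentile ranks in the long layout takes one more count and one generate, using the same (rank - 0.5) / n rule discussed in the next reply. A minimal, self-contained sketch follows. It recreates the example data with a seed added for reproducibility and stores the ranks as xrank rather than xstar. The names n_nonmiss and pcrank are illustrative, and treating n as the number of nonmissing x values in each id is an assumption about how missing values should be handled:

```stata
// recreate the example data (seed added so results are reproducible)
clear
set obs 10
set seed 2803
gen id = _n
forvalues i = 1/30 {
    gen x`i' = rnormal()
}

// long layout: one observation per id and t
reshape long x, i(id) j(t)

// within-id ranks; egen rank() gives tied values the mean rank
egen xrank = rank(x), by(id)

// sample size per id, counting only nonmissing values of x
egen n_nonmiss = count(x), by(id)

// percentile rank on a 0-100 scale
gen pcrank = 100 * (xrank - 0.5) / n_nonmiss

// optional: back to one row per id (n_nonmiss is constant within id)
reshape wide x xrank pcrank, i(id) j(t)
```

The final reshape wide is optional; skip it if the long layout suits the rest of the analysis better.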
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Ranking across rows was treated in the Stata Journal in 2009. A sequel might as well be mentioned here.

      SJ-20-2 pr0046_1. N. J. Cox. 2020. Speaking Stata: More ways for rowwise. Stata Journal 20(2): 481-488 (no commands).
      Focuses on returning which variable or variables are equal to the maximum or minimum in a row.

      SJ-9-1 pr0046. N. J. Cox. 2009. Speaking Stata: Rowwise. Stata Journal 9(1): 137-157 (help rowsort, rowranks if installed).
      Shows how to exploit functions, egen functions, and Mata for working rowwise; rowsort and rowranks are introduced.

      Then it's a matter of applying your recipe for percentile rank, which is an FAQ:

      FAQ. N. J. Cox. Calculating percentile ranks or plotting positions (2/14).
      How can I calculate percentile ranks? How can I calculate plotting positions?
      http://www.stata.com/support/faqs/st...centile-ranks-and-plotting-positions/

      Absent a data example, 5 variables will serve as well as 30 to show the technique:

      Code:
      clear 
      set obs 10 
      set seed 2803 
      forval j = 1/5 {
          gen var`j' = runiformint(0, 100)
      }
      
      list 
      
      // rowranks is community-contributed (Stata Journal pr0046) and must be installed first
      rowranks var? , gen(rank1-rank5)
      
      forval j = 1/5 { 
          gen pcrank`j' = 100 * (rank`j' - 0.5) / 5 
      }
      
      list var* pcrank*
      
           +------------------------------------------------------------------------------------+
           | var1   var2   var3   var4   var5   pcrank1   pcrank2   pcrank3   pcrank4   pcrank5 |
           |------------------------------------------------------------------------------------|
        1. |   42     11     58     14     29        70        10        90        30        50 |
        2. |   99      4     48     32     29        90        10        70        50        30 |
        3. |   13     12     43     84     32        30        10        70        90        50 |
        4. |   43     32     48      6    100        50        30        70        10        90 |
        5. |    2      9     63     23     34        10        30        90        50        70 |
           |------------------------------------------------------------------------------------|
        6. |   22     13     20     23     63        50        10        30        70        90 |
        7. |   80     74     34     93     78        70        30        10        90        50 |
        8. |   50     30     99     24     59        50        30        90        10        70 |
        9. |   49     85     13     47      1        70        90        30        50        10 |
       10. |   80     22     59     31     89        70        10        50        30        90 |
           +------------------------------------------------------------------------------------+
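The loop above applies the rule below, with n = 5 values per row and r the rank of a value within its row. Substituting the smallest, middle, and largest ranks shows the symmetric treatment of the tails that the first discussion point refers to; with n = 5 it gives 10, 30, 50, 70, 90, matching the listing.

```latex
\begin{aligned}
p(r) &= 100 \cdot \frac{r - 0.5}{n}, \qquad r = 1, \ldots, n \\
p(1) &= 100 \cdot \frac{0.5}{n}, \qquad p(n) = 100 - 100 \cdot \frac{0.5}{n} \\
p\!\left(\tfrac{n+1}{2}\right) &= 100 \cdot \frac{(n+1)/2 - 0.5}{n} = 100 \cdot \frac{n/2}{n} = 50 \qquad (n \text{ odd})
\end{aligned}
```

By contrast, rank / sample size would assign 100 / n to the smallest value and 100 to the largest, so the two tails are not treated alike.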
      What's to discuss?

      1. Although there are other defensible rules, using (rank - 0.5) / sample size treats the two tails symmetrically, so that, for example, the median of an odd number of values is assigned cumulative probability 0.5. To the best of my knowledge, it is also the oldest rule that is statistically informed. If it's important to you to use another rule, say the perverse rank / sample size, then the code is easy to change.

      2. If there are ties, rowranks uses the usual convention of preserving the sum of the ranks, so tied values each get the mean of the ranks they jointly occupy.

      3. If you want to rank in the opposite direction, rowranks has an option for that; alternatively, with the rule used here, take the complement in 100 (that is, 100 minus the percentile rank).

      4. It could be that you'd be better off with a long layout, but that depends on what else you want to do.
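The question as posed asks for the position of var_star, which is not itself one of the values being ranked. A minimal sketch of one way to handle that in the wide layout, assuming the rule above is wanted: count the values in the row that lie below var_star and those equal to it, then take 100 × (below + 0.5 × equal) / n. For a value that is in the row, below + 0.5 × equal equals rank - 0.5 when ties get the mean rank, so this reproduces the rule used above. The data and the names below, equal, nvals, and pcrank_star are hypothetical, and n counts only nonmissing values:

```stata
// example data: 5 values per row plus a hypothetical var_star to locate
clear
set obs 10
set seed 2803
forval j = 1/5 {
    gen var`j' = runiformint(0, 100)
}
gen var_star = runiformint(0, 100)

// per row: values below var_star, values equal to it, nonmissing values
gen below = 0
gen equal = 0
gen nvals = 0
forval j = 1/5 {
    replace below = below + (var`j' < var_star) if !missing(var`j')
    replace equal = equal + (var`j' == var_star) if !missing(var`j')
    replace nvals = nvals + 1 if !missing(var`j')
}

// 0-100 scale; matches the percentile rank of any row value var_star equals
gen pcrank_star = 100 * (below + 0.5 * equal) / nvals if !missing(var_star)

list var? var_star pcrank_star
```

Because nvals does not include var_star itself, var_star is located relative to var1-var5 only; for the original problem, loop over 1/30 instead.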






      • #4
        Thanks both for your inputs; I'll be sure to include sample data next time. These answer my query, and I hope they provide code for others too.

