  • Percentile ranking

    Dear all,

    I have the following code because I need to calculate the time-series percentile rankings (a term I find confusing, as I could not find a clear definition) of my variable ("var") for each id.
    I am unsure whether it is correct.

    To explain in detail what I did:

    * I calculated the rank of my variable per year (so each id receives a ranking based on the total number of observations in that year, excluding missing values).
    * Then, since I have panel data from 1995 to 2000, I calculated the average of the rankings for each id across years (say id_3000 had rankings 1, 10, 15, 18, 19 and 20; the average would then be 13.83).
    * After that, each firm has an average ranking for the 6 years, and I calculate the median of these average rankings.
    * If the average ranking is above the computed median, I mark the id as "1"; if not, as "0".

    Code:
    sort id
    bysort year: egen rank = rank(var), track
    bysort year: egen n = count(var)
    gen pcrank = ((rank-1)/(n-1))
    egen mean = mean(pcrank), by(id)
    egen median_var = median(mean)
    gen var_pcrank = 1 if mean > median_var
    replace var_pcrank = 0 if mean <= median_var
    replace var_pcrank = . if mean == .
    Thank you very much if you can help me,
    Best regards,
    Eugene

  • #2
    I don't understand what the question is here.

    The idea of a percentile rank is that some percent of values are smaller than a given value (say, your test score or a child's height or weight), and so the complementary percent are larger. That seems clear enough to me, except that it leaves a lot of small print about exactly how it's done: is that smaller than, or smaller than or equal to? What about ties? The goal of https://www.stata.com/support/faqs/s...ing-positions/ was to cover the most common cases, and I am always open to comments about important detail that may have been omitted. It seems that you are drawing upon that FAQ, or unwittingly on some source that does.

    I will add some minor comments on your code, although so far as I can see it should do more or less what you want.

    Despite the name percentile, your ranks are scaled to lie in [0, 1], not [0, 100].

    Indeed, the definition (rank - 1) / (count - 1) is one I dislike, for reasons epitomised by the results of invnormal(0) or invnormal(1), both of which are returned as missing.
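
    To make that objection concrete, here is a minimal sketch on made-up toy data (three ids in a single year; none of this is the poster's dataset). It compares (rank - 1)/(n - 1) with (rank - 0.5)/n, which is only one of several common plotting-position choices and is picked here purely as an assumption for illustration. The variable names pc_old, pc_alt, z_old and z_alt are hypothetical.

```stata
* Toy data, for illustration only
clear
input id year var
1 1995 3
2 1995 7
3 1995 5
end

bysort year: egen rank = rank(var)
by year: egen n = count(var)

* Scaling used in the thread: reaches exactly 0 and 1 at the extremes
gen pc_old = (rank - 1)/(n - 1)

* Assumed alternative: stays strictly inside (0, 1)
gen pc_alt = (rank - 0.5)/n

* invnormal() returns missing at 0 and at 1, so the extremes are lost
gen z_old = invnormal(pc_old)
gen z_alt = invnormal(pc_alt)

list id var rank pc_old pc_alt z_old z_alt
```

    With the first scaling, the smallest and largest values in each year have no normal score; with the second, every nonmissing value does. Whether that matters depends on what is done with the ranks afterwards; for the above/below-median split in this thread, either scaling should order the ids the same way within a year.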

    I wouldn't use the track option myself. The default of rank() is more nearly standard and treats ties symmetrically.

    Your code can be slimmed down to

    Code:
    bysort year: egen rank = rank(var)
    by year: egen n = count(var)
    gen pcrank = (rank-1)/(n-1)
    egen mean = mean(pcrank), by(id)
    su mean, detail
    scalar median = r(p50)
    gen var_pcrank = mean > median if mean < .


  • #3
    Have a look at the "fractional rank" program -fracrank- by Philippe Van Kerm; it is part of his -sgini- module on SSC.
