  • egen=rank() not giving ranks ranging between zero and 1 for a log variable

    Hi,

    I am running a rank-rank regression using percentile ranks. I am trying to create percentile ranks of log wage. However, when I run the code, I do not get values between zero and one. I tried to follow this guide, but it has not worked for me: https://www.stata.com/support/faqs/s...ing-positions/

    The code that I have tried and the output that I am getting:

    Code:
    egen r1_log_hourly_wage = rank(log_hourly_wage)
    egen r1_log_hourly_wage = rank(-log_hourly_wage)
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(log_hourly_wage r1_log_hourly_wage)
            .       .
            .       .
            .       .
     3.653033   16066
            .       .
            .       .
            .       .
            .       .
     3.077669   12159
            .       .
            .       .
            .       .
            .       .
            .       .
            .       .
            .       .
     3.077669   12159
            .       .
            .       .
     3.077669   12159
     3.077669   12159
     3.077669   12159
            .       .
            .       .
            .       .
            .       .
            .       .
            .       .
     3.588495 15754.5
            .       .
            .       .
            .       .
            .       .
            .       .
     2.959886   11042
            .       .
     3.077669   12159
     3.077669   12159
            .       .
            .       .
      3.18303 12782.5
            .       .
            .       .
            .       .
            .       .
            .       .
     2.554421    6631
     2.554421    6631
            .       .
            .       .
     2.826355  9375.5
            .       .
            .       .
            .       .
            .       .
            .       .
     2.554421    6631
            .       .
            .       .
     2.554421    6631
     2.554421    6631
            .       .
            .       .
    3.2112005   12932
            .       .
            .       .
            .       .
     3.758394   16423
     1.504599     312
            .       .
            .       .
     2.842103    9500
            .       .
            .       .
            .       .
    2.2549045    2772
    end

  • #2
    I can't follow what is puzzling you here. The code you cite would create

    1. ranks running from 1 for the smallest value to n for the largest, for n observations, with ties given their average rank.

    2. ranks running from 1 for the largest value to n for the smallest, for n observations, with ties given their average rank.

    However, the code you give would fail as the second egen statement tries to create a variable with the same name as that created by the first.
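
    As a minimal sketch of the point about names, the two rankings can coexist if each gets its own variable. The names rank_asc and rank_desc are illustrative only, and the four wage values are copied from the posted example purely as toy data.

```stata
* Toy data: four values taken from the posted -dataex- example
clear
input float log_hourly_wage
3.653033
3.077669
3.077669
1.504599
end

* Distinct (hypothetical) variable names, so the second -egen- call does not fail
egen rank_asc  = rank(log_hourly_wage)     // 1 = smallest, ties averaged
egen rank_desc = rank(-log_hourly_wage)    // 1 = largest, ties averaged

list log_hourly_wage rank_asc rank_desc

* With n = 4 nonmissing values, the two ranks of any observation sum to n + 1
assert rank_asc + rank_desc == 5
```

    Either variable is still a rank on the scale 1 to n, not a percentile rank.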

    Neither of those is a percentile rank, whether expressed as a percentage or as a fraction. As the linked FAQ explains, calculating the ranks is just the first step toward either. You need to divide by the number of values and optionally multiply by 100. There are conventions to ensure that no result is ever 0 or 1 (or 0 or 100), which in many contexts would cause problems, not least that log 0 is not defined.
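
    The steps just described might be sketched as follows. This assumes one common convention, (rank - 0.5) / n, which keeps every result strictly between 0 and 1; other conventions exist. The variable names n_wage, rank_wage, pcrank_wage and pct_wage are illustrative, and the data are a toy subset of the posted values.

```stata
* Toy data: a few values from the posted example, including a missing value
clear
input float log_hourly_wage
3.653033
.
3.077669
3.077669
2.554421
1.504599
end

* Step 1: number of nonmissing values and ascending ranks (ties averaged)
egen n_wage    = count(log_hourly_wage)
egen rank_wage = rank(log_hourly_wage)

* Step 2: divide by n, shifting by 0.5 so that no result is exactly 0 or 1
* (assumed convention; hypothetical variable names)
generate pcrank_wage = (rank_wage - 0.5) / n_wage

* Optional: express as a percentage
generate pct_wage = 100 * pcrank_wage

list log_hourly_wage rank_wage pcrank_wage pct_wage

* Observations with missing log wage get missing ranks and percentile ranks
assert pcrank_wage > 0 & pcrank_wage < 1 if !missing(log_hourly_wage)
assert missing(pcrank_wage) if missing(log_hourly_wage)
```

    In a rank-rank regression the same steps would be repeated for each variable being ranked, and within groups (via by:) if ranks are meant to be relative to a cohort or year.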
