Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate 2nd Highest Value

    Hi,

    This was supposed to be easy, but its not, for me anyway. I need to generate a new variable that contains the 2nd highest value by raceid. Here its 87 and 116. Using bysort raceid : gen second_highest = (dyha[_N-1]) is not providing those returns. any assistance appreciated please.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str27 raceid int dyha
    "Ascot435860.618055555555556"  86
    "Ascot435860.618055555555556"  89
    "Ascot435860.618055555555556"  89
    "Ascot435860.618055555555556"  84
    "Ascot435860.618055555555556"  80
    "Ascot435860.618055555555556"  84
    "Ascot435860.618055555555556"  84
    "Ascot435860.618055555555556"  87
    "Ascot435860.618055555555556"  82
    "Ascot435860.642361111111111" 116
    "Ascot435860.642361111111111" 122
    "Ascot435860.642361111111111" 107
    "Ascot435860.642361111111111" 122
    "Ascot435860.642361111111111"  95
    "Ascot435860.642361111111111" 108
    "Ascot435860.642361111111111" 114
    end

  • #2

    Code:
    bysort raceid: egen rank = rank(dyha), track
    by raceid: egen maxrank = max(rank)
    by raceid: egen secondhighest = max(cond(rank==maxrank-1,dyha,.))
    drop rank maxrank
    Note: edited after noticing error in original code.
    Last edited by Ali Atia; 17 Mar 2022, 20:16.

    Comment


    • #3
      A one liner:

      Code:
      bys raceid (dyha): egen secondhighest = max(cond(dyha<(dyha[_N]),dyha,.))

      Comment


      • #4
        As mentioned elsewhere, there used to be a warning against egen calls using subscripts on the grounds that egen feels at liberty to change sort order temporarily. In practice #3 should work in an up-to-date version of Stata. The paranoid way to code it would be to find the maximum and then in turn the maximum less than that.

        Comment


        • #5
          As advised by Nick, using subscripts within -egen-renders #3 from working properly, even though the logic looks reasonable. Separating the code into 2 lines would be required.

          I also add that using -rank() in #2 is neither safe in the case of ties. For the example below, the code in #2 does not provide the desired output. To me, running sum appears to be a better choice.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str27 raceid int dyha
          "Ascot435860.618055555555556"  80
          "Ascot435860.618055555555556"  82
          "Ascot435860.618055555555556"  84
          "Ascot435860.618055555555556"  84
          "Ascot435860.618055555555556"  84
          "Ascot435860.618055555555556"  86
          "Ascot435860.618055555555556"  87
          "Ascot435860.618055555555556"  87
          "Ascot435860.618055555555556"  89
          "Ascot435860.618055555555556"  89
          "Ascot435860.642361111111111"  95
          "Ascot435860.642361111111111" 107
          "Ascot435860.642361111111111" 108
          "Ascot435860.642361111111111" 114
          "Ascot435860.642361111111111" 116
          "Ascot435860.642361111111111" 122
          "Ascot435860.642361111111111" 122
          end
          
          *Following the idea in #2
          
          gsort raceid -dyha
          by raceid: gen wanted = dyha if sum(dyha!=dyha[_n-1])==2
          bys raceid (wanted): replace wanted = wanted[1]
          
          *Following the idea in #3
          
          bys raceid (dyha): gen exMax = dyha if dyha<dyha[_N]
          by raceid: egen wanted2 = max(exMax)
          drop exMax
          Last edited by Romalpa Akzo; 18 Mar 2022, 03:02.

          Comment


          • #6
            Romalpa Akzo In turn your code would be broken by missing values, which sort highest.

            Comment


            • #7
              Nick is right. I do assume no missings in dyha. Below improvement (only for the code following the idea in #3) would be applicable for such cases.
              Code:
              egen maxDyha = max(dyha), by(raceid)
              egen wanted3 = max(cond(dyha<maxDyha, dyha,.)), by(raceid)
              drop maxDyha

              Comment


              • #8
                Thank you very much all. I will go with Romalpa's solution in #7. There will be no missing values in dyha.

                Comment

                Working...
                X