  • Generate mean of highest 3 values out of 5 values.

    Hi Statalist users. I have data on a few students. Each student took 5 tests. I have to calculate the mean of the 3 highest scores out of the 5. Can someone kindly help with the code?
    Here is the data.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id t1 t2 t3 t4 t5)
    1 66  67  95 61 56
    2 92  84  98 70 97
    3 70  80  96 88 74
    4 66 100  75 70 66
    5 82  61  74 86 72
    6 59  67 100 68 56
    7 70  55  93 67 58
    8 85  54  98 82 98
    9 72  74  74 63 79
    end

  • #2
    I tried this code. It works, but I believe there is a more efficient way to code it.
    Code:
    sort id t
    bysort id: gen id2 = _n
    bysort id: egen mean = mean(t) if inrange(id2, 3, 5)
    egen tag = tag(id) if mean < .


    • #3
      The code in #2 does not apply to the data layout in #1. Here are two methods of doing it. Your #2 is clearly using a similar idea to the second method.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(id t1 t2 t3 t4 t5)
      1 66  67  95 61 56
      2 92  84  98 70 97
      3 70  80  96 88 74
      4 66 100  75 70 66
      5 82  61  74 86 72
      6 59  67 100 68 56
      7 70  55  93 67 58
      8 85  54  98 82 98
      9 72  74  74 63 79
      end
      
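      * Method 1: sort the scores within each row, then average the top three
      rowsort t1-t5, gen(s1-s5)
      gen wanted = (s3 + s4 + s5) / 3

      * Method 2: reshape long, sort scores within id, average the last three
      reshape long t, i(id) j(which)

      bysort id (t) : gen wanted2 = (t[3] + t[4] + t[5]) / 3

      * Return to the original wide layout
      reshape wide t, i(id) j(which)

      list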
      
           +-------------------------------------------------------------------------------+
           | id   t1    t2    t3   t4   t5   s1   s2   s3   s4    s5     wanted    wanted2 |
           |-------------------------------------------------------------------------------|
        1. |  1   66    67    95   61   56   56   61   66   67    95         76         76 |
        2. |  2   92    84    98   70   97   70   84   92   97    98   95.66666   95.66666 |
        3. |  3   70    80    96   88   74   70   74   80   88    96         88         88 |
        4. |  4   66   100    75   70   66   66   66   70   75   100   81.66666   81.66666 |
        5. |  5   82    61    74   86   72   61   72   74   82    86   80.66666   80.66666 |
           |-------------------------------------------------------------------------------|
        6. |  6   59    67   100   68   56   56   59   67   68   100   78.33334   78.33334 |
        7. |  7   70    55    93   67   58   55   58   67   70    93   76.66666   76.66666 |
        8. |  8   85    54    98   82   98   54   82   85   98    98   93.66666   93.66666 |
        9. |  9   72    74    74   63   79   63   72   74   74    79   75.66666   75.66666 |
           +-------------------------------------------------------------------------------+
      rowsort is a community-contributed command from the Stata Journal, so it must be installed before use.

      How is efficiency measured? Code length? Machine time? Storage implications? Programmer time?

      Neither method is smart about missing values.
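
      One way the second method might be adapted to cope with missing values is sketched below. This is an illustration under stated assumptions, not a definitive solution: the missing score for id 1 is introduced purely for demonstration, the names negt, top3, wanted3 and n_used are hypothetical, and it is assumed that when fewer than three scores are non-missing the mean of whatever is available is wanted.

      ```stata
      * Sketch only: same example data as #1, with one score set to missing
      * purely to illustrate the handling (hypothetical change, not real data).
      clear
      input byte(id t1 t2 t3 t4 t5)
      1 66  67  95 61 56
      2 92  84  98 70 97
      3 70  80  96 88 74
      4 66 100  75 70 66
      5 82  61  74 86 72
      6 59  67 100 68 56
      7 70  55  93 67 58
      8 85  54  98 82 98
      9 72  74  74 63 79
      end
      replace t3 = . if id == 1

      reshape long t, i(id) j(which)

      * Negate the scores so that an ascending sort puts the highest first;
      * missing stays missing after negation and so sorts last within id.
      gen negt = -t
      bysort id (negt) : gen byte top3 = (_n <= 3) & !missing(t)

      * Mean of the flagged scores only; n_used records how many were used,
      * which is below 3 when an id has fewer than three non-missing scores.
      bysort id : egen wanted3 = mean(cond(top3, t, .))
      bysort id : egen n_used = total(top3)

      drop negt top3
      reshape wide t, i(id) j(which)

      list id t1-t5 wanted3 n_used
      ```

      Checking n_used afterwards shows which means rest on fewer than three scores, so those can be set to missing instead if that is the preferred rule.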


      • #4
        In #2, I reshaped the data to long form before running the code.
        Code:
        reshape long t, i(id) j(exam)
