Generate mean of highest 3 values out of 5 values.

Inaamul Haq

Join Date: Feb 2019

Posts: 55
#1

25 Jan 2024, 00:14

Hi Statalist users. I have data on a few students. Each student took 5 tests. I need to calculate the mean of the 3 highest scores out of the 5. Could someone kindly help with the code?
Here are the data.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id t1 t2 t3 t4 t5)
1 66 67 95 61 56
2 92 84 98 70 97
3 70 80 96 88 74
4 66 100 75 70 66
5 82 61 74 86 72
6 59 67 100 68 56
7 70 55 93 67 58
8 85 54 98 82 98
9 72 74 74 63 79
end
Inaamul Haq

Join Date: Feb 2019

Posts: 55
#2

25 Jan 2024, 00:39

I tried this code. It works, but I believe there is a more efficient way to code it.

Code:

sort id t
bysort id: gen id2 = _n
bysort id: egen mean = mean(t) if inrange(id2, 3, 5)
egen tag = tag(id) if mean < .

Nick Cox

Join Date: Mar 2014

Posts: 35219
#3

25 Jan 2024, 01:44

The code in #2 does not apply to the data layout in #1. Here are two methods of doing it. Your #2 is clearly using a similar idea to the second method.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id t1 t2 t3 t4 t5)
1 66  67  95 61 56
2 92  84  98 70 97
3 70  80  96 88 74
4 66 100  75 70 66
5 82  61  74 86 72
6 59  67 100 68 56
7 70  55  93 67 58
8 85  54  98 82 98
9 72  74  74 63 79
end

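* Method 1: sort values within each row, then average the three largest
* (rowsort is from the Stata Journal and must be installed first)
rowsort t1-t5, gen(s1-s5)
gen wanted = (s3 + s4 + s5) / 3

* Method 2: reshape to long, sort scores within id, then average the three largest
reshape long t, i(id) j(which)

bysort id (t) : gen wanted2 = (t[3] + t[4] + t[5]) / 3

* return to the original wide layout
reshape wide t, i(id) j(which)

list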

     +-------------------------------------------------------------------------------+
     | id   t1    t2    t3   t4   t5   s1   s2   s3   s4    s5     wanted    wanted2 |
     |-------------------------------------------------------------------------------|
  1. |  1   66    67    95   61   56   56   61   66   67    95         76         76 |
  2. |  2   92    84    98   70   97   70   84   92   97    98   95.66666   95.66666 |
  3. |  3   70    80    96   88   74   70   74   80   88    96         88         88 |
  4. |  4   66   100    75   70   66   66   66   70   75   100   81.66666   81.66666 |
  5. |  5   82    61    74   86   72   61   72   74   82    86   80.66666   80.66666 |
     |-------------------------------------------------------------------------------|
  6. |  6   59    67   100   68   56   56   59   67   68   100   78.33334   78.33334 |
  7. |  7   70    55    93   67   58   55   58   67   70    93   76.66666   76.66666 |
  8. |  8   85    54    98   82   98   54   82   85   98    98   93.66666   93.66666 |
  9. |  9   72    74    74   63   79   63   72   74   74    79   75.66666   75.66666 |
     +-------------------------------------------------------------------------------+

rowsort is from the Stata Journal.

How is efficiency measured? Code length? Machine time? Storage implications? Programmer time?

Neither method is smart about missing values.
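
One possible way to cope with missing values is sketched below. It is an illustration and has not been run against these data. It assumes the wide layout of #1, and the variable names negt, rank and wanted3 are made up for this example. The idea is to sort on the negated score: the largest scores then come first and missing values sort last, so the first three observations within each id are the three highest nonmissing scores whenever at least three exist.

```stata
* Sketch only: negt, rank and wanted3 are hypothetical names
reshape long t, i(id) j(which)

* negating puts the largest scores first; missing stays missing and sorts last
gen negt = -t
bysort id (negt) : gen rank = _n

* egen's mean() ignores missing values, so a student with fewer than
* three nonmissing scores gets the mean of whatever is available
bysort id : egen wanted3 = mean(cond(rank <= 3, t, .))

* drop the helpers, which vary within id, before going back to wide
drop negt rank
reshape wide t, i(id) j(which)
```

Whether a student with fewer than three nonmissing scores should get a mean at all is a substantive decision; if not, wanted3 could be set to missing for those students.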


Inaamul Haq

Join Date: Feb 2019

Posts: 55
#4

25 Jan 2024, 03:44

In #2, I reshaped the data to long form before running the code.

Code:

reshape long t, i(id) j(exam)