Hello, I am struggling with organising the following data set.
Within each h_pid group, I want to keep only the observations of the h_pid1 with the greatest nhpid value. For example, in the group h_pid == 1502, I only want to keep the observations with h_pid1 == 1503, because it has the highest nhpid value in that group (16, versus 12 for h_pid1 == 1504).
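One way to express this rule is a minimal sketch using -bysort- with -egen, max()-: compute the largest nhpid within each h_pid, then keep the observations that match it. It runs on a handful of rows copied from the -dataex- listing in this post (not the full data) and assumes nhpid is constant within each h_pid1, as it is in the posted rows. If two h_pid1 values tie for the maximum, both are kept.

```stata
* Illustrative subset of the rows posted with -dataex- (not the full data set)
* Assumes nhpid is constant within each h_pid1, as in the posted rows
clear
input long h_pid float(h_pid1 nhpid age)
1502 1503 16 34
1502 1503 16 40
1502 1504 12 45
1502 1504 12 37
3802 3803 7 42
3802 3804 7 46
4502 4503 16 33
end

* Largest nhpid within each h_pid group
bysort h_pid: egen max_nhpid = max(nhpid)

* Keep every observation whose nhpid equals that group maximum
keep if nhpid == max_nhpid
drop max_nhpid
```

The same result can be had without a helper variable: `bysort h_pid (nhpid): keep if nhpid == nhpid[_N]` sorts by nhpid within h_pid, so the last observation in each group holds the maximum.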
I tried starting with bysort, but I could not get the result I want.
I would appreciate any help with this. Thank you.
[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input long h_pid float(h_pid1 nhpid age)
1502 1503 16 34
1502 1503 16 40
1502 1503 16 33
1502 1503 16 35
1502 1503 16 41
1502 1503 16 32
1502 1503 16 38
1502 1503 16 43
1502 1503 16 37
1502 1503 16 45
1502 1503 16 36
1502 1503 16 42
1502 1503 16 30
1502 1503 16 39
1502 1503 16 31
1502 1503 16 44
1502 1504 12 45
1502 1504 12 37
1502 1504 12 38
1502 1504 12 35
1502 1504 12 44
1502 1504 12 36
1502 1504 12 34
1502 1504 12 40
1502 1504 12 42
1502 1504 12 41
1502 1504 12 43
1502 1504 12 39
3802 3803 7 42
3802 3803 7 34
3802 3803 7 39
3802 3803 7 43
3802 3803 7 44
3802 3803 7 37
3802 3803 7 36
3802 3804 7 46
3802 3804 7 35
3802 3804 7 45
3802 3804 7 41
3802 3804 7 40
3802 3804 7 38
3802 3804 7 33
4502 4503 16 33
4502 4503 16 41
4502 4503 16 38
4502 4503 16 42
4502 4503 16 46
4502 4503 16 43
4502 4503 16 39
4502 4503 16 34
4502 4503 16 36
4502 4503 16 47
4502 4503 16 45
4502 4503 16 44
4502 4503 16 37
4502 4503 16 32
4502 4503 16 35
4502 4503 16 40
end
[/CODE]
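The posted data contain a tie: in the group h_pid == 3802, both h_pid1 == 3803 and h_pid1 == 3804 have nhpid == 7, so a rule that keeps the maximum keeps both. If only one h_pid1 per group is wanted, one possible sketch sorts within h_pid by nhpid and then h_pid1, and keeps the h_pid1 of the last observation. That breaks ties by taking the larger h_pid1, which is an arbitrary assumption, not something stated in the question. The rows below are copied from the listing above (a subset, not the full data).

```stata
* A few rows copied from the -dataex- listing in this post (not the full data)
clear
input long h_pid float(h_pid1 nhpid age)
1502 1503 16 34
1502 1503 16 40
1502 1504 12 45
3802 3803 7 42
3802 3803 7 34
3802 3804 7 46
3802 3804 7 35
end

* Sort within h_pid by nhpid, then h_pid1, so the last observation in each
* group has the largest nhpid and, among ties, the largest h_pid1.
* Keep all observations of that single h_pid1 (tie-break rule is an assumption).
bysort h_pid (nhpid h_pid1): keep if h_pid1 == h_pid1[_N]
```

On this subset, h_pid == 1502 keeps the two h_pid1 == 1503 rows and h_pid == 3802 keeps only the two h_pid1 == 3804 rows.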