select the first or the last group

Chul Lee

Join Date: Apr 2019
Posts: 45

16 Feb 2022, 17:36

Dear all,
How can I write more efficient code to select only the first group or only the last group?
I currently use the following three lines, but I suspect there is a more efficient way. I tried to use 'by' but couldn't figure it out. Thank you.
C

encode COHORT, gen(COHORT_1)
egen COHORT_2 = max(COHORT_1)
keep if COHORT_1 == COHORT_2

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double PERSON_ID str6 COHORT
517877 "201010"
510879 "201010"
512590 "201010"
515317 "201010"
512580 "201010"
531253 "201110"
531378 "201110"
527816 "201110"
531807 "201110"
524803 "201110"
538907 "201210"
539477 "201210"
540091 "201210"
539153 "201210"
543484 "201210"
550003 "201310"
551269 "201310"
549953 "201310"
549951 "201310"
549942 "201310"
560820 "201410"
562103 "201410"
563341 "201410"
562101 "201410"
563991 "201410"
574394 "201510"
569987 "201510"
569827 "201510"
572599 "201510"
568758 "201510"
585164 "201610"
578954 "201610"
585001 "201610"
577872 "201610"
587184 "201610"
594510 "201710"
592563 "201710"
594477 "201710"
594469 "201710"
593787 "201710"
603141 "201810"
611437 "201810"
614263 "201810"
606238 "201810"
605326 "201810"
621624 "201910"
628749 "201910"
629139 "201910"
622690 "201910"
621377 "201910"
631157 "202010"
639737 "202010"
631058 "202010"
639695 "202010"
641433 "202010"
652750 "202110"
647773 "202110"
645486 "202110"
652190 "202110"
647151 "202110"
660184 "202210"
662295 "202210"
654938 "202210"
665859 "202210"
655894 "202210"
end

Andrew Musau

Join Date: Oct 2014
Posts: 9945

16 Feb 2022, 18:47

These look like year-month dates stored as strings. Sorting them alphabetically will give the correct order, but it is still advisable to convert them to Stata internal form (SIF) values.

Code:

gen yearmonth= ym(real(substr(COHORT, 1, 4)), real(substr(COHORT, -2, 2)))
format yearmonth %tm

To your question:

Code:

sort yearmonth
keep if yearmonth== yearmonth[_N]

Res.:

Code:

. l

     +------------------------------+
     | PERSON~D   COHORT   yearmo~h |
     |------------------------------|
  1. |   660184   202210    2022m10 |
  2. |   662295   202210    2022m10 |
  3. |   654938   202210    2022m10 |
  4. |   665859   202210    2022m10 |
  5. |   655894   202210    2022m10 |
     +------------------------------+

Roman Mostazir

Join Date: Apr 2014

Posts: 868
#3

16 Feb 2022, 18:49

Originally posted by Chul Lee

Dear all,
How can I write code more efficiently to select the first group or the last group membership?

It is not clear what you are asking. Keeping either the first or the last group is not difficult (or am I missing something?). You just select the group you want to keep. Or did you mean that you want to keep both the first and the last? See the code below for all three options (keeping the first, keeping the last, keeping both the first and the last):

Code:

encode COHORT, gen(COHORT_1)
tab COHORT_1, nol //see the values of COHORT_1
* run only one of the following three -keep- commands
keep if COHORT_1 == 1 //Keep the first group only
keep if COHORT_1 == 13 //Keep the last group only
keep if COHORT_1 == 1 | COHORT_1 == 13 //Keep both the first group and the last group

By the way, your code in #1 keeps only the last group.

PS: cross posted with Andrew.

Last edited by Roman Mostazir; 16 Feb 2022, 18:51. Reason: Added PS

Roman
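
A note on the hard-coded values 1 and 13 above: they depend on how many cohorts the data contain at the time. The following is a minimal sketch, not posted in the thread, of one way to keep both the first and the last group without looking up the codes. It assumes a small subset of the -dataex- example and relies on -encode- numbering the codes in the sorted order of COHORT.

```stata
* Sketch only: keep both the first and the last group without hard-coding 1 and 13.
clear
input double PERSON_ID str6 COHORT
517877 "201010"
531253 "201110"
660184 "202210"
end

encode COHORT, gen(COHORT_1)      // codes 1, 2, ... follow the sorted order of COHORT
summarize COHORT_1, meanonly      // leaves the smallest code in r(min), the largest in r(max)
keep if inlist(COHORT_1, r(min), r(max))
```

Replacing the last line with `keep if COHORT_1 == r(min)` or `keep if COHORT_1 == r(max)` would keep only the first or only the last group.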
Chul Lee

Join Date: Apr 2019

Posts: 45
#4

16 Feb 2022, 18:58

Andrew,
Thank you as always. 'COHORT' is an academic term code used in the Banner system, so converting it to 'yearmonth' is not necessary, but your code works perfectly here. I appreciate it.

keep if COHORT == COHORT[_N]

Additional question: could you also show me how to select the first group?
C
Chul Lee

Join Date: Apr 2019

Posts: 45
#5

16 Feb 2022, 19:02

@Roman,
Yes, I can apply 'keep' after running 'tab'. However, I need to insert 'keep' into a long script without running 'tab' first. Also, the data set is updated each time I run the script.
Thank you.
C
Chul Lee

Join Date: Apr 2019

Posts: 45
#6

16 Feb 2022, 19:15

To select the first group:
I adapted Andrew's code above and got what I need.

sort COHORT
keep if COHORT == COHORT[1]

Thank you.
C
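
For a long script whose data change on every run, it may also help to avoid re-sorting the data. The following is a minimal sketch of an alternative not posted in the thread. It assumes COHORT always holds digit-only strings, as in the -dataex- example, so that real() never returns missing; `cohort_num` is a hypothetical helper name, and the input is a small subset of the example data.

```stata
* Sketch only: keep the first or the last COHORT group without sorting or -tab-.
clear
input double PERSON_ID str6 COHORT
517877 "201010"
531253 "201110"
660184 "202210"
662295 "202210"
end

gen long cohort_num = real(COHORT)   // numeric copy of the term code (assumes digits only)
summarize cohort_num, meanonly       // leaves r(min) and r(max) behind

* Keep the last group; use r(min) instead of r(max) to keep the first group
keep if cohort_num == r(max)
drop cohort_num
```

Because nothing is sorted, the remaining observations stay in their original order.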
