
  • select the first or the last group

    Dear all,
    How can I write more efficient code to select the members of the first group or the last group?
    I currently use the following three lines, but I know there should be a more efficient way. I tried to apply 'by' but couldn't figure it out. Thank you.
    C


    encode COHORT, gen(COHORT_1)
    egen COHORT_2 = max(COHORT_1)
    keep if COHORT_1 == COHORT_2


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double PERSON_ID str6 COHORT
    517877 "201010"
    510879 "201010"
    512590 "201010"
    515317 "201010"
    512580 "201010"
    531253 "201110"
    531378 "201110"
    527816 "201110"
    531807 "201110"
    524803 "201110"
    538907 "201210"
    539477 "201210"
    540091 "201210"
    539153 "201210"
    543484 "201210"
    550003 "201310"
    551269 "201310"
    549953 "201310"
    549951 "201310"
    549942 "201310"
    560820 "201410"
    562103 "201410"
    563341 "201410"
    562101 "201410"
    563991 "201410"
    574394 "201510"
    569987 "201510"
    569827 "201510"
    572599 "201510"
    568758 "201510"
    585164 "201610"
    578954 "201610"
    585001 "201610"
    577872 "201610"
    587184 "201610"
    594510 "201710"
    592563 "201710"
    594477 "201710"
    594469 "201710"
    593787 "201710"
    603141 "201810"
    611437 "201810"
    614263 "201810"
    606238 "201810"
    605326 "201810"
    621624 "201910"
    628749 "201910"
    629139 "201910"
    622690 "201910"
    621377 "201910"
    631157 "202010"
    639737 "202010"
    631058 "202010"
    639695 "202010"
    641433 "202010"
    652750 "202110"
    647773 "202110"
    645486 "202110"
    652190 "202110"
    647151 "202110"
    660184 "202210"
    662295 "202210"
    654938 "202210"
    665859 "202210"
    655894 "202210"
    end

  • #2
    These look like year-month dates stored as strings. Sorting them alphabetically will give the correct order, but it is still advisable to convert them to Stata internal form (SIF) values.

    Code:
    gen yearmonth= ym(real(substr(COHORT, 1, 4)), real(substr(COHORT, -2, 2)))
    format yearmonth %tm
    To your question:

    Code:
    sort yearmonth
    keep if yearmonth== yearmonth[_N]
    Result:

    Code:
    . l
    
         +------------------------------+
         | PERSON~D   COHORT   yearmo~h |
         |------------------------------|
      1. |   660184   202210    2022m10 |
      2. |   662295   202210    2022m10 |
      3. |   654938   202210    2022m10 |
      4. |   665859   202210    2022m10 |
      5. |   655894   202210    2022m10 |
         +------------------------------+
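
    A possible variation on the code above that avoids sorting, offered only as a sketch: -summarize, meanonly- leaves the smallest and largest values in r(min) and r(max), which can then be used directly in -keep-. The five rows are a subset of the example data in #1, chosen here purely for illustration.

    ```stata
    * Illustrative subset of the -dataex- example in #1
    clear
    input double PERSON_ID str6 COHORT
    517877 "201010"
    510879 "201010"
    531253 "201110"
    660184 "202210"
    662295 "202210"
    end

    * Convert the string code to a monthly date, as in the code above
    gen yearmonth = ym(real(substr(COHORT, 1, 4)), real(substr(COHORT, -2, 2)))
    format yearmonth %tm

    * meanonly skips the variance calculation but still returns r(min) and r(max)
    summarize yearmonth, meanonly
    keep if yearmonth == r(max)    // last group; use r(min) for the first group
    ```

    With these five rows, the two observations from 2022m10 remain. The original sort order of the data is left untouched.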



    • #3
      Originally posted by Chul Lee
      Dear all,
      How can I write code more efficiently to select the first group or the last group membership?
      It is not clear what you are asking. Keeping either the first or the last group is not difficult (or am I missing something?). You just select the group you want to keep. Or did you mean you want to keep both the first and the last? See the code below for all three options (keeping the first, keeping the last, keeping both the first and the last):

      Code:
      encode COHORT, gen(COHORT_1)

      tab COHORT_1, nol  // see the numeric values of COHORT_1

      * Run only ONE of the three alternatives below; run in sequence, they would drop every observation
      keep if COHORT_1 == 1  // keep the first group only

      keep if COHORT_1 == 13  // keep the last group only

      keep if COHORT_1 == 1 | COHORT_1 == 13  // keep both the first group and the last group
      By the way, your code in #1 keeps only the last group.

      PS: cross posted with Andrew.
      Last edited by Roman Mostazir; 16 Feb 2022, 18:51. Reason: Added PS
      Roman



      • #4
        Andrew,
        Thank you as always. Although 'COHORT' is an academic term code used in the Banner system, so converting it to 'yearmonth' is not necessary, your code works perfectly here. I appreciate it.

        keep if COHORT == COHORT[_N]

        Additional question: could you also show me how to select the members of the first group?
        C



        • #5
          @Roman,
          Yes, I can apply 'keep' after running 'tab'. However, I need to insert 'keep' in a long script without running 'tab' first, and the data set is updated each time I run the script.
          Thank you.
          C
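
          One way the hard-coded value 13 might be avoided in a script, sketched under the assumption that -encode- is still wanted: -encode- numbers the groups 1, 2, ... in alphabetical order, so the first group is always 1 and the last is whatever -summarize- reports in r(max), however many cohorts the updated data contain. The five rows are an illustrative subset of the example data in #1.

          ```stata
          * Illustrative subset of the -dataex- example in #1
          clear
          input double PERSON_ID str6 COHORT
          517877 "201010"
          510879 "201010"
          531253 "201110"
          660184 "202210"
          662295 "202210"
          end

          encode COHORT, gen(COHORT_1)

          * r(max) holds the code of the last group; no -tab- is needed to look it up
          summarize COHORT_1, meanonly
          keep if COHORT_1 == r(max)    // for the first group: keep if COHORT_1 == 1
          ```

          With these five rows, only the two observations with COHORT "202210" remain.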



          • #6
            To select the first group:
            I adapted Andrew's advice above and got what I need.

            sort COHORT
            keep if COHORT == COHORT[1]

            Thank you.
            C
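
            If both the first and the last group are wanted at once, the same subscripting idea could be extended as sketched below. The -assert- line is an added safeguard, not part of the original code: sorting the string alphabetically matches chronological order only while every code has the same six-character YYYYMM width. The five rows are an illustrative subset of the example data in #1.

            ```stata
            * Illustrative subset of the -dataex- example in #1
            clear
            input double PERSON_ID str6 COHORT
            517877 "201010"
            510879 "201010"
            531253 "201110"
            660184 "202210"
            662295 "202210"
            end

            * Guard: stop the script if any code is not exactly six characters wide
            assert strlen(COHORT) == 6

            sort COHORT
            * COHORT[1] is the first group after sorting, COHORT[_N] the last
            keep if COHORT == COHORT[1] | COHORT == COHORT[_N]
            ```

            With these five rows, the four observations from cohorts "201010" and "202210" remain.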
