Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to delete rows based on a specific condition?

    I have the following student and grade data. I want to create a Marks column but I have conditions for selecting the marks of those subjects which the student repeated.

    The condition is
    For a given student, if the subject is repeated, I want to only consider the 2nd Attempt even if marks in the second Attempt can be lowered than 1st Attempt. This subject could be repeated in the same academic year or the next academic year. I have given a sample for 1 student but I want to do this for 1000+ students.
    How to create the correct code for the following?

    1. How to create the 'Repeated Attempts' column?
    2. How to create 'Marks to be considered' column and drop the rows in the 'Marks to be considered' columns which has the value 'Not considered'?
    Row Student ID Year Subject Grade Repeated Attempts Marks to be considered
    1 ID1 2010 Maths 58 No repeat 58
    2 ID1 2010 Eng 35 1st Attempt Not considered
    3 ID1 2010 Eng 48 2nd Attempt 48
    4 ID1 2010 Science 32 1st Attempt Not considered
    5 ID1 2011 Science 50 2nd Attempt 50
    6 ID1 2011 Maths-II 53 No repeat 53
    7 ID1 2011 Eng-II 45 No repeat 45
    8 ID1 2011 Sci-I 52 No repeat 52
    9 ID1 2012 Maths-III 56 No repeat 56
    10 ID1 2012 Eng-III 49 No repeat 49
    11 ID1 2012 Sci-III 38 1st Attempt Not considered
    12 ID1 2013 Sci-III 34 2nd Attempt 34

  • #2
    I am going to assume that the row number reflects chronological order within a student id, subject, and year. Otherwise, there is no apparent way to tell which of two attempts at the same course in the same year is the first and which is the repeat. On that assumption, you can do this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte row str4 student_id int year str10 subject byte grade
     1 "ID1 " 2010 "Maths "     58
     2 "ID1 " 2010 "Eng "       35
     3 "ID1 " 2010 "Eng "       48
     4 "ID1 " 2010 "Science "   32
     5 "ID1 " 2011 "Science "   50
     6 "ID1 " 2011 "Maths-II "  53
     7 "ID1 " 2011 "Eng-II "    45
     8 "ID1 " 2011 "Sci-I "     52
     9 "ID1 " 2012 "Maths-III " 56
    10 "ID1 " 2012 "Eng-III "   49
    11 "ID1 " 2012 "Sci-III "   38
    12 "ID1 " 2013 "Sci-III "   34
    end
    
    label define attempts    0    "No repeat" ///
                            1    "Early attempt"    ///
                            2    "Final attempt"
    by student_id subject (year row), sort: gen byte attempts = 0 if _N == 1
    by student_id subject (year row): replace attempts = 1 if _N > 1 & _n < _N
    by student_id subject (year row): replace attempts = 2 if _N > 1 & _n == _N
    label value attempts attempts
    
    label define consideration .n "Not considered"
    gen marks_to_be_considered:consideration = cond(attempt != 1, grade, .n)
    
    sort student_id row
    Added: In principle (though it doesn't happen in your data example) I see no reason why the same student might not attempt the same course more than twice. So this code assumes that in such a situation you wish to consider only the last attempt, whether that be the second, third, or whatever. I have similar marked the attempts variable to distinguish only between no repeat, early attempts, and a single final attempt.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Let's use Stata terms in a Stata forum. For row read observation and for column read variable.

      More importantly, this may help. Wrapping marks you want to work with later in a string variable will prove awkward, so I have given another version.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte row str3 studentid int year str9 subject byte grade str11 repeatedattempts str14 markstobeconsidered
       1 "ID1" 2010 "Maths"     58 "No repeat"   "58"            
       2 "ID1" 2010 "Eng"       35 "1st Attempt" "Not considered"
       3 "ID1" 2010 "Eng"       48 "2nd Attempt" "48"            
       4 "ID1" 2010 "Science"   32 "1st Attempt" "Not considered"
       5 "ID1" 2011 "Science"   50 "2nd Attempt" "50"            
       6 "ID1" 2011 "Maths-II"  53 "No repeat"   "53"            
       7 "ID1" 2011 "Eng-II"    45 "No repeat"   "45"            
       8 "ID1" 2011 "Sci-I"     52 "No repeat"   "52"            
       9 "ID1" 2012 "Maths-III" 56 "No repeat"   "56"            
      10 "ID1" 2012 "Eng-III"   49 "No repeat"   "49"            
      11 "ID1" 2012 "Sci-III"   38 "1st Attempt" "Not considered"
      12 "ID1" 2013 "Sci-III"   34 "2nd Attempt" "34"            
      end
      
      bysort studentid subject (row) : gen wanted1 = cond(_N == 1, "No repeat", word("1st 2nd", _n) + " Attempt")
      gen wanted2 = cond(wanted1 == "1st Attempt", "Not considered", strofreal(grade))
      gen wanted3 = grade if wanted1 != "1st Attempt"
      
      sort studentid row
      
      ds row, not
      
      list `r(varlist)', sepby(studentid year)
      
           +-------------------------------------------------------------------------------------------------------------+
           | studen~d   year     subject   grade   repeateda~s   markstobecon~d       wanted1          wanted2   wanted3 |
           |-------------------------------------------------------------------------------------------------------------|
        1. |      ID1   2010       Maths      58     No repeat               58     No repeat               58        58 |
        2. |      ID1   2010         Eng      35   1st Attempt   Not considered   1st Attempt   Not considered         . |
        3. |      ID1   2010         Eng      48   2nd Attempt               48   2nd Attempt               48        48 |
        4. |      ID1   2010     Science      32   1st Attempt   Not considered   1st Attempt   Not considered         . |
           |-------------------------------------------------------------------------------------------------------------|
        5. |      ID1   2011     Science      50   2nd Attempt               50   2nd Attempt               50        50 |
        6. |      ID1   2011    Maths-II      53     No repeat               53     No repeat               53        53 |
        7. |      ID1   2011      Eng-II      45     No repeat               45     No repeat               45        45 |
        8. |      ID1   2011       Sci-I      52     No repeat               52     No repeat               52        52 |
           |-------------------------------------------------------------------------------------------------------------|
        9. |      ID1   2012   Maths-III      56     No repeat               56     No repeat               56        56 |
       10. |      ID1   2012     Eng-III      49     No repeat               49     No repeat               49        49 |
       11. |      ID1   2012     Sci-III      38   1st Attempt   Not considered   1st Attempt   Not considered         . |
           |-------------------------------------------------------------------------------------------------------------|
       12. |      ID1   2013     Sci-III      34   2nd Attempt               34   2nd Attempt               34        34 |
           +-------------------------------------------------------------------------------------------------------------+
      EDIT: =2 was not visible to me when I started. I agree with all of Clyde Schechter's points, especially on use of dataex ! Please read the FAQ Advice before posting, as requested.
      Last edited by Nick Cox; 16 Dec 2024, 12:32.

      Comment

      Working...
      X