How to delete rows based on a specific condition?

Amruta Bagwe

Join Date: Feb 2024
Posts: 9

How to delete rows based on a specific condition?

16 Dec 2024, 11:02

I have the following student and grade data. I want to create a Marks column but I have conditions for selecting the marks of those subjects which the student repeated.

The condition is
For a given student, if the subject is repeated, I want to only consider the 2nd Attempt even if marks in the second Attempt can be lowered than 1st Attempt. This subject could be repeated in the same academic year or the next academic year. I have given a sample for 1 student but I want to do this for 1000+ students.
How to create the correct code for the following?

1. How to create the 'Repeated Attempts' column?
2. How to create 'Marks to be considered' column and drop the rows in the 'Marks to be considered' columns which has the value 'Not considered'?

Row	Student ID	Year	Subject	Grade	Repeated Attempts	Marks to be considered
1	ID1	2010	Maths	58	No repeat	58
2	ID1	2010	Eng	35	1st Attempt	Not considered
3	ID1	2010	Eng	48	2nd Attempt	48
4	ID1	2010	Science	32	1st Attempt	Not considered
5	ID1	2011	Science	50	2nd Attempt	50
6	ID1	2011	Maths-II	53	No repeat	53
7	ID1	2011	Eng-II	45	No repeat	45
8	ID1	2011	Sci-I	52	No repeat	52
9	ID1	2012	Maths-III	56	No repeat	56
10	ID1	2012	Eng-III	49	No repeat	49
11	ID1	2012	Sci-III	38	1st Attempt	Not considered
12	ID1	2013	Sci-III	34	2nd Attempt	34

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

16 Dec 2024, 11:15

I am going to assume that the row number reflects chronological order within a student id, subject, and year. Otherwise, there is no apparent way to tell which of two attempts at the same course in the same year is the first and which is the repeat. On that assumption, you can do this:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte row str4 student_id int year str10 subject byte grade 1 "ID1 " 2010 "Maths " 58 2 "ID1 " 2010 "Eng " 35 3 "ID1 " 2010 "Eng " 48 4 "ID1 " 2010 "Science " 32 5 "ID1 " 2011 "Science " 50 6 "ID1 " 2011 "Maths-II " 53 7 "ID1 " 2011 "Eng-II " 45 8 "ID1 " 2011 "Sci-I " 52 9 "ID1 " 2012 "Maths-III " 56 10 "ID1 " 2012 "Eng-III " 49 11 "ID1 " 2012 "Sci-III " 38 12 "ID1 " 2013 "Sci-III " 34 end label define attempts 0 "No repeat" /// 1 "Early attempt" /// 2 "Final attempt" by student_id subject (year row), sort: gen byte attempts = 0 if _N == 1 by student_id subject (year row): replace attempts = 1 if _N > 1 & _n < _N by student_id subject (year row): replace attempts = 2 if _N > 1 & _n == _N label value attempts attempts label define consideration .n "Not considered" gen marks_to_be_considered:consideration = cond(attempt != 1, grade, .n) sort student_id row

Added: In principle (though it doesn't happen in your data example) I see no reason why the same student might not attempt the same course more than twice. So this code assumes that in such a situation you wish to consider only the last attempt, whether that be the second, third, or whatever. I have similar marked the attempts variable to distinguish only between no repeat, early attempts, and a single final attempt.

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35698

16 Dec 2024, 11:29

Let's use Stata terms in a Stata forum. For row read observation and for column read variable.

More importantly, this may help. Wrapping marks you want to work with later in a string variable will prove awkward, so I have given another version.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte row str3 studentid int year str9 subject byte grade str11 repeatedattempts str14 markstobeconsidered
 1 "ID1" 2010 "Maths"     58 "No repeat"   "58"            
 2 "ID1" 2010 "Eng"       35 "1st Attempt" "Not considered"
 3 "ID1" 2010 "Eng"       48 "2nd Attempt" "48"            
 4 "ID1" 2010 "Science"   32 "1st Attempt" "Not considered"
 5 "ID1" 2011 "Science"   50 "2nd Attempt" "50"            
 6 "ID1" 2011 "Maths-II"  53 "No repeat"   "53"            
 7 "ID1" 2011 "Eng-II"    45 "No repeat"   "45"            
 8 "ID1" 2011 "Sci-I"     52 "No repeat"   "52"            
 9 "ID1" 2012 "Maths-III" 56 "No repeat"   "56"            
10 "ID1" 2012 "Eng-III"   49 "No repeat"   "49"            
11 "ID1" 2012 "Sci-III"   38 "1st Attempt" "Not considered"
12 "ID1" 2013 "Sci-III"   34 "2nd Attempt" "34"            
end

bysort studentid subject (row) : gen wanted1 = cond(_N == 1, "No repeat", word("1st 2nd", _n) + " Attempt")
gen wanted2 = cond(wanted1 == "1st Attempt", "Not considered", strofreal(grade))
gen wanted3 = grade if wanted1 != "1st Attempt"

sort studentid row

ds row, not

list `r(varlist)', sepby(studentid year)

     +-------------------------------------------------------------------------------------------------------------+
     | studen~d   year     subject   grade   repeateda~s   markstobecon~d       wanted1          wanted2   wanted3 |
     |-------------------------------------------------------------------------------------------------------------|
  1. |      ID1   2010       Maths      58     No repeat               58     No repeat               58        58 |
  2. |      ID1   2010         Eng      35   1st Attempt   Not considered   1st Attempt   Not considered         . |
  3. |      ID1   2010         Eng      48   2nd Attempt               48   2nd Attempt               48        48 |
  4. |      ID1   2010     Science      32   1st Attempt   Not considered   1st Attempt   Not considered         . |
     |-------------------------------------------------------------------------------------------------------------|
  5. |      ID1   2011     Science      50   2nd Attempt               50   2nd Attempt               50        50 |
  6. |      ID1   2011    Maths-II      53     No repeat               53     No repeat               53        53 |
  7. |      ID1   2011      Eng-II      45     No repeat               45     No repeat               45        45 |
  8. |      ID1   2011       Sci-I      52     No repeat               52     No repeat               52        52 |
     |-------------------------------------------------------------------------------------------------------------|
  9. |      ID1   2012   Maths-III      56     No repeat               56     No repeat               56        56 |
 10. |      ID1   2012     Eng-III      49     No repeat               49     No repeat               49        49 |
 11. |      ID1   2012     Sci-III      38   1st Attempt   Not considered   1st Attempt   Not considered         . |
     |-------------------------------------------------------------------------------------------------------------|
 12. |      ID1   2013     Sci-III      34   2nd Attempt               34   2nd Attempt               34        34 |
     +-------------------------------------------------------------------------------------------------------------+

EDIT: =2 was not visible to me when I started. I agree with all of Clyde Schechter's points, especially on use of dataex ! Please read the FAQ Advice before posting, as requested.

Last edited by Nick Cox; 16 Dec 2024, 11:32.

Announcement

How to delete rows based on a specific condition?

Comment

Comment