Hi,
I am using Stata 14.1 SE to gather demographic data on 6 cohorts of college students. To do this, I loop over administrative records for each cohort and save a temp file with these data for each cohort. In these records, a student is likely to have more than one observation, which means s/he attended more than one institution that year. Because I only want one record containing demographic information per student, I run two sets of commands.
The first set of commands identifies the student's primary institution by finding the largest number of credit hours (sch) the student took at any one institution that semester, and eliminating observations where the student took fewer credit hours than that maximum. The primary institution is the institution where the student took the highest number of credit hours.
Command Line 1: by id: egen maxsch = max(sch)
Command Line 2: keep if maxsch == sch
After running these two commands, I still have to deal with students who took exactly the same number of credits at two or more institutions, since the previous set of commands leaves these ties in the data.
I run the following command to keep the first observation for each student, effectively eliminating any observations after the first one.
Command Line 3: bys id sch: keep if _n==1
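To make the steps concrete, here is a minimal sketch of the same sequence on a made-up toy dataset. The values and the institution variable inst are hypothetical and do not come from my records, and I use bysort here so the sketch does not depend on the data already being sorted by id.

```stata
* Toy data: hypothetical values, not the real administrative records
clear
input id sch str1 inst
1 12 "A"
1  3 "B"
2  6 "A"
2  6 "B"
3  9 "C"
end

* Command Lines 1-2: keep the enrollment(s) with the most credit hours
bysort id: egen maxsch = max(sch)
keep if maxsch == sch

* Command Line 3: keep one observation per student
* (student 2 is tied, so two rows reach this step)
bysort id sch: keep if _n == 1

list, sepby(id)
```

In this sketch student 2 has a tie, and as far as I can tell nothing in these commands fixes which of the two tied rows ends up first within the id-sch group.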
The problem starts after I run these commands. Next, I keep students who are defined as first-time college students. Each time I run this command, Stata keeps a different number of first-time college students, even though it deletes the same number of observations in Command Line 2 and Command Line 3. I have pasted the output from two runs below to illustrate the problem.
Output 1 - First run
(1,122,521 observations deleted) - Result from Command Line 1
(327,317 observations deleted) - Result from Command Line 2
(1,091,298 observations deleted) - Result from Command Line 3
Output 2 - Second run
(1,122,521 observations deleted) - Result from Command Line 1
(327,317 observations deleted) - Result from Command Line 2
(1,091,654 observations deleted) - Result from Command Line 3
It seems that Stata deletes the same number of observations in the first two steps on every run, but that the students it keeps differ from run to run. Is my assumption correct? If so, is it possible to tell Stata to keep the same students across runs?
Any help is much appreciated.
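If ties are indeed being broken arbitrarily, I imagine a deterministic tie-break could look something like the sketch below. It assumes a hypothetical institution identifier inst that is unique within each id-sch group; that variable name and the toy values are made up for illustration.

```stata
* Toy data with a tie for student 2 (hypothetical values)
clear
input id sch str1 inst
1 12 "A"
2  6 "B"
2  6 "A"
end

* Check that id, sch and inst together identify every row;
* isid stops with an error if they do not
isid id sch inst

* Order tied rows by inst inside each id-sch group,
* then keep the first row of each group
bysort id sch (inst): keep if _n == 1

list
```

Because the variables in parentheses fix the order of rows within each group, the same row should be first on every run, provided inst really is unique within id and sch.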
Thanks!