Hello,
I am new to Stata and I am trying to figure out how to drop observations with duplicate ID numbers. Instead of dropping the first occurrence based on the variable "WAVE", I want to drop the later one.
To clarify, in my dataset below, ID "A00789" appears twice, once in 2010 and once in 2008. If I want to drop A00789's 2010 data, what syntax should I use? My observations are not currently sorted by wave.
Also, how would I loop the syntax so that I don't have to retype it every time?
This is the process that I know, but it isn't practical since I have thousands of duplicate observations.
1) duplicates tag ID, gen(dupID) --> to find duplicates
2) list ID if dupID==1 --> to know the IDs of the duplicate observations
3) tab WAVE if ID=="A00789" --> to know which waves this duplicate ID has
4) drop the later wave for this ID (I don't know the syntax)
5) loop steps 3 and 4 (I don't know how)
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 ID str3(AGE SEX) str4 WAVE
"A00789" "12" "F" "2010"
"A22254" "24" "M" "2008"
"V55555" "56" "M" "2007"
"C25546" "12" "F" "2006"
"D02453" "34" "F" "2005"
"A00789" "10" "F" "2008"
end
Thank you!
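One possible approach, offered as a sketch rather than a definitive answer: steps 3 to 5 may not need a loop at all, because the by-group prefix can handle every duplicated ID in one pass. The code below assumes the example data above is in memory, that WAVE always holds a four-digit year stored as a string (as in the dataex output), and that the earliest wave is the one to keep for each ID. The variable name wave_num is made up for illustration.

```stata
* Assumption: WAVE is a string holding only digits, as in the example data.
* Convert it to a numeric variable so that sorting is by year value.
destring WAVE, generate(wave_num)

* Within each ID, sort by wave and keep only the earliest observation.
* IDs that appear once are unaffected; for duplicated IDs, later waves are dropped.
bysort ID (wave_num): keep if _n == 1

* Optional check: exits with an error if ID still does not uniquely identify rows.
isid ID
```

On the six example observations this should drop only the 2010 row for "A00789" and leave the other five. If one ID could have two rows with the same wave, which of them is kept would be arbitrary, so that case would need an extra tie-breaking variable inside the parentheses.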