Hello everyone,
I have a large dataset with many duplicates across different variables, and I am trying to find out how the duplicate observations differ from one another. I came across the obsdiff command (by Eric Booth) and tried using it, but I am not sure how to make it compare observations within groups of duplicates rather than across a range of rows.
I have created a sample table similar to my data. I need to find the differences in DOB, nationality, gender and result within duplicates of ID.
For example: what are the differences in DOB, nationality, gender and result within the duplicates of ID 1?
ID | DOB | Nationality | age | gender | result |
1 | 1996 | Jordan | 25 | F | P |
1 | 1996 | Jordan | 25 | F | P |
1 | 1996 | Egypt | 25 | F | P |
1 | 1997 | Egypt | 25 | F | N |
1 | 1997 | Jordan | 25 | F | N |
1 | 1996 | Qatar | 24 | F | P |
2 | 1995 | Lebanon | 12 | M | N |
2 | 1995 | Lebanon | 12 | M | N |
2 | 1995 | Lebanon | 14 | M | P |
2 | 1995 | Lebanon | 11 | M | P |
2 | 1995 | Lebanon | 12 | M | P |
3 | 1998 | Syria | 21 | F | N |
4 | 1996 | Syria | 22 | F | P |
5 | 2000 | Qatar | 23 | F | N |
The code I have been using is:
obsdiff DOB Nationality age gender result, row(1/15)
I want to run the same comparison within the duplicates of each ID, without having to list the rows for each group of duplicates (my original dataset has millions of duplicates).
Is this possible with this command?
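To make clear what I mean by "differences within an ID", here is a minimal sketch in plain Stata rather than obsdiff. It assumes the variable names from the sample table, and the diff_* and any_diff names are made up for illustration. It only flags which variables take more than one value within each ID, so it is not a replacement for the detailed output I am hoping to get.

```stata
* Sketch only: flag, for each ID, which variables take more than one value.
* Variable names follow the sample table; diff_* and any_diff are hypothetical.
foreach v of varlist DOB Nationality age gender result {
    * Sort within ID by the variable, then compare its first and last values
    bysort ID (`v'): generate byte diff_`v' = (`v'[1] != `v'[_N])
}

* Keep track of IDs where at least one variable differs
egen byte any_diff = rowmax(diff_*)

* Show only those IDs, one block per ID
list ID DOB Nationality age gender result if any_diff, sepby(ID)
```

On the sample table this would flag DOB, Nationality, age and result for ID 1, and age and result for ID 2, while IDs 3 to 5 would not be listed.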
Thank you!
Heba