Yes, I did, Rich. However, I used the four commands below to remove the duplicates in my datasets.
sort id country gender + 8 other vars
quietly by id country gender + 8 other vars: gen dup = cond(_N==1,0,_n)
drop if dup>1
drop dup
I found these commands somewhere online. They do remove the duplicates, but they lead to a different total number of observations each time I run the same .do-file. I need to remove the duplicates without getting mismatches when I use the 'cf' command, which compares a saved dataset with the dataset currently open. Therefore, I use the save command immediately after the 'cf' command.
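One possible contributor, offered as a sketch rather than a diagnosis: -sort- orders tied observations randomly unless the stable option is specified. If observations match on the key variables but differ on other variables, a different observation can survive the drop on each run, and later commands may then behave differently. The toy data and the variable score below are hypothetical and only illustrate the idea.

```stata
* Toy data (hypothetical): id 1 appears twice on the key variables but differs on score
clear
input id str2 country str1 gender score
1 "NL" "F" 10
1 "NL" "F" 20
2 "BE" "M" 30
end

* stable keeps tied observations in their current order,
* so the same observation is kept on every run
sort id country gender, stable
by id country gender: keep if _n == 1

list
```

With the stable option, the first observation in the original order is the one retained for each combination of id, country and gender.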
Lydia, you say "yes", but you don't show anything using the "duplicates" command so I think the answer is really "no", you did not; see -help duplicates-
I see that in #20 above, Sergiy has given an example of the use of the command
Unfortunately, I didn't think I would need a log file (even though this is my first Stata project); otherwise I could have shown that I used -help duplicates- a few weeks ago, when I first started using Stata. You asked: "have you tried the -duplicates- command, or looked at the help file?" I did look at the help file, so the answer would have been yes even if I had not tried the -duplicates- command. In fact I did use that command a few weeks ago, but then I decided that I also wanted to drop the original observations underlying the duplicates (the first occurrences). Therefore, I stopped using -duplicates drop- and started to use the following:
sort id country gender
quietly by id country gender: gen dup = cond(_N==1,0,_n)
drop if dup>0   // leads to a different result than -duplicates drop- (and than dup>1, of course)
drop dup
Later, I changed my mind and wanted to keep the original observations, so I used the four commands from my previous post, because at first nothing looked wrong after running them. On the contrary, they seemed to work perfectly: "duplicates drop varlistnames, force" dropped the same number of duplicates. I should have switched to -duplicates drop- at that time. However, the change in the total number of observations didn't happen immediately after running those four commands. They were the cause of it, but the change appeared about 15 commands later.
-duplicates drop- was and is the answer to my question. However, just before my post of 13:50 I decided that I need to do that specific duplicates removal after a certain merge. After my previous post I tested -duplicates drop- again to check that it would not later change the total number of observations, and it worked. However, I couldn't come online again until now.
Deleted and reposted to prevent double posting:
New problem, exact same topic
sort id country gender
quietly by id country gender: gen dup = cond(_N==1,0,_n)
drop if dup>0   // leads to a different result than -duplicates drop- (and than dup>1, of course)
drop dup
Is it possible to drop the same number of observations as the above code does, but with different code?
That is, code that also drops the first occurrence of each set of duplicates?
-duplicates drop- doesn't drop these.
If I understand the question, -duplicates tag- may help you do what you want. A non-zero value on the generated variable indicates the record has a duplicate.
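A minimal sketch of that suggestion, assuming the key variables are id, country and gender as in the posts above. The toy data are hypothetical, and dup is simply the name chosen for the generated variable.

```stata
* Toy data (hypothetical): ids 1 and 3 are duplicated on (id country gender)
clear
input id str2 country str1 gender
1 "NL" "F"
1 "NL" "F"
2 "BE" "M"
3 "DE" "F"
3 "DE" "F"
3 "DE" "F"
end

* dup = number of other observations sharing the same (id country gender) values
duplicates tag id country gender, generate(dup)

* Drop every observation that has at least one duplicate, first occurrence included
drop if dup > 0
drop dup

list
```

Unlike -duplicates drop-, which keeps one copy of each duplicated combination, this leaves only the observations that were unique to begin with.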
------------------------------------------- Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor) WWW: https://www3.nd.edu/~rwilliam