Hi everyone,
I'm working with vaccination data that contains a lot of duplicate records, and I need to get rid of them. Within each group of duplicates, I want to give priority to the records where the person is 'fully vaccinated' and drop all other obs in that group that are either 'partially vaccinated' or missing. Similarly, if a duplicate group contains only 'partially vaccinated' and missing records, I want to keep just one of the partials.
As a caveat, once the vaccination criteria above are met, I'd like to prefer the duplicate obs where surveysent == True and surveyresp == True (if there are any).
This is complex and I can't quite wrap my tiny brain around it. Any thoughts?
Code:
clear
input long(vaxstat surveysent surveyresp) float group
1 2 1 25
1 1 1 25
. 2 1 25
. 1 1 25
2 2 1 24
2 1 1 24
1 2 1 24
1 1 1 24
2 2 2 19
2 1 1 19
. 2 2 19
. 1 1 19
end
label values vaxstat vaxstat
label def vaxstat 1 "1. Fully Vaccinated", modify
label def vaxstat 2 "2. Partially Vaccinated", modify
label values surveysent surveysent1
label def surveysent1 1 "1. FALSE", modify
label def surveysent1 2 "2. TRUE", modify
label values surveyresp surveyresp
label def surveyresp 1 "1. FALSE", modify
label def surveyresp 2 "2. TRUE", modify
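For reference, the priority rules described above could be sketched roughly as follows. This is only one possible approach, not a confirmed solution. It assumes the coding in the example data (vaxstat 1 = fully, 2 = partially; surveysent and surveyresp 2 = TRUE), that group identifies each set of duplicates, and that exactly one record should survive per group. The helper variables notsent and notresp are made up for illustration.

```stata
* Sketch only: assumes the example data above are already in memory.
* notsent / notresp are hypothetical helpers: 0 when the flag is TRUE (coded 2),
* 1 otherwise (FALSE or missing), so TRUE records sort first.
gen byte notsent = surveysent != 2
gen byte notresp = surveyresp != 2

* Within each group, sort by vaccination status first. Stata treats missing (.)
* as larger than any number, so the order is fully (1), partially (2), missing.
* Ties are then broken in favour of surveysent TRUE, then surveyresp TRUE.
bysort group (vaxstat notsent notresp): keep if _n == 1

drop notsent notresp
```

With the example data this should leave one observation per group: a fully vaccinated record for groups 24 and 25, and the partially vaccinated record with both survey flags TRUE for group 19. Note that a group containing only missing vaxstat values would still keep one record under this sketch, and records that tie on all three sort keys are kept in an arbitrary order.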