Hi,
I have been searching the forum and found some variations of code which I have tried to use, but I can't get it to properly keep the right observations.
I have a dataset with 3260 observations, of which 26 are duplicates (i.e., 3234 distinct IDs). I have a variable, creatinin, which contains either a value or missing.
- If the patient has 2 observations and both have missing values, I want to keep the missing value, but only one observation.
- If the patient has 1 observation with a missing value and 1 with a creatinin value, I want to keep the one with the creatinin value.
- If the patient has 2 observations with creatinin values, I want to keep the lower of the two.
I have tried the code shown below, which deleted 390 observations.
Any help would be greatly appreciated!
Code:
bysort id_code: egen wanted=min(creatinin)
gen OK = !missing(creatinin)
bysort OK id_code (creatinin) : keep if OK & _n == _N
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str16 id_code double creatinin
"B0020" 1.8
"B0020" 3.26
"B0021" .
"B0021" 1.88
"B0045" .
"B0045" .
"B0077" .
"B0077" 2.1
"B0081" 4.8
"B0081" .
"B2002" 2.2
"B2042" 1.7
"B2042" 1.9
end
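For reference, here is a minimal sketch of one way the three rules above might be expressed. It assumes id_code identifies the patient and creatinin is numeric, as in the example data. It relies on Stata sorting numeric missing values after all nonmissing numbers, so within each id_code the first sorted observation is the lowest value when one exists, and a missing value otherwise. The example data are repeated only so that the block runs on its own.

```stata
* Example data (same values as the -dataex- listing)
clear
input str16 id_code double creatinin
"B0020" 1.8
"B0020" 3.26
"B0021" .
"B0021" 1.88
"B0045" .
"B0045" .
"B0077" .
"B0077" 2.1
"B0081" 4.8
"B0081" .
"B2002" 2.2
"B2042" 1.7
"B2042" 1.9
end

* Sort by creatinin within id_code. Numeric missing (.) sorts after
* every nonmissing number, so the first observation per id_code is the
* lowest creatinin if any exists, and a missing value otherwise.
bysort id_code (creatinin): keep if _n == 1

* Check: id_code should now uniquely identify the observations.
isid id_code
list, noobs
```

By contrast, the attempt shown earlier keeps only rows where OK is 1 and then takes _n == _N, so it drops every observation with a missing creatinin (including patients with a single observation) and keeps the highest value rather than the lowest. That may account for the 390 deleted observations.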