Hello all,
I have a large panel data set with a considerable number of duplicates (too many to check by hand). I do not know why these duplicates exist: I downloaded the data straight from Compustat / WRDS.
The duplicates occur by year and id. Within each pair, one observation has missing values where the other does not (example data below).
Is there a way to have Stata drop, within each id-year pair, the observation with more missing values?
Thank you in advance.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year long id double(at ceq csho)
1991 3514  714.709  76.064     .
1991 3514  714.709  76.064   2.4
1992 3514  824.676  92.352     .
1992 3514  824.676  92.352 2.394
1993 3514 1010.104 116.991     .
1993 3514 1010.104 116.991 2.708
1994 3514 1038.882 125.177     .
1994 3514 1038.882 125.177 6.628
end
label values id id
label def id 3514 "WSBC", modify
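One possible approach, as a minimal sketch rather than a definitive answer: count the missing values in each observation with egen's rowmiss() function, then keep the most complete observation within each id-year pair. The helper variable name nmiss is hypothetical, and the variable list (at ceq csho) is assumed from the example data; a real Compustat extract would need its full list of data variables there. If two duplicates have the same number of missing values, which one survives is arbitrary.

```stata
* Count the missing values across the data variables in each observation
* (nmiss is a hypothetical helper name; extend the varlist as needed)
egen nmiss = rowmiss(at ceq csho)

* Within each id-year pair, sort by nmiss and keep the most complete observation
bysort id year (nmiss): keep if _n == 1

drop nmiss

* Check that id and year now uniquely identify the observations
isid id year
```

On the example data this would keep the four observations where csho is not missing. It does not combine information across duplicates, so if each copy held values the other lacked, the pair would have to be merged rather than dropped.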