Dear Experts,
I am conducting a survey analysis and facing challenges with making case IDs uniquely identifiable. I hope someone can assist me with it.
I have household survey data with more than 9000 observations and my ID variable is caseid. v001 is the cluster number, v133 is education in single years, and so on. After carefully looking at the data, I noticed some minor differences in a few repeated caseids that are confusing. While most responses are the same across caseids, I noticed that the variable
had different responses between repeated caseids, and the variable
follows a similar pattern.
I've followed the following stages to resolve the problem.
1. I sorted the data using the following code:
2. then I tested for unique identification using
and Stata reported that
[CODE]
3. The trouble begins when I run a duplicate report using
and
. At this point, Stata produce results suggesting no duplicates (if I am correct).
copies observations surplus
1 9733 0
4. But when I run Duplicates in terms of caseid using the code
, I obtain the following results suggesting that there are duplicate IDs. My concern now is on how to proceed.
5. Now, I try to drop duplicate IDS
, but get results that
6. What I've done is flag the duplicate IDs as suggested by one member in this forum
. My data example is shown below:
Thank you in advance!
I am conducting a survey analysis and facing challenges with making case IDs uniquely identifiable. I hope someone can assist me with it.
I have household survey data with more than 9000 observations and my ID variable is caseid. v001 is the cluster number, v133 is education in single years, and so on. After carefully looking at the data, I noticed some minor differences in a few repeated caseids that are confusing. While most responses are the same across caseids, I noticed that the variable
Code:
bord
b11
I've followed the following stages to resolve the problem.
1. I sorted the data using the following code:
Code:
sort caseid
Code:
isid caseid
Code:
variable caseid does not uniquely identify the observations
3. The trouble begins when I run a duplicate report using
Code:
duplicates report
Code:
duplicates list
copies observations surplus
1 9733 0
4. But when I run Duplicates in terms of caseid using the code
Code:
duplicates report caseid
copies | observations | surplus |
1 | 3719 | 0 |
2 | 4518 | 2259 |
3 | 1347 | 898 |
4 | 124 | 93 |
5 | 25 | 20 |
5. Now, I try to drop duplicate IDS
Code:
duplicates drop
(0 observations are duplicates)
Code:
duplicates tag caseid, gen(flag)
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str15 caseid int v001 byte(v101 v133 v012 v714 hw1 bord v190) int hw70 byte(v152 v632) int b11 byte flag " 1 125 2" 1 5 9 31 1 22 4 2 -255 47 . 62 0 " 1 125 3" 1 5 0 19 0 . 1 2 . 47 . . 0 " 1 147 2" 1 5 10 22 1 . 1 4 . 38 . . 0 " 1 191 3" 1 5 1 20 0 . 1 3 . 30 . . 0 " 1 198 1" 1 5 6 26 1 50 4 3 22 26 . 22 0 " 1 252 2" 1 5 0 37 0 2 8 3 -292 51 . 34 1 " 1 252 2" 1 5 0 37 0 36 7 3 -405 51 . 37 1 " 1 310 3" 1 5 6 15 0 . 1 2 . 50 . . 0 " 1 334 2" 1 5 5 36 1 16 9 2 -214 45 . 46 0 " 1 819 1" 1 5 4 41 1 . 4 3 . 41 . 28 1 " 1 819 1" 1 5 4 41 1 . 3 3 . 41 . 97 1 " 1 897 1" 1 5 0 40 0 2 13 3 -106 40 . 36 1 " 1 897 1" 1 5 0 40 0 39 12 3 -471 40 . 28 1 " 11040 1" 1 5 3 32 0 . 2 3 . 32 . 115 0 " 11052 2" 1 5 3 16 0 11 1 2 -185 25 . . 0 " 11103 2" 1 5 0 38 1 . 7 2 . 45 . 33 0 " 11128 2" 1 5 0 26 1 3 4 3 75 37 . 27 1 " 11128 2" 1 5 0 26 1 31 3 3 -181 37 . 39 1 " 11204 2" 1 5 6 27 1 5 5 2 -153 28 . 29 2 " 11204 2" 1 5 6 27 1 34 4 2 -68 28 . 22 2 " 11204 2" 1 5 6 27 1 56 3 2 -20 28 . 11 2 " 2 16 5" 2 4 8 18 0 . 2 3 . 56 . 31 1 " 2 16 5" 2 4 8 18 0 . 1 3 . 56 . . 1 " 2 16 7" 2 4 6 36 1 . 6 3 . 56 . 42 0 " 2 75 2" 2 4 4 38 1 . 5 2 . 44 . 27 1 " 2 75 2" 2 4 4 38 1 . 4 2 . 44 . 69 1 " 2 191 1" 2 4 6 33 1 . 5 2 . 30 1 44 0 " 2 240 4" 2 4 6 22 1 15 1 1 -345 80 . . 0 " 2 431 2" 2 4 6 38 1 . 6 3 . 52 . 14 1 " 2 431 2" 2 4 6 38 1 . 5 3 . 52 . 30 1 " 2 432 3" 2 4 5 25 1 1 6 2 -47 48 . 41 2 " 2 432 3" 2 4 5 25 1 . 5 2 . 48 . 23 2 " 2 432 3" 2 4 5 25 1 43 4 2 -515 48 . 23 2 " 2 608 2" 2 4 6 35 1 41 5 2 -37 32 . 39 0 " 2 681 3" 2 4 9 23 1 30 2 2 -120 56 3 44 0 " 2 689 2" 2 4 6 28 1 6 6 1 116 38 . 30 1 " 2 689 2" 2 4 6 28 1 . 5 1 . 38 . 30 1 " 2 757 3" 2 4 6 35 1 1 7 3 -257 85 . 31 1 " 2 757 3" 2 4 6 35 1 32 6 3 -483 85 . 30 1 " 2 757 12" 2 4 6 27 1 17 4 3 -145 85 1 23 1 " 2 757 12" 2 4 6 27 1 41 3 3 21 85 1 42 1 " 2 757 18" 2 4 4 23 1 25 3 3 -367 85 . 27 1 " 2 757 18" 2 4 4 23 1 53 2 3 -289 85 . 30 1 " 2 757 22" 2 4 9 22 1 1 4 3 -353 85 . 32 1 " 2 757 22" 2 4 9 22 1 33 3 3 -403 85 . 28 1 " 2 764 2" 2 4 7 23 1 . 4 3 . 30 . 27 2 " 2 764 2" 2 4 7 23 1 . 3 3 . 30 . 24 2 " 2 764 2" 2 4 7 23 1 . 2 3 . 30 . 26 2 " 2 905 2" 2 4 11 24 1 . 1 3 . 35 . . 0 " 21024 2" 2 4 2 32 1 . 7 2 . 48 . 52 0 " 21138 2" 2 4 5 33 1 . 7 1 . 35 . 43 1 " 21138 2" 2 4 5 33 1 . 6 1 . 35 . 27 1 " 21138 9" 2 4 5 25 1 . 5 1 . 35 . 27 2 " 21138 9" 2 4 5 25 1 . 4 1 . 35 . 24 2 " 21138 9" 2 4 5 25 1 . 3 1 . 35 . 26 2 " 3 62 3" 3 2 4 40 1 . 10 3 . 64 . 24 1 " 3 62 3" 3 2 4 40 1 . 9 3 . 64 . 40 1 " 3 191 2" 3 2 8 25 1 13 4 3 -177 26 . 30 1 " 3 191 2" 3 2 8 25 1 43 3 3 9998 26 . 33 1 " 3 368 5" 3 2 7 27 1 32 4 3 102 37 3 34 0 " 3 368 9" 3 2 13 27 0 14 2 3 -139 37 3 22 1 " 3 368 9" 3 2 13 27 0 37 1 3 -81 37 3 . 1 " 3 369 2" 3 2 1 45 1 17 10 2 -295 47 3 70 0 " 3 369 3" 3 2 5 18 0 30 1 2 -214 47 . . 0 " 3 478 2" 3 2 8 29 1 30 4 4 -260 45 . 23 1 " 3 478 2" 3 2 8 29 1 53 3 4 -308 45 . 34 1 " 3 478 8" 3 2 14 28 0 . 1 4 . 45 . . 0 " 3 552 2" 3 2 6 40 1 29 7 3 -46 58 . 84 0 " 3 552 3" 3 2 7 18 0 19 1 3 -253 58 . . 0 " 3 665 2" 3 2 12 23 1 . 1 3 . 22 3 . 0 " 3 712 6" 3 2 8 27 1 . 2 5 . 53 . . 1 " 3 712 6" 3 2 8 27 1 . 1 5 . 53 . . 1 " 3 715 12" 3 2 3 30 1 41 3 4 -1 57 . 74 0 " 3 724 9" 3 2 5 43 1 . 8 2 . 48 . 48 0 " 3 756 5" 3 2 6 24 1 34 3 2 -236 49 . 42 0 " 3 814 4" 3 2 10 17 0 . 1 3 . 54 . . 0 " 3 814 11" 3 2 11 22 0 . 1 3 . 54 . . 0 " 3 845 3" 3 2 5 27 1 . 4 3 . 55 . 27 1 " 3 845 3" 3 2 5 27 1 . 3 3 . 55 . 28 1 " 3 879 3" 3 2 11 22 0 32 1 3 6 49 . . 0 " 3 896 3" 3 2 6 21 1 18 2 3 -155 70 . 23 1 " 3 896 3" 3 2 6 21 1 42 1 3 -138 70 . . 1 " 3 896 6" 3 2 6 25 1 27 3 3 -117 70 . 48 0 " 3 932 7" 3 2 6 33 1 . 1 5 . 43 . . 0 " 3 932 8" 3 2 9 26 1 . 1 5 . 43 . . 0 " 3 957 2" 3 2 8 20 1 7 1 4 157 42 3 . 0 " 31005 2" 3 2 6 31 1 . 3 3 . 35 . 47 0 " 31005 8" 3 2 8 26 1 . 4 3 . 35 . 29 1 " 31005 8" 3 2 8 26 1 . 3 3 . 35 . 53 1 " 31007 3" 3 2 7 22 1 9 3 3 6 53 . 16 1 " 31007 3" 3 2 7 22 1 . 2 3 . 53 . 47 1 " 31007 4" 3 2 7 19 1 . 2 3 . 53 . 31 1 " 31007 4" 3 2 7 19 1 49 1 3 -166 53 . . 1 " 31038 3" 3 2 11 19 1 . 1 3 . 56 3 . 0 " 31038 4" 3 2 9 29 1 . 3 3 . 56 3 58 0 " 31038 11" 3 2 6 26 1 . 1 3 . 56 . . 0 " 31038 12" 3 2 7 19 0 . 1 3 . 56 . . 0 " 31038 13" 3 2 7 34 1 . 6 3 . 56 . 66 1 " 31038 13" 3 2 7 34 1 . 5 3 . 56 . 66 1 " 31072 2" 3 2 9 23 1 . 1 3 . 39 . . 0 end label values v101 V101 label def V101 2 "centre (without yaounde)", modify label def V101 4 "east", modify label def V101 5 "far-north", modify label values v133 V133 label values v714 V714 label def V714 0 "no", modify label def V714 1 "yes", modify label values v190 V190 label def V190 1 "poorest", modify label def V190 2 "poorer", modify label def V190 3 "middle", modify label def V190 4 "richer", modify label def V190 5 "richest", modify label values hw70 HW70 label def HW70 9998 "flagged cases", modify label values v152 V152 label values v632 V632 label def V632 1 "mainly respondent", modify label def V632 3 "joint decision", modify
Comment