hello Statalisters
I am working with household survey dataset where each row is a subject/observation. In one of the data columns I have an entry with the spouse ID. I want to match couples and characteristics (highest education, gross income, occupational code for primary occupation, perceived health status, to name a few). I created a temporary file where the spousal pid variable is renamed to pid, and the substantive variables are also renamed, and to merge it back into the original data. however i am getting an error message variable pid does not uniquely identify observations in the master data. What can be the problem? If there are polygamous couples in the data how do i detect them?
* Example generated by -dataex-. For more info, type help dataex
clear
input long pid byte(w1_r_edu w1_r_marstt) long w1_r_parhpid byte w1_a_em1occ_isco_c long w1_a_em1inc byte w1_a_hldes
301015 10 1 301043 2 9000 2
301020 6 1 301036 . . 4
301025 12 1 301029 4 -8 1
301026 9 1 301045 . . 4
301027 8 2 501102 . . 1
301029 12 1 301025 5 1500 1
301034 12 1 301035 . . 3
301035 8 1 301034 . . 2
301036 6 1 301020 . . 4
301040 8 1 301017 . . 3
301043 19 1 301015 5 11100 5
301045 10 1 301026 . . 3
301048 12 1 301056 3 -3 3
301052 7 1 301065 . . 4
301056 12 1 301048 2 5800 2
301057 3 1 301058 . . 5
301058 0 1 301057 . . 3
301062 10 1 . . . 1
301064 10 1 501104 . . 1
301065 7 1 301052 . . 4
301067 7 1 310050 . . 1
301073 10 1 301077 8 2000 1
301077 7 1 301073 9 855 3
301079 6 1 301255 . . 3
301080 25 1 301081 . . 1
301081 25 1 301080 . . 4
301084 12 2 301086 4 3150 1
301086 9 2 301084 . . 3
301088 8 1 301092 7 1500 2
301090 10 1 301091 . . 1
301091 6 1 301090 -8 -8 1
301092 12 1 301088 . . 4
301095 10 1 301703 . . 3
301100 11 1 301701 8 2700 1
301102 11 1 501197 . . 1
301104 9 2 302664 . . 4
301105 12 1 301106 2 13768 3
301106 10 1 301105 5 13000 3
301112 22 1 302668 2 -9 1
301117 12 1 309121 2 -8 3
301123 12 2 501467 . . 2
301124 11 1 301125 6 4000 4
301125 7 1 301124 . . 4
301128 17 1 301181 2 7000 3
301134 12 2 301865 7 1995 1
301139 12 1 301151 . . 3
301142 10 1 301147 . . 1
301147 12 1 301142 5 7900 1
301151 19 1 301139 2 9779 4
301154 10 1 301192 . . 1
301156 11 1 301159 . . 5
301157 6 1 301167 . . 5
301164 19 2 301158 2 8000 1
301167 4 1 301157 5 2000 4
301172 20 2 301173 2 9500 1
301173 20 2 301172 2 9500 4
301175 10 1 301176 . . 5
301176 6 1 301175 . . 5
301181 22 1 301128 2 11797 3
301182 12 2 501113 2 2280 2
301196 11 2 301859 7 4000 3
301197 12 1 301198 2 -8 3
301202 11 2 301941 8 5000 1
301204 10 1 301208 5 -3 2
301208 12 1 301204 . . 2
301210 5 2 307343 . . 5
301215 12 1 501292 2 2000 4
301218 22 1 730044 2 28000 2
301219 5 1 . 9 500 1
301220 11 2 301222 . . 2
301222 11 2 301220 . . 2
301223 6 1 301226 . . 5
301224 7 2 301243 7 520 2
301228 11 1 301230 . . 2
301229 25 1 . 7 3000 2
301230 8 1 301228 7 1300 2
301231 8 1 . 7 1200 5
301232 11 1 . 7 3800 2
301235 10 2 301248 8 2000 2
301238 12 1 301239 . . 2
301239 16 1 301238 5 2500 2
301243 10 2 301224 . . 5
301244 9 2 301245 . . 5
301245 10 2 301244 5 1100 3
301247 25 2 301272 . . 5
301248 10 2 301235 . . 5
301253 25 1 501555 5 -9 3
301255 11 1 301079 . . 4
301257 8 2 305515 7 1300 1
301259 11 1 . 8 2700 2
301264 5 1 . . . 2
301269 10 1 . 7 1100 2
301272 6 2 301247 . . 2
301274 12 1 . 9 -9 1
301280 5 1 301961 . . 5
301282 7 1 301294 . . 3
301285 10 2 301947 . . 2
301286 11 2 301287 . . 2
301287 12 2 301286 8 5000 4
301288 25 1 570442 . . 4
end
label values w1_r_edu w1_edgrd
label def w1_edgrd 0 "Grade R/0", modify
label def w1_edgrd 3 "Grade 3/Std. 1", modify
label def w1_edgrd 4 "Grade 4/Std. 2", modify
label def w1_edgrd 5 "Grade 5/Std. 3", modify
label def w1_edgrd 6 "Grade 6/Std. 4", modify
label def w1_edgrd 7 "Grade 7/Std. 5", modify
label def w1_edgrd 8 "Grade 8/Std. 6/Form 1", modify
label def w1_edgrd 9 "Grade 9/Std. 7/Form 2", modify
label def w1_edgrd 10 "Grade 10/Std. 8/Form 3", modify
label def w1_edgrd 11 "Grade 11/Std. 9/Form 4", modify
label def w1_edgrd 12 "Grade 12/Std. 10/Form 5/Matric/Senior Certificate", modify
label def w1_edgrd 16 "Certificate with less than Grade 12/Std. 10", modify
label def w1_edgrd 17 "Diploma with less than Grade 12/Std. 10", modify
label def w1_edgrd 19 "Diploma with Grade 12/Std. 10", modify
label def w1_edgrd 20 "Bachelors degree", modify
label def w1_edgrd 22 "Honours degree", modify
label def w1_edgrd 25 "No Schooling", modify
label values w1_r_marstt w1_marstat
label def w1_marstat 1 "Married", modify
label def w1_marstat 2 "Living with partner", modify
label values w1_r_parhpid w1_parhpid
label values w1_a_em1occ_isco_c w1_occ_code
label def w1_occ_code -8 "Refused", modify
label def w1_occ_code 2 "Professionals", modify
label def w1_occ_code 3 "Technicians and associate professionals", modify
label def w1_occ_code 4 "Clerical support workers", modify
label def w1_occ_code 5 "Service and sales workers", modify
label def w1_occ_code 6 "Skilled agricultural, forestry and fishery workers", modify
label def w1_occ_code 7 "Craft and related trades workers", modify
label def w1_occ_code 8 "Plant and machine operators, and assemblers", modify
label def w1_occ_code 9 "Elementary occupations", modify
label values w1_a_em1inc w1_nonres
label def w1_nonres -9 "Don't know", modify
label def w1_nonres -8 "Refused", modify
label def w1_nonres -3 "Missing", modify
label values w1_a_hldes w1_hldes
label def w1_hldes 1 "Excellent", modify
label def w1_hldes 2 "Very Good", modify
label def w1_hldes 3 "Good", modify
label def w1_hldes 4 "Fair", modify
label def w1_hldes 5 "Poor", modify
code
tempfile original
. save original
file original.dta saved
. drop pid
. rename w1_r_parhpid pid
. rename w1* sp_w1*
. merge 1:1 pid using original
variable pid does not uniquely identify observations in the master data
r(459);
I am working with household survey dataset where each row is a subject/observation. In one of the data columns I have an entry with the spouse ID. I want to match couples and characteristics (highest education, gross income, occupational code for primary occupation, perceived health status, to name a few). I created a temporary file where the spousal pid variable is renamed to pid, and the substantive variables are also renamed, and to merge it back into the original data. however i am getting an error message variable pid does not uniquely identify observations in the master data. What can be the problem? If there are polygamous couples in the data how do i detect them?
* Example generated by -dataex-. For more info, type help dataex
clear
input long pid byte(w1_r_edu w1_r_marstt) long w1_r_parhpid byte w1_a_em1occ_isco_c long w1_a_em1inc byte w1_a_hldes
301015 10 1 301043 2 9000 2
301020 6 1 301036 . . 4
301025 12 1 301029 4 -8 1
301026 9 1 301045 . . 4
301027 8 2 501102 . . 1
301029 12 1 301025 5 1500 1
301034 12 1 301035 . . 3
301035 8 1 301034 . . 2
301036 6 1 301020 . . 4
301040 8 1 301017 . . 3
301043 19 1 301015 5 11100 5
301045 10 1 301026 . . 3
301048 12 1 301056 3 -3 3
301052 7 1 301065 . . 4
301056 12 1 301048 2 5800 2
301057 3 1 301058 . . 5
301058 0 1 301057 . . 3
301062 10 1 . . . 1
301064 10 1 501104 . . 1
301065 7 1 301052 . . 4
301067 7 1 310050 . . 1
301073 10 1 301077 8 2000 1
301077 7 1 301073 9 855 3
301079 6 1 301255 . . 3
301080 25 1 301081 . . 1
301081 25 1 301080 . . 4
301084 12 2 301086 4 3150 1
301086 9 2 301084 . . 3
301088 8 1 301092 7 1500 2
301090 10 1 301091 . . 1
301091 6 1 301090 -8 -8 1
301092 12 1 301088 . . 4
301095 10 1 301703 . . 3
301100 11 1 301701 8 2700 1
301102 11 1 501197 . . 1
301104 9 2 302664 . . 4
301105 12 1 301106 2 13768 3
301106 10 1 301105 5 13000 3
301112 22 1 302668 2 -9 1
301117 12 1 309121 2 -8 3
301123 12 2 501467 . . 2
301124 11 1 301125 6 4000 4
301125 7 1 301124 . . 4
301128 17 1 301181 2 7000 3
301134 12 2 301865 7 1995 1
301139 12 1 301151 . . 3
301142 10 1 301147 . . 1
301147 12 1 301142 5 7900 1
301151 19 1 301139 2 9779 4
301154 10 1 301192 . . 1
301156 11 1 301159 . . 5
301157 6 1 301167 . . 5
301164 19 2 301158 2 8000 1
301167 4 1 301157 5 2000 4
301172 20 2 301173 2 9500 1
301173 20 2 301172 2 9500 4
301175 10 1 301176 . . 5
301176 6 1 301175 . . 5
301181 22 1 301128 2 11797 3
301182 12 2 501113 2 2280 2
301196 11 2 301859 7 4000 3
301197 12 1 301198 2 -8 3
301202 11 2 301941 8 5000 1
301204 10 1 301208 5 -3 2
301208 12 1 301204 . . 2
301210 5 2 307343 . . 5
301215 12 1 501292 2 2000 4
301218 22 1 730044 2 28000 2
301219 5 1 . 9 500 1
301220 11 2 301222 . . 2
301222 11 2 301220 . . 2
301223 6 1 301226 . . 5
301224 7 2 301243 7 520 2
301228 11 1 301230 . . 2
301229 25 1 . 7 3000 2
301230 8 1 301228 7 1300 2
301231 8 1 . 7 1200 5
301232 11 1 . 7 3800 2
301235 10 2 301248 8 2000 2
301238 12 1 301239 . . 2
301239 16 1 301238 5 2500 2
301243 10 2 301224 . . 5
301244 9 2 301245 . . 5
301245 10 2 301244 5 1100 3
301247 25 2 301272 . . 5
301248 10 2 301235 . . 5
301253 25 1 501555 5 -9 3
301255 11 1 301079 . . 4
301257 8 2 305515 7 1300 1
301259 11 1 . 8 2700 2
301264 5 1 . . . 2
301269 10 1 . 7 1100 2
301272 6 2 301247 . . 2
301274 12 1 . 9 -9 1
301280 5 1 301961 . . 5
301282 7 1 301294 . . 3
301285 10 2 301947 . . 2
301286 11 2 301287 . . 2
301287 12 2 301286 8 5000 4
301288 25 1 570442 . . 4
end
label values w1_r_edu w1_edgrd
label def w1_edgrd 0 "Grade R/0", modify
label def w1_edgrd 3 "Grade 3/Std. 1", modify
label def w1_edgrd 4 "Grade 4/Std. 2", modify
label def w1_edgrd 5 "Grade 5/Std. 3", modify
label def w1_edgrd 6 "Grade 6/Std. 4", modify
label def w1_edgrd 7 "Grade 7/Std. 5", modify
label def w1_edgrd 8 "Grade 8/Std. 6/Form 1", modify
label def w1_edgrd 9 "Grade 9/Std. 7/Form 2", modify
label def w1_edgrd 10 "Grade 10/Std. 8/Form 3", modify
label def w1_edgrd 11 "Grade 11/Std. 9/Form 4", modify
label def w1_edgrd 12 "Grade 12/Std. 10/Form 5/Matric/Senior Certificate", modify
label def w1_edgrd 16 "Certificate with less than Grade 12/Std. 10", modify
label def w1_edgrd 17 "Diploma with less than Grade 12/Std. 10", modify
label def w1_edgrd 19 "Diploma with Grade 12/Std. 10", modify
label def w1_edgrd 20 "Bachelors degree", modify
label def w1_edgrd 22 "Honours degree", modify
label def w1_edgrd 25 "No Schooling", modify
label values w1_r_marstt w1_marstat
label def w1_marstat 1 "Married", modify
label def w1_marstat 2 "Living with partner", modify
label values w1_r_parhpid w1_parhpid
label values w1_a_em1occ_isco_c w1_occ_code
label def w1_occ_code -8 "Refused", modify
label def w1_occ_code 2 "Professionals", modify
label def w1_occ_code 3 "Technicians and associate professionals", modify
label def w1_occ_code 4 "Clerical support workers", modify
label def w1_occ_code 5 "Service and sales workers", modify
label def w1_occ_code 6 "Skilled agricultural, forestry and fishery workers", modify
label def w1_occ_code 7 "Craft and related trades workers", modify
label def w1_occ_code 8 "Plant and machine operators, and assemblers", modify
label def w1_occ_code 9 "Elementary occupations", modify
label values w1_a_em1inc w1_nonres
label def w1_nonres -9 "Don't know", modify
label def w1_nonres -8 "Refused", modify
label def w1_nonres -3 "Missing", modify
label values w1_a_hldes w1_hldes
label def w1_hldes 1 "Excellent", modify
label def w1_hldes 2 "Very Good", modify
label def w1_hldes 3 "Good", modify
label def w1_hldes 4 "Fair", modify
label def w1_hldes 5 "Poor", modify
code
tempfile original
. save original
file original.dta saved
. drop pid
. rename w1_r_parhpid pid
. rename w1* sp_w1*
. merge 1:1 pid using original
variable pid does not uniquely identify observations in the master data
r(459);
Comment