Hello Statlisters,
I am stuck trying to merge a household data file and a community data file in Stata using the LSMS dataset. The problem is that the two files have different unique identifiers: in the household file, observations are uniquely identified by hhid and indiv, while in the community file the identifier is cluster_id. I used the following code to run an m:1 merge:
* Community file: keep one record per cluster_id
use "/Users/nunuy/Desktop/nunuy 🧸/sectc2_infra.dta", clear
duplicates report cluster_id
duplicates list cluster_id
duplicates drop cluster_id, force
save comm_infra, replace

* Household file: build the cluster key as "lga-ea" from the numeric lga and ea
use hhdatafile1
gen lga_s=string(lga)
gen ea_s=string(ea)
gen cluster_id=lga_s+"-"+ea_s
sort cluster_id
merge m:1 cluster_id using "/Users/nunuy/Desktop/nunuy🧸/comm_infra.dta"
The merge failed with the error that cluster_id “does not uniquely identify observations in the using data.” I am aware that m:m is not the appropriate merge type for household and community data files, but when I tried m:m, it appeared to work. Is there something I am missing, and what is the best way around this?
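For reference, here is a minimal, self-contained sketch of what an m:1 merge of this kind relies on, using made-up toy data rather than the LSMS files. The key is built the same way in both files, `isid` confirms that it is unique in the community (using) file before that file is saved, and each household member then picks up its cluster's record. It assumes, hypothetically, that cluster_id in the community file is a string of the form lga-ea; if it is stored with another type or format there, the keys will not line up. The variable road and all values are invented for illustration. Separately, `save comm_infra` writes to the current working directory, while the merge reads comm_infra.dta from a Desktop folder whose name differs from the first path by a space ("nunuy 🧸" vs "nunuy🧸"), so it may be worth checking that the file being merged is the deduplicated one.

```stata
* Sketch with hypothetical toy data, not LSMS values.
* Community (using) file: one row per cluster
clear
input lga ea str10 road
1 10 "paved"
1 20 "unpaved"
2 10 "paved"
end
* Build the key exactly as in the household file, e.g. "1-10"
gen cluster_id = string(lga) + "-" + string(ea)
* Flag and list any repeated keys (none in this toy data)
duplicates tag cluster_id, gen(dup)
list if dup > 0
* isid stops with an error if cluster_id is not unique
isid cluster_id
keep cluster_id road
tempfile comm
save `comm'

* Household (master) file: several members per cluster
clear
input hhid indiv lga ea
101 1 1 10
101 2 1 10
102 1 1 20
201 1 2 10
end
gen cluster_id = string(lga) + "-" + string(ea)
* m:1: many household rows match one community row
merge m:1 cluster_id using `comm'
tab _merge
```

If `isid` stops on the real community file, `duplicates list cluster_id` shows which clusters repeat and why.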
I look forward to your response.