Can anyone recommend an approach to comparing two datasets when the number of observations (n) differs?
I have over 60 datasets, many of which share the same variables. I have reduced these to 27 datasets by using the dta_equal command and merging files after resolving any discrepancies. I also looked at the cf command, but neither command works when the number of observations differs between the two files. I am going to try merging some files (and only saving the merged file if _merge is 1, 2 or 3), but I have already come across several files where _merge==5 (nonmissing conflict), and I'm not sure how to approach this. If it were just a few variables, I would rename the variables in one file with a suffix such as _1, merge the files and then compare the variables directly, but I have around 4000 variables and 20000 observations. I realise this is a somewhat clumsy approach, so any suggestions would be welcome.
Regards
Laura
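A minimal sketch of one possible approach, written in plain Python rather than Stata purely to illustrate the logic: restrict the comparison to observations whose key appears in both files, then compare every shared variable cell by cell and list all differences instead of stopping at the first one. It assumes each file has a unique identifier; the dict-of-records layout and the helper name `compare_datasets` are hypothetical. In Stata the analogous steps would be to merge each file with a list of the IDs common to both (`merge 1:1 id using ..., keep(match) nogenerate`), sort both on the key so that they have the same n, and then run `cf _all using ..., all verbose`.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical layout: a dataset maps a unique key (e.g. an ID) to a record,
# and each record maps variable name -> value.
Record = Dict[str, Any]
Dataset = Dict[Any, Record]
Mismatch = Tuple[Any, str, Any, Any]


def compare_datasets(a: Dataset, b: Dataset) -> Tuple[Set[Any], Set[Any], List[Mismatch]]:
    """Return (keys only in a, keys only in b, mismatches on shared keys and variables)."""
    keys_a, keys_b = set(a), set(b)
    only_a = keys_a - keys_b  # analogous to _merge == 1 (master only)
    only_b = keys_b - keys_a  # analogous to _merge == 2 (using only)
    mismatches: List[Mismatch] = []
    # Keys in both files, analogous to _merge == 3 (matched).
    for key in sorted(keys_a & keys_b):
        rec_a, rec_b = a[key], b[key]
        # Compare only variables that exist in both records.
        for var in sorted(set(rec_a) & set(rec_b)):
            if rec_a[var] != rec_b[var]:
                mismatches.append((key, var, rec_a[var], rec_b[var]))
    return only_a, only_b, mismatches


# Small made-up example: IDs 1 and 2 are in both files, 3 and 4 are not.
file1 = {1: {"age": 34, "sex": "F"}, 2: {"age": 51, "sex": "M"}, 3: {"age": 29, "sex": "F"}}
file2 = {1: {"age": 34, "sex": "F"}, 2: {"age": 52, "sex": "M"}, 4: {"age": 40, "sex": "M"}}
only1, only2, diffs = compare_datasets(file1, file2)
print(only1)  # {3}
print(only2)  # {4}
print(diffs)  # [(2, 'age', 51, 52)]
```

Reporting the unmatched keys separately from the value mismatches keeps the "different n" problem apart from the "conflicting values" problem, which is roughly the distinction the _merge codes draw.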