Hi all, I have a very large dataset of 970,000 observations, which was given to me by an organisation.
I tried to merge this dataset with another, which came back with the error:
stata does not uniquely identify observations in the master data
I figured this has to do with my ID variable. I checked for missing values in both the master and the merge file, and there are none.
I then checked for duplicates, as I figured this would be the only other reason (although nowhere in my code have I introduced any duplicates myself).
I tried duplicates report

I then tried to list the duplicates, but of course there were too many.
I then tried codebook. As you can see, the number of unique values here differs.

My question: why does codebook show a different number of unique values from duplicates report, which shows 959,798 unique values?
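
For anyone who wants to reproduce the checks described above, here is a minimal sketch. The variable names id and dup are placeholders, not the actual names in my data. One thing the sketch makes explicit is that duplicates report with no variable list compares observations on all variables, whereas codebook id looks at the ID variable alone, so the two counts need not agree.

```stata
* Sketch only: id is a placeholder for the real identifier variable
count if missing(id)               // how many observations have a missing ID

capture noisily isid id            // reports an error if id is not unique

duplicates report                  // no varlist: duplicates across ALL variables
duplicates report id               // duplicates on the ID variable only

codebook id                        // includes the number of unique values of id

* Flag the repeated IDs instead of listing every one of them
duplicates tag id, generate(dup)   // dup = number of other observations sharing the same id
tabulate dup
```

With this many observations, tabulating the tag variable and then browsing only the observations where dup is greater than 0 is usually more practical than a full listing.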