Hi
My ID variable came as a long string in my dataset and when I use the destring command in Stata, I end up with a lot of duplicates due to the way the ID is labelled.
E.g. one person's ID is 56606 730 5 12
and the other's is 56606 730 51 2
But after destring, Stata shows them both as 56,606,730,512, so I end up with duplicates for all cases like this. How do I avoid this? It has created around 2000 duplicate observations.
This is my code for how I used the destring command:
Code:
replace idhspid = subinstr(idhspid, " ", "", .)
destring idhspid, replace
format idhspid %25.0gc
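In case it helps, here is a minimal sketch that reproduces the problem using only the two example IDs above. The variable name idnospace is just something I made up for this illustration, and str20 is an assumed width. As far as I can tell, the two IDs already become the same string once the spaces are removed, before destring is even run.

```stata
* Minimal reproduction with the two example IDs from this post
clear
input str20 idhspid
"56606 730 5 12"
"56606 730 51 2"
end

* Same space-stripping step as in my code, but kept in a new
* (hypothetical) variable so the original string stays visible
generate idnospace = subinstr(idhspid, " ", "", .)

* Both rows now hold 56606730512, although idhspid differs
list idhspid idnospace

* Compare duplicates on the original and on the stripped ID
duplicates report idhspid
duplicates report idnospace
```

If this is right, the duplicates come from the subinstr() step rather than from destring itself, so I may need a way to get a numeric ID without dropping the spaces.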