Problems when using destring command for long ID numbers

Anjali Kajan

#1

30 Mar 2022, 06:19

Hi

My ID variable came as a long string in my dataset, and when I use the destring command in Stata I end up with a lot of duplicates because of the way the IDs are written.
E.g. one person's ID is 56606 730 5 12
and another's is 56606 730 51 2
But Stata has converted both to 56,606,730,512, so I end up with duplicates for all cases like this. How do I stop this from happening? It has created duplicates for around 2000 observations.

This is my code for how I used the destring command:

Code:

replace idhspid = subinstr(idhspid, " ", "", .)
destring idhspid, replace
format idhspid %25.0gc
Andrew Musau

#2

30 Mar 2022, 06:33

If you use destring, you are asking for the strings to be interpreted literally as numbers, and once the spaces are removed both IDs are the same number. If this is not what you want, use encode or

Code:

clear
input str30 id
"56606 730 5 12"
"56606 730 51 2"
end
egen long nid = group(id)

Res.:

Code:

. l

     +----------------------+
     |             id   nid |
     |----------------------|
  1. | 56606 730 5 12     1 |
  2. | 56606 730 51 2     2 |
     +----------------------+
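
For completeness, the encode route mentioned above might look like the sketch below. It reuses the same two example IDs; the variable name nid2 is made up for illustration and is not from the original data. Note that encode keeps the original strings as value labels, and value labels are capped in how many distinct values they can hold, so for a large file egen's group() is probably the safer choice.

```stata
* Sketch only: same two example IDs as above; nid2 is a hypothetical name.
clear
input str30 id
"56606 730 5 12"
"56606 730 51 2"
end

* encode maps the sorted distinct strings to integer codes 1, 2, ...
* and attaches the original strings as value labels.
encode id, generate(nid2)

* isid exits with an error if nid2 does not uniquely identify observations.
isid nid2

* nolabel shows the underlying integer codes rather than the labels.
list id nid2, nolabel
```

Either way the spaces in the original string are kept, so the two IDs stay distinct, and the original id variable remains in the dataset for reference.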
Anjali Kajan

#3

30 Mar 2022, 14:53

That works, thank you!