  • Problems when using destring command for long ID numbers

    Hi

    My ID variable came as a long string in my dataset, and when I use the destring command in Stata I end up with a lot of duplicates because of the way the IDs are formatted.
    E.g. one person's ID is 56606 730 5 12
    and another's is 56606 730 51 2
    But Stata has converted both to 56,606,730,512, so I end up with duplicates for all cases like this. How do I avoid this? It has created duplicates for around 2000 observations.

    This is my code for how I used the destring command:
    Code:
    replace idhspid=subinstr(idhspid," ","",.)
    destring idhspid, replace
    format idhspid %25.0gc

  • #2
    If you are using destring, you want the strings to be interpreted literally as numbers, and once the spaces are removed both strings are the same number. If this is not what you want, use encode or egen's group() function:

    Code:
    clear
    input str30 id
    "56606 730 5 12"
    "56606 730 51 2"
    end
    
    egen long nid= group(id)
    Result:

    Code:
    . l
    
         +----------------------+
         |             id   nid |
         |----------------------|
      1. | 56606 730 5 12     1 |
      2. | 56606 730 51 2     2 |
         +----------------------+
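
To make the cause of the duplicates explicit, here is a minimal sketch using the same two example IDs. The variable names id_nospace, nid and eid are illustrative only and not from the original dataset. It shows that the two strings become identical once the spaces are stripped, while group() and encode, applied to the original strings, keep them apart.

```stata
clear
input str30 id
"56606 730 5 12"
"56606 730 51 2"
end

* Stripping the spaces makes both strings identical;
* this is where the duplicates come from, before destring is even run
generate str30 id_nospace = subinstr(id, " ", "", .)
duplicates report id_nospace

* The original strings are still distinct, so group() assigns
* one integer per distinct string
egen long nid = group(id)

* isid exits with an error if nid does not uniquely identify observations
isid nid

* encode also maps each distinct string to an integer and keeps
* the original string attached as a value label
encode id, generate(eid)
list id nid eid
```

If the spaces carry meaning, as they appear to here, the numeric ID from group() or encode is only an arbitrary identifier, so the original string variable is worth keeping alongside it.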


  • #3
    That works, thank you!
