  • How to Identify Invisible Characters in Strings?

    Dear Statalists,
    I collected data from the internet. Two values of one variable look identical to my eyes but are treated as different by Stata 16 SE. As shown in the picture below, both strings read "waterproofing_materials", but one has length 23 and the other has length 26. I guess some invisible characters are present, but I have not been able to figure out what they are.
    I tried replacing char(1) through char(40), but it made no difference. I uploaded the dta and Excel files as attachments. Any suggestions would be appreciated.
    Best,
    Kailin



    [Image: _20220607155616.png]

  • #2
    Sorry, I just found the solution! It seems I cannot delete the post myself.
    Happy to share the solution in case anyone is interested:

    Code:
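    * charlist is a user-written command: ssc install charlist
    charlist ind3
    return list
    display r(ascii)
    * 95 97 101 102 103 105 108 109 110 111 112 114 115 116 119 187 191 239
    * 239 187 191 (hex EF BB BF) is the UTF-8 byte order mark; strip each byte
    replace ind3 = subinstr(ind3, char(187), "", .)
    replace ind3 = subinstr(ind3, char(191), "", .)
    replace ind3 = subinstr(ind3, char(239), "", .)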
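
    The three extra bytes (239 187 191) account for the gap between length 23 and length 26, and together they form the UTF-8 encoding of U+FEFF, the byte order mark. As a rough sketch of an alternative, assuming Stata 14 or later (for the Unicode string functions) and the same variable ind3, the check and the cleanup might look like the following. The variable names len_bytes and len_chars are made up for illustration.

    ```stata
    * Sketch only: assumes Stata 14+ and the string variable ind3 from this thread
    * strlen() counts bytes, ustrlen() counts Unicode characters;
    * the two differ when a value contains multibyte characters
    generate len_bytes = strlen(ind3)
    generate len_chars = ustrlen(ind3)
    list ind3 len_bytes len_chars if len_bytes != len_chars

    * Show the bytes of the first observation as escaped decimal codes
    display tobytes(ind3[1])

    * Remove U+FEFF (UTF-8 bytes 239 187 191) as one unit
    replace ind3 = subinstr(ind3, uchar(65279), "", .)
    ```

    Removing the three bytes together, rather than one at a time, should avoid touching other multibyte characters that happen to contain byte 187 or 191.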
