Removing accent signs while retaining letters

Christiaan de Swardt

Join Date: Oct 2024

Posts: 5
#1

20 Nov 2024, 09:11

Dear Stata community

I have a string variable that consists of names with accent signs (special characters). For example, it would include names like "Côte d'Ivoire" and "Guinée". The problem is, I need to use this string variable to merge my dataset with another dataset, which contains the names without accent signs (so, just "Cote d'Ivoire" and "Guinee", for example).

Does anyone know of some command that can 'strip out' the accent signs from my first string variable while retaining the underlying letter, such that it is possible to merge with the string variable not containing accent signs?

Thank you so much in advance for any kind advice!

Andrew Musau

Join Date: Oct 2014
Posts: 9773

20 Nov 2024, 09:58

Code:

clear
input str29 countryname
"Côte d'Ivoire"
"Guinée"
end

* Decompose accented letters (NFD), then skip the non-ASCII combining marks (mode 2)
gen wanted = ustrto(ustrnormalize(countryname, "nfd"), "ascii", 2)

Result:

Code:

. l

     +-------------------------------+
     |   countryname          wanted |
     |-------------------------------|
  1. | Côte d'Ivoire   Cote d'Ivoire |
  2. |        Guinée          Guinee |
     +-------------------------------+
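
To see why this works, it can help to run the two steps separately. The following is only a minimal sketch: the accented letters are built with uchar() so the example does not depend on how the do-file is encoded, and the variable names (name, step1, step2, len0, len1) are made up for illustration.

```stata
clear
set obs 2
gen str20 name = "Guin" + uchar(233) + "e" in 1   // "Guinée", é = U+00E9
replace name = "S" + uchar(248) + "ren" in 2      // "Søren",  ø = U+00F8

* Step 1: NFD splits é into "e" plus a combining acute accent (U+0301)
gen step1 = ustrnormalize(name, "nfd")
gen len0 = ustrlen(name)     // length in Unicode characters before NFD
gen len1 = ustrlen(step1)    // length in Unicode characters after NFD

* Step 2: convert to ASCII; mode 2 skips anything that is not ASCII
gen step2 = ustrto(step1, "ascii", 2)

list name len0 len1 step2
```

"Guinée" gains one character in step 1 (the separated accent), and step 2 drops exactly that character, leaving "Guinee". Letters such as ø have no canonical decomposition, so NFD leaves them intact and step 2 removes the whole letter ("Søren" becomes "Sren"). If your names may contain such letters, it is worth checking them after the conversion.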

With country names, problems other than accents can arise when combining datasets, such as different spellings or alternative names for the same country. The kountry command from SSC, which standardizes country names and converts between country coding schemes, helps with these.

Code:

ssc install kountry, replace
help kountry
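
As a rough sketch of how kountry might be used on names like these: as far as I recall from its help file, from(other) asks it to standardize free-form country names into a new variable NAMES_STD, and the marker option adds a variable MARKER flagging the names it recognized. Treat those details as assumptions and confirm them in help kountry before relying on them. The "Ivory Coast" row is added only to illustrate an alternative name.

```stata
clear
input str29 countryname
"Cote d'Ivoire"
"Guinee"
"Ivory Coast"
end

* Standardize free-form names (assumed to create NAMES_STD and MARKER)
kountry countryname, from(other) marker

* Inspect which names were recognized before merging on the standardized name
list countryname NAMES_STD MARKER
```

Running the same step on both datasets and merging on the standardized name, rather than on the raw strings, may catch mismatches that removing accents alone would miss.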
