
  • Removing accent signs while retaining letters

    Dear Stata community

    I have a string variable that consists of names with accent signs (special characters). For example, it would include names like "Côte d'Ivoire" and "Guinée". The problem is, I need to use this string variable to merge my dataset with another dataset, which contains the names without accent signs (so, just "Cote d'Ivoire" and "Guinee", for example).

    Does anyone know of some command that can 'strip out' the accent signs from my first string variable while retaining the underlying letter, such that it is possible to merge with the string variable not containing accent signs?

    Thank you so much in advance for any kind advice!

  • #2
    Code:
    clear
    input str29 countryname
    "Côte d'Ivoire"
    "Guinée"
    end
    
    * Decompose accented letters (NFD), then skip the non-ASCII combining marks
    gen wanted = ustrto(ustrnormalize(countryname, "nfd"), "ascii", 2)
    Result:

    Code:
    . l
    
         +-------------------------------+
         |   countryname          wanted |
         |-------------------------------|
      1. | Côte d'Ivoire   Cote d'Ivoire |
      2. |        Guinée          Guinee |
         +-------------------------------+
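
    Why this works: ustrnormalize(..., "nfd") splits each accented letter into a base letter plus a separate combining mark, and ustrto(..., "ascii", 2) then skips every character that has no ASCII equivalent, so only the base letters remain. Letters without a canonical decomposition (for example "ø") would presumably be dropped entirely rather than mapped to a base letter, so it is worth checking for them before merging. A minimal sketch of the merge step follows, assuming a hypothetical using file other.dta whose unaccented key variable is named country and is unique in that file:

    ```stata
    * Assumption: other.dta and its key variable country are hypothetical names
    gen country = ustrto(ustrnormalize(countryname, "nfd"), "ascii", 2)

    * List names that lost a whole letter (the ASCII result is shorter than the original)
    list countryname country if ustrlen(country) != ustrlen(ustrnormalize(countryname, "nfc"))

    * m:1 assumes country uniquely identifies observations in other.dta
    merge m:1 country using "other.dta"
    tabulate _merge
    ```

    Anything left unmatched in _merge then needs a manual fix or a name-standardizing tool.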
    For country names, other issues beyond accents can arise, particularly when combining different datasets. The kountry command from SSC is quite helpful in addressing these challenges.

    Code:
    ssc install kountry, replace
    help kountry
