  • unicode encoding _ Vietnamese letters

    Hi, this is Manny from SNU GSIAT, Korea.

    I have a dataset that includes Vietnamese letters.

    I found the unicode encoding command to translate them into readable letters.

    I chose ibm-5348_P100-1997 as the encoding set, yet a few words still display incorrectly.

    Could you kindly advise me on other options, or a more suitable encoding set, that would help me convert the Vietnamese letters into a legible format?

    Many thanks,

    Manny

  • #2
    See Robert Picard's suggestion in post #5 of the following thread:

    https://www.statalist.org/forums/for...phabet-letters

    ADDED IN EDIT: Here is the implementation applied to the Vietnamese alphabet

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 alphabet
    "Aa"  
    "Ăă"
    "Ââ"
    "Bb"  
    "Cc"  
    "Dd"  
    "Đđ"
    "Ee"  
    "Êê"
    "Gg"  
    "Hh"  
    "Ii"  
    "Kk"  
    "Ll"  
    "Mm"  
    "Nn"  
    "Oo"  
    "Ôô"
    "Ơơ"
    "Pp"  
    "Qq"  
    "Rr"  
    "Ss"  
    "Tt"  
    "Uu"  
    "Ưư"
    "Vv"  
    "Xx"  
    "Yy"  
    end
    
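    * Đ and đ have no canonical decomposition, so map them to D and d explicitly
    gen alphabet2 = usubinstr(alphabet, "Đ", "D", .)
    replace alphabet2 = usubinstr(alphabet2, "đ", "d", .)
    * decompose into base letter + combining marks (NFD), then skip anything non-ASCII
    replace alphabet2 = ustrto(ustrnormalize(alphabet2, "nfd"), "ascii", 2)
    l, clean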
    Resulting in

    Code:
    . l, clean
    
           alphabet   alphab~2  
      1.         Aa         Aa  
      2.         Ăă         Aa  
      3.         Ââ         Aa  
      4.         Bb         Bb  
      5.         Cc         Cc  
      6.         Dd         Dd  
      7.         Đđ         Dd  
      8.         Ee         Ee  
      9.         Êê         Ee  
     10.         Gg         Gg  
     11.         Hh         Hh  
     12.         Ii         Ii  
     13.         Kk         Kk  
     14.         Ll         Ll  
     15.         Mm         Mm  
     16.         Nn         Nn  
     17.         Oo         Oo  
     18.         Ôô         Oo  
     19.         Ơơ         Oo  
     20.         Pp         Pp  
     21.         Qq         Qq  
     22.         Rr         Rr  
     23.         Ss         Ss  
     24.         Tt         Tt  
     25.         Uu         Uu  
     26.         Ưư         Uu  
     27.         Vv         Vv  
     28.         Xx         Xx  
     29.         Yy         Yy
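
    A minimal sketch of the same approach on running text rather than letter pairs, assuming the strings are already valid UTF-8 (Stata 14 or later). The variable names name and name_ascii and the three sample values are made up for illustration. Vietnamese tone marks also become combining marks under NFD, so they are skipped together with the breve, circumflex and horn. Đ and đ are replaced one letter at a time because they do not decompose into a base letter plus a mark.

    ```stata
    clear
    * hypothetical sample data, not taken from the original dataset
    input str40 name
    "Nguyễn Văn Đức"
    "Đà Nẵng"
    "Hồ Chí Minh"
    end

    * map the barred D explicitly, upper and lower case separately
    gen name_ascii = usubinstr(name, "Đ", "D", .)
    replace name_ascii = usubinstr(name_ascii, "đ", "d", .)

    * NFD splits each accented letter into base letter + combining marks;
    * ustrto() with mode 2 skips every character that has no ASCII equivalent
    replace name_ascii = ustrto(ustrnormalize(name_ascii, "nfd"), "ascii", 2)

    list, clean
    ```

    On these inputs name_ascii should read Nguyen Van Duc, Da Nang and Ho Chi Minh. Note that this strips the diacritics for matching or sorting purposes; it does not repair text that was read with the wrong encoding.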
    Last edited by Andrew Musau; 12 Oct 2018, 10:53.

  • #3
    Dear Andrew Musau,

    Thank you so much for your clever solution.

    It helped me a lot.

    Best,

    Manny
