  • Extract name initials

    Hi. I am trying to extract name initials from a variable containing names. Since each part of a name starts with a capital letter, I tried the following:

    Code:
    clear all
    
    input str13 x
        "John Smith"
        "Linda Johnson"
        "Peter B. Brown"
    end
    
    g y = regexs(0) if regexm(x,"[A-Z]")
    This way, the y variable contains "J", "L" and "P". However, what I really want is "JS", "LJ" and "PBB".

    I would appreciate any hints on how I can solve this problem.

    Thanks in advance.

  • #2
    One method would use moss from SSC.

    Code:
    clear all
    
    input str13 x
        "John Smith"
        "Linda Johnson"
        "Peter B. Brown"
    end
    
    moss x, match("([A-Z]+)") regex 
    
    egen wanted = concat(_match*)
    
    drop _*
    
    list
    
         +------------------------+
         |             x   wanted |
         |------------------------|
      1. |    John Smith       JS |
      2. | Linda Johnson       LJ |
      3. | Peter B. Brow      PBB |
         +------------------------+
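    moss is a community-contributed command, so it has to be installed before the code above will run. A minimal sketch, assuming an internet connection:

    ```stata
    * install moss from SSC (only needed once per Stata installation)
    ssc install moss

    * confirm that Stata can now find the command
    which moss
    ```

    In the call above, "([A-Z]+)" matches each run of capital letters, moss stores the matches in _match1, _match2, and so on, and egen's concat() joins them into a single string.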

    • #3
      Thank you!


      • #4
        If you have a general knowledge of regular expressions (perhaps from using them in another language), here is a solution that removes every character that is not an upper-case letter. It works for your example data.
        Code:
        . generate y = ustrregexra(x,"[^A-Z]","")
        
        . list, clean
        
                           x     y  
          1.      John Smith    JS  
          2.   Linda Johnson    LJ  
          3.   Peter B. Brow   PBB
        Again, if you have experience with regular expressions that you want to build on in Stata, you will find that the Unicode regular expression functions introduced in Stata 14, such as ustrregexra, support a much more powerful regular expression syntax than the non-Unicode functions. To the best of my knowledge, only in the Statalist post linked here is it documented that Stata's Unicode regular expression parser is the ICU regular expression engine documented at http://userguide.icu-project.org/strings/regexp. A comprehensive discussion of regular expressions can be found at https://www.regular-expressions.info/unicode.html.
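
        As one illustration of what the ICU syntax allows, here is a minimal sketch that uses the Unicode property \p{Lu} (any upper-case letter) instead of the ASCII-only range A-Z. The accented names are made-up test data, not from the original question, and this assumes Stata 14 or later.

        ```stata
        clear

        * str20 leaves room for every name; str13 would cut "Peter B. Brown" short
        input str20 x
            "Émile Zola"
            "Åsa Öberg"
            "Peter B. Brown"
        end

        * [^A-Z] removes everything except ASCII capitals, so É, Å and Ö are lost
        generate y_ascii = ustrregexra(x, "[^A-Z]", "")

        * \P{Lu} matches any character that is NOT an upper-case letter in any script
        generate y_unicode = ustrregexra(x, "\P{Lu}", "")

        list, clean
        ```

        Comparing y_ascii with y_unicode shows whether the ASCII range is sufficient for your data. For names written only with unaccented letters, the two variables should agree.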


        • #5
          Great, William! Thank you very much.
