Extract name initials

Vinicius Lima

Join Date: Dec 2019

Posts: 10
#1

17 Dec 2020, 11:41

Hi. I am trying to extract name initials from a variable containing names. Since every name has its initials in capital letters, I tried the following:

Code:

clear all
input str13 x
"John Smith"
"Linda Johnson"
"Peter B. Brown"
end
// regexm() records only the first match, so regexs(0) holds a single capital letter
generate y = regexs(0) if regexm(x, "[A-Z]")

This way, the variable y contains "J", "L", and "P". However, what I really want is "JS", "LJ", and "PBB".

I would appreciate any hints on how I can solve this problem.

Thanks in advance.

Nick Cox

Join Date: Mar 2014
Posts: 35438

17 Dec 2020, 11:55

One method would use moss from SSC.
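moss is a community-contributed command rather than part of official Stata, so if it is not already installed it can be downloaded from SSC first:

```stata
ssc install moss
```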

Code:

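clear all

// str13 truncates "Peter B. Brown" to 13 characters, hence "Peter B. Brow" below
input str13 x
    "John Smith"
    "Linda Johnson"
    "Peter B. Brown"
end

// moss (SSC) finds every match of the pattern, storing them in _match1, _match2, ...
moss x, match("([A-Z]+)") regex

// concatenate the matches into a single string
egen wanted = concat(_match*)

// drop the helper variables created by moss, whose names all start with _
drop _*

list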

     +------------------------+
     |             x   wanted |
     |------------------------|
  1. |    John Smith       JS |
  2. | Linda Johnson       LJ |
  3. | Peter B. Brow      PBB |
     +------------------------+
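For comparison, here is a sketch that uses only official Stata functions and does not rely on case at all (it is not part of the reply above): it takes the first character of each space-separated word. The variable nwords and the local macro maxwords are illustrative names. Unlike the pattern [A-Z]+, this would give "v" for a lower-case particle such as "van", and a single "M" rather than "MD" for a name like "McDonald".

```stata
clear
input str14 x
"John Smith"
"Linda Johnson"
"Peter B. Brown"
end

// largest number of space-separated words in any name
generate nwords = wordcount(x)
summarize nwords, meanonly
local maxwords = r(max)

// append the first character of word 1, 2, ...; word() returns ""
// past the last word, so shorter names are unaffected
generate wanted = ""
forvalues i = 1/`maxwords' {
    quietly replace wanted = wanted + usubstr(word(x, `i'), 1, 1)
}

drop nwords
list, clean
```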


Vinicius Lima

Join Date: Dec 2019

Posts: 10
#3

17 Dec 2020, 12:55

Thank you!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

17 Dec 2020, 13:44

If you have a general knowledge of regular expressions (perhaps from using them in another language), the following demonstrates a solution that eliminates every character that is not an upper-case letter; it works for your example data.

Code:

. generate y = ustrregexra(x,"[^A-Z]","")

. list, clean

                   x     y  
  1.      John Smith    JS  
  2.   Linda Johnson    LJ  
  3.   Peter B. Brow   PBB  

Again, if you have experience with regular expressions that you want to build on in Stata, you will find that the Unicode regular expression functions (such as ustrregexra(), introduced in Stata 14) have a much more powerful definition of regular expressions than the non-Unicode functions. To the best of my knowledge, it is documented only in the Statlist post linked here that Stata's Unicode regular expression parser is the ICU regular expression engine, which is documented at http://userguide.icu-project.org/strings/regexp. A comprehensive discussion of regular expressions can be found at https://www.regular-expressions.info/unicode.html.
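As one illustration of that extra power, a minimal sketch: the ICU property class \p{Lu} matches any upper-case letter, not just A to Z, so accented capitals survive. The second name below is made-up example data, the variable names ascii_only and any_upper are illustrative, and the do-file is assumed to be saved as UTF-8 with precomposed accented characters.

```stata
clear
input str30 x
"John Smith"
"Élise Ørsted"
end

// [^A-Z] keeps only ASCII capitals, so É and Ø are removed too
generate ascii_only = ustrregexra(x, "[^A-Z]", "")

// \p{Lu} is the ICU class for any upper-case letter, ASCII or not
generate any_upper = ustrregexra(x, "[^\p{Lu}]", "")

list, clean
```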
Vinicius Lima

Join Date: Dec 2019

Posts: 10
#5

17 Dec 2020, 14:57

Great, William! Thank you very much.