Dear statalisters,
I have a string variable (stringvar) in the following format:
joe bloggs 10/03/1987
jamie-lee cyrus 2/12/1982
cameron reece jones aka smith 03/02/1961
michelle simone peters-smith 16/8/1952
The first portion of the variable is the person’s name, and the second is their date of birth. I have successfully extracted the date of birth (dob) using the following code:
gen dob = regexs(0) if(regexm(stringvar, "[0-9]*[/][0-9]*[/][0-9]*"))
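If a usable date is wanted rather than a string, the extracted text can then be converted with Stata's date() function. This is a minimal sketch. It assumes every date is in day/month/year order, and dob_date is a hypothetical variable name. It also tightens the pattern to [0-9]+ so that each part of the date needs at least one digit:

```stata
clear
input str60 stringvar
"joe bloggs 10/03/1987"
"jamie-lee cyrus 2/12/1982"
"cameron reece jones aka smith 03/02/1961"
"michelle simone peters-smith 16/8/1952"
end

* Extract the date string; [0-9]+ requires at least one digit per part
gen dob = regexs(0) if regexm(stringvar, "[0-9]+/[0-9]+/[0-9]+")

* Convert to a Stata daily date (assumes day/month/year order)
gen dob_date = date(dob, "DMY")
format dob_date %td
```

With the %td format, list dob dob_date should show the first row as 10mar1987.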
I would like to extract the person’s first name (retaining hyphenation), middle and surnames (also retaining hyphenation), and also identify words that come after “aka” as this denotes former (e.g. maiden) names.
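For the name parts, a route without positional regexes may be simpler. First remove the date, then split off anything after " aka " as a former name, and finally take words by position with word() and wordcount(). Those functions split only on spaces, so hyphenated names stay whole. This is a sketch that assumes names are lower case and that "aka" appears at most once. The variables namepart, akapos, formername, nwords, lastname and middlename are hypothetical names. It uses trim(), itrim() and length() rather than the newer strtrim()/strlen() names, so it should also run under Stata 13:

```stata
clear
input str60 stringvar
"joe bloggs 10/03/1987"
"jamie-lee cyrus 2/12/1982"
"cameron reece jones aka smith 03/02/1961"
"michelle simone peters-smith 16/8/1952"
end

* Remove the date and tidy leading, trailing and repeated blanks
gen namepart = trim(itrim(regexr(stringvar, "[0-9]+/[0-9]+/[0-9]+", "")))

* Anything after " aka " is treated as a former name and cut from namepart
gen akapos = strpos(namepart, " aka ")
gen formername = trim(substr(namepart, akapos + 5, .)) if akapos > 0
replace namepart = trim(substr(namepart, 1, akapos - 1)) if akapos > 0

* First word, last word, and whatever lies between them
gen nwords = wordcount(namepart)
gen firstname = word(namepart, 1)
gen lastname = word(namepart, nwords) if nwords > 1
gen middlename = trim(substr(namepart, length(firstname) + 1, ///
    length(namepart) - length(firstname) - length(lastname))) if nwords > 2
```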
I can extract the first name using:
gen firstname = regexs(0) if(regexm(stringvar, "([a-z]+)[ ]*"))
but this doesn’t retain hyphenation: I only get the first part of a hyphenated name. Similarly, the following code
gen fourthname = regexs(4) if(regexm(stringvar, "([a-z]+)[ ]*([a-z]+)[ ]*([a-z]+)[ ]*([a-z]+)"))
returns fourthname as the final character of the last name for names with fewer than four words, e.g. fourthname=="s" for joe bloggs.
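The stray "s" most likely comes from [ ]*, which also matches zero spaces. When a name has fewer than four words, the regex engine chops the last word into pieces so that each group still gets at least one letter, which leaves "s" in the fourth group. Below is a sketch of a possible fix, assuming lower-case names. It requires at least one space between words with [ ]+, allows hyphens inside words with [-a-z]+ (the leading hyphen in the bracket is literal), and anchors the pattern at the start of the string with ^:

```stata
clear
input str60 stringvar
"joe bloggs 10/03/1987"
"jamie-lee cyrus 2/12/1982"
"cameron reece jones aka smith 03/02/1961"
"michelle simone peters-smith 16/8/1952"
end

* First name: letters and hyphens at the very start of the string
gen firstname = regexs(1) if regexm(stringvar, "^([-a-z]+)")

* Four words, each separated by at least one space, so a word cannot be split
gen fourthname = regexs(4) if regexm(stringvar, ///
    "^([-a-z]+)[ ]+([-a-z]+)[ ]+([-a-z]+)[ ]+([-a-z]+)")
```

With this pattern, fourthname stays empty for the two- and three-word names. For cameron reece jones aka smith, however, it picks up "aka", so a purely positional regex still needs separate handling of the aka part.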
I am using Stata SE 13.0 for Windows. Any help is much appreciated.
Thank you,
Claudia.