I am working with a dataset that contains addresses in Armenian. The house number field is usually numeric, but can also have letters or words before or after the number (the equivalent of 14A or Lower 16). When the data are imported into Stata, the Armenian characters are displayed as symbols. I am trying to figure out a way to extract just the numbers from the string, because I need to search another, larger dataset for the nearest "whole number" address. The characters appear in different parts of the field and the numbers have different lengths, so a fixed-position substring won't work. I know there is a way to do this. Any suggestions are most welcome.
Here is an example of what the data looks like:
var1
14 ³
15·
¹6
16 ѳñ³í
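One possible approach, sketched here under assumptions rather than offered as a definitive solution, is to use Stata's regular-expression functions regexm() and regexs() to pick out the first run of ASCII digits in the field. The new variable name housenum is hypothetical, and the sketch assumes that var1 is a string variable and that only the first run of digits in each value matters.

```stata
* Sketch only: housenum is a hypothetical name; var1 is assumed to be a string variable.
* regexm() tests whether var1 contains one or more consecutive digits 0-9;
* regexs(0) returns the matched text, and real() converts it to a number.
* Observations with no digits at all are left missing.
generate housenum = real(regexs(0)) if regexm(var1, "[0-9]+")

* Inspect the result next to the original field.
list var1 housenum
```

With this pattern, symbols that merely look like digits are not matched, so a value such as ¹6 should yield 6 rather than 16. Whether that is the intended house number would need to be checked against the original Armenian text.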