I have a few hundred thousand address records that I need to clean, and part of my cleaning process is replacing every zero character (0) that appears inside a word with the letter O. Because these are addresses, which also contain legitimate numbers, a simple subinstr() call won't do. So I'm trying to figure out how to use regular expressions for this. I've tried the regex and ustrregex functions, but so far no luck. I'm hoping someone here can help me.
For example, take the following two addresses:
Code:
101 C0NC0RD AVE
123 HIGHWAY 30 S0UTH
In the first example, I want to replace the zeroes in C0NC0RD but not the zero in 101.
In the second example, I want to replace the zero in S0UTH but not in 30.
I tried using ustrregexra two different ways, but didn't get the right results:
Code:
replace address = ustrregexra(address, "[A-Z]0[A-Z]", "O")
resulted in
Code:
101 OOD AVE
123 HIGHWAY 30 OTH
Code:
replace address = ustrregexra(address, "[A-Z]0[A-Z]", "[A-Z]O[A-Z]")
resulted in
Code:
101 [A-Z]O[A-Z][A-Z]O[A-Z]D AVE
123 HIGHWAY 30 [A-Z]O[A-Z]TH
Normally I would use ustrregexs to concatenate together elements of a match, but these are dynamic matches and I can't predict how many times the zeroes will appear.
I also tried jregex after reading this post (https://www.statalist.org/forums/for...ar-expressions), but I don't have enough experience with Java regular expressions to make this work. I used this command:
Code:
jregex replace address, pattern("[A-Z\s]0[A-Z]") rep("O")
That replaced the correct zeroes with Os but, as with the ustrregexra command, removed the characters on either side.
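The attempts above lose the neighbouring letters because [A-Z]0[A-Z] makes those letters part of the match, so they are replaced along with the zero. One possible direction is to test the neighbours with zero-width lookarounds instead, so that only the zero itself is matched. The following is a minimal sketch, assuming ustrregexra() passes lookbehind and lookahead syntax through to its underlying regex engine; the two-row dataset and the variable name address_clean are made up for illustration.

```stata
* Hypothetical example data: the two addresses from the post
clear
input str40 address
"101 C0NC0RD AVE"
"123 HIGHWAY 30 S0UTH"
end

* (?<=[A-Z])0 matches a zero that directly follows a capital letter
* 0(?=[A-Z])  matches a zero that directly precedes a capital letter
* Lookarounds are zero-width, so the letters are checked but not
* consumed, and only the zero itself is replaced with "O".
generate address_clean = ustrregexra(address, "(?<=[A-Z])0|0(?=[A-Z])", "O")

list address address_clean, noobs
```

If this behaves as intended, the zeroes in C0NC0RD and S0UTH become Os while 101 and 30 are left alone, because those zeroes have no capital letter on either side. The pattern would also change tokens that legitimately mix letters and digits (for example a unit such as 0A or B0), so it seems worth checking on a sample before running it over the full file.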
Any help would be appreciated.