I have a dataset containing mailing addresses, which included all address information in a single cell. I have parsed out city, state, and zipcode into distinct variables, but now I am looking for an efficient way to remove the city name from the address variable.
Here's an example of what I mean:
v1 was the original variable - with a comma separating only the state and zipcode from the rest of the address
v2 - v5 are variables I created by parsing on the comma and extracting the state and zipcode.
Is there a simple way to remove the characters in v2 that exactly match characters in v3 - giving me a clean street address without the city name?
Thank you!
Here's an example of what I mean:
v1 | v2 | v3 | v4 | v5 |
112 W MAIN DR CHICAGO, TX 71102 | 112 W MAIN DR CHICAGO | CHICAGO | TX | 71102 |
4596 EAST ST. WEST SAN DIEGO, MA 98550 | 4596 EAST ST. WEST SAN DIEGO | SAN DIEGO | MA | 98550 |
v2 - v5 are variables I created by parsing on the comma and extracting the state and zipcode.
Is there a simple way to remove the characters in v2 that exactly match characters in v3 - giving me a clean street address without the city name?
Thank you!
Comment