I have a dataset in which the variable "birth_city" contains thousands of strings that need cleaning.
Although every value is spelled differently, they all refer to the same city: Houston.
So the expected result is to replace every string in "birth_city" with "Houston,TX,USA".
* The dataset below is a small sample for illustration:
clear
input int uid str14 birth_city
11 "Houston"
14 "HOUSTON"
12 "huoston"
16 "h ous t o n"
17 "h_u_ostno"
19 "houst"
24 "houston harris"
18 "hisuton"
15 "harris_ohuston"
10 "houston,texas"
25 "houston,tx"
20 "houston,HaRris"
end
Thanks for your kind help!