Dear Stata list,
We are working on a household survey.
Initially, interviewers entered district names themselves. We later realized there were many inconsistencies, because different interviewers could spell the same district differently, so we then preloaded all the districts within the sampled areas.
However, the data collected before that change still contain a lot of typos.
I am therefore asking for your help with cleaning up district names that contain non-ASCII and special characters.
We want to apply the following rules:
- Only the characters A-Z are allowed.
- No leading, trailing, or extra embedded spaces are allowed.
- A single space, dash (-), or underscore (_) can be used to separate compound words.
Examples of names that break these rules:
UNG-BAKO-À.,
K-YAMMA-À
JAURO-SULEI-(CHAKAMIDARI)
KAIKABAYAS-À.
JOSé
AîDUN-MANGWARO
etc.
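A minimal sketch of how the three rules might be combined into one regular expression, written in Python so the logic can be checked outside Stata. It assumes (the post does not say this explicitly) that a valid name is one or more uppercase A-Z words joined by exactly one space, dash, or underscore, with nothing before the first word or after the last. The names `VALID_DISTRICT`, `is_valid_district`, and `flag_invalid` are hypothetical.

```python
import re

# Assumed reading of the rules: uppercase A-Z words, each pair separated by
# exactly one space, dash, or underscore. No leading/trailing separators,
# no doubled separators, no accented letters, digits, dots, or parentheses.
VALID_DISTRICT = re.compile(r"[A-Z]+(?:[ _-][A-Z]+)*")


def is_valid_district(name: str) -> bool:
    """Return True only if the whole name matches the assumed rule pattern."""
    return VALID_DISTRICT.fullmatch(name) is not None


def flag_invalid(names):
    """Return the names that break at least one rule, in their original order."""
    return [n for n in names if not is_valid_district(n)]


if __name__ == "__main__":
    sample = [
        "K-YAMMA-À",
        "JAURO-SULEI-(CHAKAMIDARI)",
        "KAIKABAYAS-À.",
        "JOSé",
        "AîDUN-MANGWARO",
        "JAURO SULEI",   # hypothetical valid name, for contrast
        " JAURO",        # hypothetical: leading space
        "JAURO  SULEI",  # hypothetical: double embedded space
    ]
    # Print each name that does not conform, quoted so stray spaces are visible.
    for name in flag_invalid(sample):
        print(repr(name))
```

Anchored with `^` and `$`, the same pattern could probably be reused in Stata 15 through `ustrregexm()`, which has been available since Stata 14, e.g. `generate byte bad_name = !ustrregexm(district, "^[A-Z]+([ _-][A-Z]+)*$")` followed by `list district if bad_name`, where `district` is a hypothetical variable name. Treat that Stata line as an untested sketch.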
Our aim is to isolate all district names that do not conform to the above rules.
We are using Stata 15 MP.
Thanks in anticipation of your support.