One of my main responsibilities at work is cleaning very messy survey data. The survey data I clean always has many numeric variables, such as income, consumption expenditure, debts, etc. However, when I first load these data into Stata, it cannot read those variables as numeric, because many survey participants type non-numeric responses: they enter N/A, add a $ sign, add commas, give a range of values, type k instead of thousand, or spell out numbers (four instead of 4). I then have to clean these values manually using the replace command.

I recently came across the idea of writing custom Stata commands, and I am planning to write my own to reduce my workload. However, I am not sure where to start learning about this or how best to turn my ideas into a command. I would really appreciate it if anyone could point me to resources or share their experience of cleaning raw, messy survey data more efficiently in Stata.
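A minimal sketch of what such a command might look like is below. The name `cleannum`, the `_num` suffix for the new variables, and the short list of spelled-out numbers are all assumptions made for illustration, not an existing Stata command or a tested solution. It handles $ signs, commas, a trailing k, and the words zero to ten; ranges and N/A are left as missing and shown for manual review.

```stata
* cleannum is a hypothetical name, not a built-in Stata command
capture program drop cleannum
program define cleannum
    version 14
    syntax varlist(string)
    foreach v of local varlist {
        tempvar s
        quietly {
            * normalise: lower-case, trim, drop dollar signs and commas
            * (char(36) is the dollar sign, written this way to avoid
            *  any macro-expansion surprises)
            generate `s' = lower(strtrim(`v'))
            replace `s' = subinstr(`s', char(36), "", .)
            replace `s' = subinstr(`s', ",", "", .)

            * plain numbers convert directly; everything else starts as missing
            generate double `v'_num = real(`s')

            * "25k" or "2.5 k" becomes 25000 or 2500
            replace `v'_num = 1000 * real(regexs(1)) ///
                if regexm(`s', "^([0-9]+\.?[0-9]*) *k$")

            * spelled-out numbers zero to ten (extend the list as needed)
            local i = 0
            foreach w in zero one two three four five six seven eight nine ten {
                replace `v'_num = `i' if `s' == "`w'"
                local ++i
            }
        }
        * show what is still unconverted (ranges, N/A, etc.) for manual review
        capture noisily tabulate `v' if missing(`v'_num) & !missing(`v')
        drop `s'
    }
end
```

After running the definition, something like `cleannum income debts` would create `income_num` and `debts_num` next to the original string variables, which stay untouched so the conversion can be checked. For the simpler cases (only stray $ signs and commas), the built-in `destring` command with its `ignore()` option may already be enough.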