Custom Stata command for cleaning messy data

Joana Halder

Join Date: Nov 2023

Posts: 1
#1

02 Nov 2023, 22:30

One of my main responsibilities at work is cleaning very messy survey data. The survey data I clean always contain many numeric variables, such as income, consumption expenditure, and debts. However, when I first load these data into Stata, it cannot read those variables as numeric, because many survey participants type non-numeric responses (for example, entering N/A, adding a $ sign or commas, giving a range of values, typing k instead of thousand, or spelling out numbers, such as four instead of 4). I then have to clean these responses manually using the replace command. I recently came across the idea of creating a custom Stata command, and I am planning to write my own to reduce my workload. However, I am unsure where to start learning about this and how best to turn my ideas into a command. I would really appreciate it if anyone could direct me to resources or share their experience of cleaning raw, messy survey data more efficiently in Stata.

Last edited by Joana Halder; 02 Nov 2023, 22:33.
Tags: program, string, syntax
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

03 Nov 2023, 08:29

You've accidentally posted your question in the section on Stata's matrix language, "Mata," which almost certainly is not relevant to your problem. You'd have much better luck posting to the "regular" Stata section.

That being said, I'd first point out that the -destring- command can ignore characters such as the "$" sign. So, if you haven't consulted -help destring-, that would be a good thing to read. A second point is that the kind of standardized command you need depends on which particular input errors you want to correct. I'd say that a first step for you would be to make a comprehensive list of the common input errors you want to detect and handle. If you repost in the regular Stata section and include a list of such errors, that might make it possible for someone to create a short Stata program ("custom Stata command") that would handle many of the problems you encounter. -help program- would lead you to information about creating programs in Stata, but I think it's premature to think about learning that *unless* you already have some relatively canned Stata syntax you have been using to clean your data.
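
To make the -destring- suggestion concrete, here is a minimal sketch, not a finished solution. The variable names (income_raw, income), the example values, and the program name cleannum are all hypothetical, and the rule that a trailing "k" means thousands is only an assumption about the data. The ignore() and force options are documented in -help destring-.

```stata
* Hypothetical example data: messy income responses stored as strings
clear
set obs 5
generate str12 income_raw = ""
replace income_raw = char(36) + "1,200" in 1   // char(36) is the "$" sign
replace income_raw = "N/A"  in 2
replace income_raw = "850"  in 3
replace income_raw = "3k"   in 4
replace income_raw = "four" in 5

* A small wrapper program (hypothetical name) around -destring-
capture program drop cleannum
program define cleannum
    syntax varname(string), GENerate(name)
    tempvar s
    * work on a trimmed, lower-case copy so the raw variable is untouched
    generate `s' = lower(strtrim(`varlist'))
    * assumed rule: digits followed by "k" mean thousands, e.g. "3k" -> "3000"
    replace `s' = regexs(1) + "000" if regexm(`s', "^([0-9]+)k$")
    * strip "$", spaces, and commas; force sets whatever remains
    * non-numeric (such as "n/a" or "four") to missing
    destring `s', generate(`generate') ignore("$ ,") force
end

cleannum income_raw, generate(income)

* responses that still need manual attention
list income_raw if missing(income)
```

Because force silently turns unrecognized responses into missing values, the final -list- is there so those cases get reviewed rather than lost. Ranges and spelled-out numbers would need their own rules, which is why the list of common errors comes first.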