As part of a data cleaning exercise, I'm trying to identify observations in a large dataset where one or more variables hold a value that has no corresponding value label. For instance, I have the following value label applied to many variables.
YesNo:
0 No
1 Yes
66 Refused
99 DontKnow
I'm concerned that in data entry, some values between 1 and 66 (or between 66 and 99) may have been entered as typos. I'd like to write a program that does the following:
- Loops over all variables
- For each variable (e.g., foo), checks if it has a value label
- If foo has a value label, extract the "in-range" numlist from it (e.g., 0 1 66 99)
- Create a new dummy variable foo_out that is 1 for any observation with an out-of-range value for foo (e.g., 67) and 0 for all others.
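
To make the goal concrete, here is a rough, untested sketch of the steps above in Stata. It is only one possible approach, and it rests on several assumptions of mine: each new name (foo plus the suffix _out) does not already exist and fits within Stata's 32-character name limit, missing values should be coded 0 rather than flagged, and every attached value label is actually defined. Instead of extracting a numlist from the label, it asks for the label text of each observed value and treats an empty result as out of range.

```stata
* Sketch only: flag values that have no entry in the variable's value label
quietly ds, has(vallabel)              // variables with a value label attached
local labeled `r(varlist)'

foreach v of local labeled {
    local lbl : value label `v'        // name of the attached label, e.g. YesNo

    * Value labels map integers only, so flag non-integer values outright
    generate byte `v'_out = (`v' != floor(`v')) & !missing(`v')

    * Distinct nonmissing integer values observed in the data
    quietly levelsof `v' if `v' == floor(`v'), local(levels)

    foreach x of local levels {
        * With strict, the result is empty when `x' has no label text
        local txt : label `lbl' `x', strict
        if `"`txt'"' == "" {
            quietly replace `v'_out = 1 if `v' == `x'
        }
    }
}
```

With the YesNo label above, an observation where foo is 67 should end up with foo_out equal to 1, while 0, 1, 66, and 99 should stay at 0.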