How to find data points that are out-of-range of their value labels?

Katriel Friedman

Join Date: Nov 2014

Posts: 14
#1

15 May 2017, 21:06

As part of a data cleaning exercise, I'm trying to identify observations in a large dataset where a variable holds a value that has no corresponding entry in its value label. For instance, I have the following value label applied to many variables:

YesNo:
0 No
1 Yes
66 Refused
99 DontKnow

I'm concerned that in data entry, some values between 1 and 66 (or between 66 and 99) may have been entered as typos. I'd like to write a program that does the following:
1. Loops over all variables.
2. For each variable (e.g., foo), checks whether it has a value label.
3. If foo has a value label, extracts the "in-range" numlist from it (e.g., 0 1 66 99).
4. Creates a new dummy variable foo_out that is 1 for any observation with an out-of-range value for foo (e.g., 67) and 0 for all others.

But I don't know how to do steps 2, 3, or 4. Any advice on useful commands or pointers to existing tools would be enormously helpful. Thank you!
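
For reference, here is a minimal sketch of how steps 2 to 4 might be approached. It is not a definitive solution. It assumes that missing values should not be flagged, that each new name (e.g., foo_out) is free and within Stata's 32-character limit, and that non-integer values count as out of range, since value labels can only be attached to integers. Rather than building the numlist in step 3 explicitly, it tests each observed value against the label with the strict option of the label extended macro function, which returns an empty string when a value has no label. The toy dataset (foo, bar) is hypothetical and only there to make the sketch runnable.

```stata
* Hypothetical toy data: foo carries the YesNo label, bar has no label.
* NOTE: clear drops the data in memory; skip this part on real data.
clear
input foo bar
0  1
1  2
67 3
99 4
.  5
end
label define YesNo 0 "No" 1 "Yes" 66 "Refused" 99 "DontKnow"
label values foo YesNo

* Step 1: loop over all variables (the list is fixed before the loop
* starts, so the new *_out variables are not visited).
foreach v of varlist _all {

    * Step 2: name of the attached value label; empty if there is none
    local lbl : value label `v'
    if "`lbl'" == "" {
        continue
    }

    * Step 4 (start): 0 by default; non-integer values cannot be labeled,
    * so they are flagged straight away. Missing values stay 0.
    quietly generate byte `v'_out = !missing(`v') & `v' != floor(`v')

    * Steps 3 and 4: check every observed integer value against the label
    quietly levelsof `v' if `v' == floor(`v'), local(levels)
    foreach x of local levels {
        * with strict, the result is empty when `x' has no label
        local txt : label `lbl' `x', strict
        if `"`txt'"' == "" {
            quietly replace `v'_out = 1 if `v' == `x'
        }
    }
}

list foo foo_out
```

On the toy data, only the observation with foo equal to 67 should be flagged, and no bar_out variable should be created because bar has no value label. If missing values should also count as out of range, the generate line would need to be adjusted accordingly.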
Tags: categorical, label, numlist, value label
