  • How to find data points that are out-of-range of their value labels?

    As part of a data cleaning exercise, I'm trying to identify observations in a large dataset where a variable holds a value that has no corresponding entry in that variable's value label. For instance, I have the following value label applied to many variables.

    YesNo:
    0 No
    1 Yes
    66 Refused
    99 DontKnow


    I'm concerned that in data entry, some values between 1 and 66 (or between 66 and 99) may have been entered as typos. I'd like to write a program that does the following:
    1. Loops over all variables
    2. For each variable (e.g., foo), checks if it has a value label
    3. If foo has a value label, extract the "in-range" numlist from it (e.g., 0 1 66 99)
    4. Create a new dummy variable foo_out that is 1 for any observation with an out-of-range value for foo (e.g., 67) and 0 for all others.
    But I don't know how to do steps 2, 3, or 4. Any advice on useful commands or pointers to existing tools would be enormously helpful. Thank you!
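One possible way to approach steps 1 to 4 is sketched below in Stata. It is a minimal sketch, not a tested solution for this dataset. It assumes that the labeled variables hold integer codes, that missing values should not be flagged, and that appending `_out` to a variable name stays within Stata's 32-character name limit. Rather than building the in-range numlist explicitly, it asks for the label text of each observed value with the `strict` option, which returns an empty string when the value has no label.

```stata
* Sketch: flag values that have no entry in the variable's attached value label.
* Assumes integer codes; missing values are left unflagged (0).
foreach v of varlist _all {
    * Step 2: name of the attached value label (empty if none)
    local lbl : value label `v'
    if "`lbl'" == "" continue

    * Step 3: distinct nonmissing values actually observed in the variable
    quietly levelsof `v', local(levels)

    * Step 4: dummy that is 1 where the observed value is unlabeled
    generate byte `v'_out = 0
    foreach x of local levels {
        * With strict, the result is empty when `x' has no label
        local txt : label `lbl' `x', strict
        if `"`txt'"' == "" {
            quietly replace `v'_out = 1 if `v' == `x'
        }
    }
}
```

With the YesNo label above, a variable foo containing 0, 1, 66, 67, and 99 would get foo_out equal to 1 only for the observations where foo is 67. Variables without a value label, including string variables, are skipped.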