Checking and Relabeling Value Labels Across Separate Waves of Longitudinal Data

David Kapaon

Join Date: Nov 2019

Posts: 20
#1

15 Nov 2019, 14:04

Hi all, I was wondering if anyone could help with a complicated problem I'm having.

I'm working with a longitudinal dataset that contains two waves of data (aptly named W1 and W2). It contains just over 5000 variables (about 2500 per wave) and just over 5000 observations in total. Since it's longitudinal, most of the same questions were asked in both waves. Variables have the following naming convention: wave number + section abbreviation + question number. So the variable w1fs001 would translate to:

w1 --> Wave 1
fs --> Food Security
001 --> question #001 within the Food Security section

While the dataset contains different types of variables (string, categorical, ordinal, nominal, dichotomous, etc.), for the purposes of this question I'm looking at relabeling some binary variables that are in "YES/NO" format. Right now, some variables have values labeled "0 - YES / 1 - NO" in W1 but "1 - YES / 2 - NO" in W2 (or vice versa: "1 - YES / 2 - NO" in W1 and "0 - YES / 1 - NO" in W2). Regardless of what the labeling is in W1, I want to align the value labels so they are consistent ACROSS waves (while not necessarily being consistent WITHIN waves). Stated another way: for each variable, whatever the "YES/NO" value label is in W1, I want its W2 counterpart to have the same value label.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(w1hc001 w1hc002s5 w1hc006 w1gt001s6 w2hc001 w2hc006 w2hc002s5 w2gt001s6)
2  5 1 . 1 2 1 2
1  5 1 6 . . . .
1  5 2 . 1 2 1 2
1  5 2 . . . . .
1  5 2 6 . . . .
1 .r 1 6 . . . .
1  5 2 6 1 1 1 2
1  5 1 6 1 2 1 .
2  5 2 6 . . . .
2  5 2 6 1 2 1 2
1  . 2 6 . . . .
1  5 1 6 . . . .
1  5 1 6 . . . .
1  5 2 6 1 2 1 2
1  5 2 . 1 2 1 2
1  5 1 6 . . . .
1  5 1 . 2 2 2 2
1  5 1 6 1 1 1 2
1  5 2 6 1 2 1 2
1  5 2 . 1 2 2 2
end
label values w1hc001 HAALSI_VL54F
label def HAALSI_VL54F 1 "1 (YES) Yes", modify
label def HAALSI_VL54F 2 "2 (NO) No", modify
label values w1hc002s5 spicesoils
label def spicesoils 5 "5 (Yes) Yes", modify
label values w1hc006 HAALSI_VL105F
label def HAALSI_VL105F 1 "1 (YES) Yes", modify
label def HAALSI_VL105F 2 "2 (NO) No", modify
label values w1gt001s6 oldage
label def oldage 6 "6 (YES) Yes", modify
label values w2hc001 YN
label values w2hc006 YN
label values w2hc002s5 YN
label values w2gt001s6 YN
label def YN 1 "Yes", modify
label def YN 2 "No", modify

Two things complicate this further:

1. There are hundreds of different "YES/NO" value labels that were auto-generated and assigned to variables during data collection. Although these labels are named slightly differently (VL105F, VL54F, etc.), they all apply some type of "YES/NO" value label.
2. Some variables have a "YES/NO" value label applied to values other than 0, 1, or 2 (e.g., "Do you have a 5th child?", where the stored value is a numeric 5 but the label reads "5 - YES", indicating that the respondent does have a 5th child; see w1hc002s5 or w1gt001s6 in the dataex above for similar examples). Despite the odd coding, I still need to include these in the value label check since they are still in "YES/NO" format.

First, is there a way to limit the dataset to only variables with the "YES/NO" format?

Second, is there any way to check that two variables are assigned the same value label?

Third, upon checking the value labels, is there a way to assign whatever the W1 value label is to its W2 counterpart?

I was envisioning some sort of command that loops through all W1 variables and checks each value label against its W2 counterpart, but I am totally lost on how to go about executing it (especially using extended macro functions, which I'm not great with). My thought process was something like this:

1. Keep only those variables that have "Yes" or "No" in the value label. This would also keep the 'oddly' labeled variables.
2. Order the variables "sequentially", alternating by wave (w1pl001, w2pl001, w1pl002, w2pl002, etc.).
3. Cycle through all of the W1 variables only and put the name of each different value label, in order, in a local macro.
4. Run another loop that cycles through each different value label, checking it against each separate 'pair' of variables (w1pl201, w2pl201) and applying whatever the W1 value label is to the W2 variable.

This is all I have so far. However, the findname command keeps giving me an "invalid syntax" error, and I can't figure out what I am typing incorrectly. I am also unsure how to order each pair of variables alternating by wave and then check the value labels of each pair.

Code:

findname, vallabeltext(*YES* *NO*) insensitive local(VALUES)

gen valuelist = ""
local lcode = 0
foreach var of varlist w1* {
    local lcode = `lcode' + 1
    local valuelist : value label `var'
    replace valuelist = "`valuelist'" in `lcode'
}

Any insights are appreciated as I am thoroughly stumped!

daniel klein

Join Date: Mar 2014
Posts: 3805

16 Nov 2019, 08:08

A lot depends on the details. This seems to work for your example data.

Code:

// define one value label for yes/no answers
label define yesno 1 "Yes" 2 "No"

// find all variables that have yes/no answers
findname , vallabeltext(*yes* *no*) insensitive
local varlist `r(varlist)'

tempvar strvar // used repeatedly in -decode- below

local i 0
foreach var of local varlist {
    // -decode- to string variable
    decode `var' , generate(`strvar')
    
    // make sure there are at most two levels (yes and no)
    tabulate `strvar'
    assert r(r) <= 2
    
    // now standardize yes/no strings
    replace `strvar' = "Yes" if strmatch(strlower(`strvar'), "*yes*")
    replace `strvar' = "No"  if strmatch(strlower(`strvar'),  "*no*")

    // backtransform to numeric variable
    // use a temporary variable here in case something goes wrong
    tempvar numvar`++i'
    encode `strvar' , generate(`numvar`i'') label(yesno)
    replace `numvar`i'' = `var' if (`var' > .) // preserve extended missings
    drop `strvar' // we re-use our string variable each iteration
}

// now all variables are successfully transformed
// make them permanent
local i 0
foreach var of local varlist {
    drop   `var' // lose original variable
    rename `numvar`++i'' `var' // and rename
}

// done

// optionally lose unused value labels
labelbook , problems
label drop `r(notused)'

list
label list

I have findname from dm0048_3 SJ 15-2.
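
For the second question (checking whether a W1 variable and its W2 counterpart carry the same value label), a minimal sketch might look like the following. It assumes that counterparts differ only in the w1/w2 prefix; the toy data and the local macro name mismatch are made up for illustration and are not part of the real dataset.

```stata
// Toy data (hypothetical): two yes/no items, each asked in both waves
clear
input byte(w1hc001 w2hc001 w1hc006 w2hc006)
1 1 1 1
2 2 2 2
end
label define HAALSI_VL54F 1 "1 (YES) Yes" 2 "2 (NO) No"
label define YN 1 "Yes" 2 "No"
label values w1hc001 HAALSI_VL54F
label values w2hc001 YN
label values w1hc006 YN
label values w2hc006 YN

// Compare the value label NAMES of each W1/W2 pair
local mismatch
foreach v1 of varlist w1* {
    // assumption: the W2 counterpart has the same name with prefix w2
    local v2 = "w2" + substr("`v1'", 3, .)
    capture confirm numeric variable `v2', exact
    if (_rc) continue   // no numeric W2 counterpart, skip
    local lbl1 : value label `v1'
    local lbl2 : value label `v2'
    if ("`lbl1'" != "`lbl2'") {
        display as text "`v1' [`lbl1'] differs from `v2' [`lbl2']"
        local mismatch `mismatch' `v1'
    }
}
display as text "W1 variables whose W2 label differs: `mismatch'"
```

This compares label names only. Attaching the W1 label to the W2 variable with label values would change how the stored numbers are displayed, not the numbers themselves, so when the codings differ (0/1 versus 1/2) the values still need to be recoded, for example via the decode/encode route.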

Best
Daniel

Last edited by daniel klein; 16 Nov 2019, 08:12.

David Kapaon

Join Date: Nov 2019

Posts: 20
#3

26 Nov 2019, 13:26

Hi Daniel,
I apologize for the delay - thanks so much for your reply. Just looking over the code, this seems to be what I was looking for. However, I still can't seem to run the findname command. I've deleted, and re-downloaded the package you specified, but I get the 'invalid syntax' error whenever I run the command. I guess I'll try to look for a workaround for keeping only those variables with YES or NO in the value label until I can figure out the problem.
Thanks again for your help!
David
daniel klein

Join Date: Mar 2014

Posts: 3805
#4

26 Nov 2019, 13:53

Originally posted by David Kapaon

However, I still can't seem to run the findname command. I've deleted, and re-downloaded the package you specified, but I get the 'invalid syntax' error whenever I run the command.

That is weird. Can you set up your example data and post the output of

Code:

set trace on
findname, vallabeltext(*YES* *NO*) insensitive local(VALUES)

Best
Daniel
David Kapaon

Join Date: Nov 2019

Posts: 20
#5

26 Nov 2019, 14:16

Sorry just wanted to check before I made a fool of myself, Stata spit out a TON of output - It's actually too big to include in a single post...It looks like this:

- version 9
- syntax [varlist] [if] [in] [, INSEnsitive LOCal(str) NOT PLACEholder(str) Alpha Detail INDENT(int 0) Skip(int 2) Varwidth(int 12) Type(str) all(str
> asis) any(str asis) Format(str) VARLabel VARLabeltext(str asis) VALLabel VALLabelname(str) VALLABELText(str asis) Char Charname(str) CHARText(str asis
> ) ]
- quietly if `"`if'`in'"' != "" {
= quietly if `""' != "" {
marksample touse, novarlist
count if `touse'
if r(N) == 0 error 2000
local if if `touse'
local andif & `touse'
}

Did you want me to post the full version of this?
If so, I'll try putting it in two separate posts.
I used the same 8 variables in the dataex command in the first post
David
daniel klein

Join Date: Mar 2014

Posts: 3805
#6

26 Nov 2019, 15:19

Yah could be a lot of output, sorry. Try scrolling through the thing and locate the part where the "invalid syntax" error pops up. Post a couple of lines above and below that.

Best
Daniel
David Kapaon

Join Date: Nov 2019

Posts: 20
#7

26 Nov 2019, 16:08

No worries! As a test, I shut down Stata (and my computer), restarted, and checked for Stata updates. Interestingly, the findname command now runs on my small 8-variable dataset above, but it still does not run on my entire ~5000-variable dataset. The output below shows the 'invalid syntax' error when I run it on the big dataset.
Here's part of the output:

- foreach l of local levels {
- local txt : label `lbl' `l', strict
= local txt : label _vl187 .d, strict
- mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
= mata : find_match(`""', `"*YES* *NO*"', 1, "__000000")
- if `found' {
= if __000000 {
local vlist `vlist' `v'
continue, break
}
- }
- local txt : label `lbl' `l', strict
= local txt : label _vl187 .r, strict
- mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
= mata : find_match(`""', `"*YES* *NO*"', 1, "__000000")
- if `found' {
= if __000000 {
local vlist `vlist' `v'
continue, break
}
- }
- local txt : label `lbl' `l', strict
= local txt : label _vl187 1, strict
- mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
= mata : find_match(`"1 FLHC002[1]"', `"*YES* *NO*"', 1, "__000000")
- if `found' {
= if __000000 {
local vlist `vlist' `v'
continue, break
}
- }
- local txt : label `lbl' `l', strict
= local txt : label _vl187 1-2, strict
invalid syntax
mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
if `found' {
local vlist `vlist' `v'
continue, break
}
}
}
}
local varlist `vlist'
}
--------------------------------------------------------------------------------------------------------------------------------------- end findname ---

David
daniel klein

Join Date: Mar 2014

Posts: 3805
#8

27 Nov 2019, 04:54

There seems to be something wrong with one of the variables that has value label _vl187 attached. One of the levels (i.e., values) of that variable appears to be 1-2, which is not possible for a numeric variable. String variables cannot have value labels attached in Stata, but we have seen this happen when data are imported from another source.

Could you run the following and post the output (if feasible)

Code:

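foreach var of varlist * {
    local valuelabel : value label `var'
    // skip every variable that does not use the suspect label
    if ("`valuelabel'" != "_vl187") continue
    // assumption: the loop body was cut off in the post; describing and
    // tabulating the variable is one plausible way to inspect it
    describe `var'
    tabulate `var', missing
}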
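
A shorter route to the same information could be sketched with built-in commands only, assuming the label is named _vl187 as in the trace output. The toy variables x and y and the local macro name suspects are hypothetical stand-ins for the real data.

```stata
// Toy data (hypothetical): x carries the suspect label, y is unlabeled
clear
input byte(x y)
1 1
2 .
end
label define _vl187 1 "1 FLHC002[1]" 2 "2 (NO) No"
label values x _vl187

// list every variable that has value label _vl187 attached
ds, has(vallabel _vl187)
local suspects `r(varlist)'

// show the value-to-text mappings stored in the label itself
label list _vl187
```

On the real data, the output of label list should show whether the label itself contains an odd mapping, and the variables left in suspects are the ones worth tabulating.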

Best
Daniel