Hi all – was wondering if anyone would be able to help with a complicated problem I'm having
I’m working with a longitudinal dataset that contains two waves of data – (aptly named W1 and W2). It contains just over 5000 variables (about 2500 per wave) and also just over 5000 observations total. Since it's longitudinal, most of the same questions were asked in both waves. Variables have the following naming convention: wave number + section abbreviation + question number. So the variable w1fs001 would translate to:
w1 --> Wave 1
fs -- > Food Security
001 --> question #001 within the Food Security section
While the dataset contains different types of variables (string, categorical, ordinal, nominal, dichotomous, etc.), for the purposes of this question, I’m looking at re-labeling some binary variables that are in the “YES/NO” format. Right now, there are some variables whose values are labeled “0 - YES/1 – NO” in W1, but “1 - YES/2 – NO” in W2 (or even vice versa - “1 - YES/2 – NO” in W1, or “0 - YES/1 – NO” in W2). However, regardless of whatever the labeling is in W1, I want to ‘align’ the value labels so they are consistent ACROSS waves (while not necessarily being consistent WITHIN waves). I guess stated another way, for each variable, whatever the “YES/NO” value label is in W1, I want to make sure the value label is the same for that variable’s W2 counterpart.
Two things that complicate this further though are - 1. there are hundreds of different “YES/NO” value labels that were auto-generated/assigned to variables during data collection, and despite these labels being named slightly differently (VL105F, VL54F, etc.), they all apply some type of “YES/NO” value label to variables, and 2. there are some variables that have a “YES/NO” value label assigned to them, but the label is applied to values that are not 0, 1, or 2 (ex. “Do you have a 5th child?” - where even though the answer is a numeric “5”, the label appears as “5 – YES” indicating that the respondent does have a 5th child - see variables w1hc002s5 or w1gt001s6 in the dataex above for similar examples). Despite these being coded oddly, I still need to include them in this value label check since they still are in a "YES/NO" format.
Any insights are appreciated as I am thoroughly stumped!
I’m working with a longitudinal dataset that contains two waves of data – (aptly named W1 and W2). It contains just over 5000 variables (about 2500 per wave) and also just over 5000 observations total. Since it's longitudinal, most of the same questions were asked in both waves. Variables have the following naming convention: wave number + section abbreviation + question number. So the variable w1fs001 would translate to:
w1 --> Wave 1
fs -- > Food Security
001 --> question #001 within the Food Security section
While the dataset contains different types of variables (string, categorical, ordinal, nominal, dichotomous, etc.), for the purposes of this question, I’m looking at re-labeling some binary variables that are in the “YES/NO” format. Right now, there are some variables whose values are labeled “0 - YES/1 – NO” in W1, but “1 - YES/2 – NO” in W2 (or even vice versa - “1 - YES/2 – NO” in W1, or “0 - YES/1 – NO” in W2). However, regardless of whatever the labeling is in W1, I want to ‘align’ the value labels so they are consistent ACROSS waves (while not necessarily being consistent WITHIN waves). I guess stated another way, for each variable, whatever the “YES/NO” value label is in W1, I want to make sure the value label is the same for that variable’s W2 counterpart.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte(w1hc001 w1hc002s5 w1hc006 w1gt001s6 w2hc001 w2hc006 w2hc002s5 w2gt001s6) 2 5 1 . 1 2 1 2 1 5 1 6 . . . . 1 5 2 . 1 2 1 2 1 5 2 . . . . . 1 5 2 6 . . . . 1 .r 1 6 . . . . 1 5 2 6 1 1 1 2 1 5 1 6 1 2 1 . 2 5 2 6 . . . . 2 5 2 6 1 2 1 2 1 . 2 6 . . . . 1 5 1 6 . . . . 1 5 1 6 . . . . 1 5 2 6 1 2 1 2 1 5 2 . 1 2 1 2 1 5 1 6 . . . . 1 5 1 . 2 2 2 2 1 5 1 6 1 1 1 2 1 5 2 6 1 2 1 2 1 5 2 . 1 2 2 2 end label values w1hc001 HAALSI_VL54F label def HAALSI_VL54F 1 "1 (YES) Yes", modify label def HAALSI_VL54F 2 "2 (NO) No", modify label values w1hc002s5 spicesoils label def spicesoils 5 "5 (Yes) Yes", modify label values w1hc006 HAALSI_VL105F label def HAALSI_VL105F 1 "1 (YES) Yes", modify label def HAALSI_VL105F 2 "2 (NO) No", modify label values w1gt001s6 oldage label def oldage 6 "6 (YES) Yes", modify label values w2hc001 YN label values w2hc006 YN label values w2hc002s5 YN label values w2gt001s6 YN label def YN 1 "Yes", modify label def YN 2 "No", modify
- First, is there a way to limit the dataset to only variables with the “YES/NO" format?
- Second, is there any way to ‘check’ that two variables are assigned the same value label?
- Third, upon checking the value labels, is there a way to assign whatever the W1 value label is, to its W2 counterpart
- Keep only those variables that have “Yes” or “No” in the value label – this would also keep those ‘oddly’ labeled variables too
- Order the variables “sequentially” alternating by wave - (w1pl001, w2pl001, w1pl002, w2pl002, etc)
- Then, cycle through all of the W1 variables only and put the name of each different value label in order in a local/macro
- Run another loop command that cycles through each different value label checking it against each separate ‘pair’ of variables (w1pl201, w2pl201) applying whatever the W1 value label is, to the W2 variable
Code:
findname, vallabeltext(*YES* *NO*) insensitive local(VALUES) gen valuelist = "" local lcode = 0 foreach var of varlist w1* { local lcode = `lcode' + 1 local valuelist : value label `var' replace valuelist = "`valuelist'" in `lcode' }
Comment