
  • Checking and Relabeling Value Labels Across Separate Waves of Longitudinal Data

    Hi all, I was wondering if anyone would be able to help with a complicated problem I'm having.

    I’m working with a longitudinal dataset that contains two waves of data (aptly named W1 and W2). It contains just over 5000 variables (about 2500 per wave) and just over 5000 observations in total. Since it's longitudinal, most of the same questions were asked in both waves. Variable names follow this convention: wave number + section abbreviation + question number. So the variable w1fs001 translates to:

    w1 --> Wave 1
    fs --> Food Security
    001 --> question #001 within the Food Security section

    The dataset contains different types of variables (string, categorical, ordinal, nominal, dichotomous, etc.), but for the purposes of this question I’m looking at relabeling some binary variables that are in “YES/NO” format. Right now, some variables are labeled “0 - YES/1 - NO” in W1 but “1 - YES/2 - NO” in W2 (or vice versa: “1 - YES/2 - NO” in W1 and “0 - YES/1 - NO” in W2). Whatever the labeling is in W1, I want to ‘align’ the value labels so they are consistent ACROSS waves (though not necessarily WITHIN waves). Stated another way: for each variable, whatever the “YES/NO” value label is in W1, I want to make sure the same value label is on that variable’s W2 counterpart.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(w1hc001 w1hc002s5 w1hc006 w1gt001s6 w2hc001 w2hc006 w2hc002s5 w2gt001s6)
    2  5 1 . 1 2 1 2
    1  5 1 6 . . . .
    1  5 2 . 1 2 1 2
    1  5 2 . . . . .
    1  5 2 6 . . . .
    1 .r 1 6 . . . .
    1  5 2 6 1 1 1 2
    1  5 1 6 1 2 1 .
    2  5 2 6 . . . .
    2  5 2 6 1 2 1 2
    1  . 2 6 . . . .
    1  5 1 6 . . . .
    1  5 1 6 . . . .
    1  5 2 6 1 2 1 2
    1  5 2 . 1 2 1 2
    1  5 1 6 . . . .
    1  5 1 . 2 2 2 2
    1  5 1 6 1 1 1 2
    1  5 2 6 1 2 1 2
    1  5 2 . 1 2 2 2
    end
    label values w1hc001 HAALSI_VL54F
    label def HAALSI_VL54F 1 "1 (YES) Yes", modify
    label def HAALSI_VL54F 2 "2 (NO) No", modify
    label values w1hc002s5 spicesoils
    label def spicesoils 5 "5 (Yes) Yes", modify
    label values w1hc006 HAALSI_VL105F
    label def HAALSI_VL105F 1 "1 (YES) Yes", modify
    label def HAALSI_VL105F 2 "2 (NO) No", modify
    label values w1gt001s6 oldage
    label def oldage 6 "6 (YES) Yes", modify
    label values w2hc001 YN
    label values w2hc006 YN
    label values w2hc002s5 YN
    label values w2gt001s6 YN
    label def YN 1 "Yes", modify
    label def YN 2 "No", modify
    Two things complicate this further. (1) There are hundreds of different “YES/NO” value labels that were auto-generated and assigned to variables during data collection; although they are named differently (VL105F, VL54F, etc.), they all apply some type of “YES/NO” label. (2) Some variables have a “YES/NO” value label attached to values other than 0, 1, or 2 (e.g., “Do you have a 5th child?”, where the answer is stored as the number 5 but the label reads “5 – YES”, meaning the respondent does have a 5th child; see variables w1hc002s5 or w1gt001s6 in the dataex above for similar examples). Despite this odd coding, I still need to include these variables in the value label check, since they are still in a "YES/NO" format.
    1. First, is there a way to limit the dataset to only the variables with the “YES/NO” format?
    2. Second, is there a way to ‘check’ whether two variables are assigned the same value label?
    3. Third, after checking the value labels, is there a way to assign the W1 value label to its W2 counterpart?
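For questions 2 and 3, here is a minimal sketch. It assumes every W2 variable is named like its W1 counterpart with the prefix `w1` replaced by `w2`, as in the naming convention above. The extended macro function `: value label` returns the name of the value label attached to a variable (empty if none), so the two names can be compared as strings, and `label values` then attaches the W1 label to the W2 variable. The two-variable dataset is made up for illustration. Attaching a label changes only the text that is displayed, not the stored codes, so this is only correct when both waves use the same codes for Yes and No (as w1hc001 and w2hc001 do in the dataex above). It would not be correct for a pair such as w1hc002s5 (5 = Yes) and w2hc002s5 (1 = Yes, 2 = No).

```stata
* Made-up example data: same codes (1 = Yes, 2 = No), different label names
clear
input byte(w1hc001 w2hc001)
1 1
2 2
end
label define HAALSI_VL54F 1 "1 (YES) Yes" 2 "2 (NO) No"
label define YN 1 "Yes" 2 "No"
label values w1hc001 HAALSI_VL54F
label values w2hc001 YN

foreach w1var of varlist w1* {
    * Build the name of the W2 counterpart: w1hc001 -> w2hc001
    local w2var = "w2" + substr("`w1var'", 3, .)
    * Skip W1 variables without a W2 counterpart (exact: no abbreviations)
    capture confirm variable `w2var', exact
    if _rc continue
    * Names of the attached value labels ("" if none)
    local lab1 : value label `w1var'
    local lab2 : value label `w2var'
    if ("`lab1'" == "") continue
    if ("`lab1'" != "`lab2'") {
        display as text "`w1var' (`lab1') vs. `w2var' (`lab2')"
        * Attach the W1 label to the W2 variable (codes are not changed)
        label values `w2var' `lab1'
    }
}
```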
    I was envisioning a loop that goes through all W1 variables and checks each value label against its W2 counterpart, but I am totally lost on how to go about it (especially the extended macro functions, which I’m not great with). My thought process was something like this:
    1. Keep only the variables that have “Yes” or “No” in the value label; this would also keep the ‘oddly’ labeled variables
    2. Order the variables “sequentially”, alternating by wave (w1pl001, w2pl001, w1pl002, w2pl002, etc.)
    3. Then, cycle through the W1 variables only and store the name of each distinct value label, in order, in a local macro
    4. Run another loop that cycles through each distinct value label, checking it against each ‘pair’ of variables (w1pl201, w2pl201) and applying the W1 value label to the W2 variable
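The ordering in step 2 is not strictly required for checking labels, because a loop can derive each W2 name from its W1 name. If the alternating order is still wanted for browsing, here is a minimal sketch. It again assumes the `w1`/`w2` prefix convention and moves each W2 variable directly after its W1 counterpart using `order` with the `after()` option. The four example variables are made up.

```stata
* Made-up example data, stored in wave order
clear
input byte(w1pl001 w1pl002 w2pl001 w2pl002)
1 2 1 2
end

foreach w1var of varlist w1* {
    local w2var = "w2" + substr("`w1var'", 3, .)
    capture confirm variable `w2var', exact
    * Place the W2 variable right after its W1 counterpart, if it exists
    if !_rc order `w2var', after(`w1var')
}
describe, simple
```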
    This is all I have so far. However, the findname command keeps giving me an “invalid syntax” error, and I can’t figure out what I am typing incorrectly. I am also unsure how to order each pair of variables alternating by wave and then check the value labels of each pair.

    Code:
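    * store the names of variables whose value labels contain YES or NO in local VALUES
    findname, vallabeltext(*YES* *NO*) insensitive local(VALUES)

    * one observation per W1 variable: store the name of its value label
    * (assumes the dataset has at least as many observations as W1 variables)
    gen valuelist = ""
    local lcode = 0
    foreach var of varlist w1* {
        local lcode = `lcode' + 1
        local valuelist : value label `var'
        replace valuelist = "`valuelist'" in `lcode'
    }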
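For step 3, one alternative to writing the label names into a new variable, as the code above does, is to collect the distinct value-label names of all W1 variables in a local macro. This avoids assuming there are at least as many observations as W1 variables. The minimal sketch below uses the `: value label` and `: list uniq` extended macro functions, and the example data are made up.

```stata
* Made-up example data: two W1 variables share the label YN
clear
input byte(w1hc001 w1hc006 w1fs001)
1 1 2
end
label define HAALSI_VL54F 1 "Yes" 2 "No"
label define YN 1 "Yes" 2 "No"
label values w1hc001 HAALSI_VL54F
label values w1hc006 YN
label values w1fs001 YN

local labnames
foreach var of varlist w1* {
    * Name of the attached value label ("" if none, which adds nothing)
    local lab : value label `var'
    local labnames `labnames' `lab'
}
* Remove repeated names, keeping the first occurrence of each
local labnames : list uniq labnames
display "`labnames'"
```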
    Any insights are appreciated as I am thoroughly stumped!

  • #2
    A lot depends on the details. This seems to work for your example data.

    Code:
    // define one value label for yes/no answers
    label define yesno 1 "Yes" 2 "No"
    
    // find all variables that have yes/no answers
    findname , vallabeltext(*yes* *no*) insensitive
    local varlist `r(varlist)'
    
    tempvar strvar // used repeatedly in -decode- below
    
    local i 0
    foreach var of local varlist {
        // -decode- to string variable
        decode `var' , generate(`strvar')
        
        // make sure there are at most two levels (yes and no)
        tabulate `strvar'
        assert r(r) <= 2
        
        // now standardize yes/no strings
        replace `strvar' = "Yes" if strmatch(strlower(`strvar'), "*yes*")
        replace `strvar' = "No"  if strmatch(strlower(`strvar'),  "*no*")
    
        // backtransform to numeric variable
        // use a temporary variable here in case something goes wrong
        tempvar numvar`++i'
        encode `strvar' , generate(`numvar`i'') label(yesno)
        replace `numvar`i'' = `var' if (`var' > .) // preserve extended missings
        drop `strvar' // we re-use our string variable each iteration
    }
    
    // now all variables are successfully transformed
    // make them permanent
    local i 0
    foreach var of local varlist {
        drop   `var' // lose original variable
        rename `numvar`++i'' `var' // and rename
    }
    
    // done
    
    // optionally lose unused value labels
    labelbook , problems
    if ("`r(notused)'" != "") label drop `r(notused)' // only if there are any
    
    list
    label list
    I have findname from dm0048_3 SJ 15-2.

    Best
    Daniel
    Last edited by daniel klein; 16 Nov 2019, 08:12.



    • #3
      Hi Daniel,
      I apologize for the delay, and thanks so much for your reply. Looking over the code, this seems to be what I was looking for. However, I still can't run the findname command. I've deleted and re-downloaded the package you specified, but I get the 'invalid syntax' error whenever I run the command. I'll look for a workaround for keeping only the variables with YES or NO in the value label until I can figure out the problem.
      Thanks again for your help!
      David
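Below is a minimal sketch of one possible workaround that selects the yes/no variables without `findname`. It has not been tested on the full dataset. `uselabel` loads all value labels into memory as a dataset (variables `lname`, `value`, `label`), which makes it easy to find the names of labels that have a "yes" entry. A loop then collects the variables that carry one of those labels, and `preserve`/`restore` bring the original data back. Matching on "yes" alone is a deliberate choice. A pattern like `*no*` also matches texts such as "Don't know", whereas every yes/no label, including odd ones like "5 (Yes) Yes", contains "yes". The example data and label names are made up.

```stata
* Made-up example data
clear
input byte(w1hc001 w1hc002s5 w1fs010)
1 5 3
end
label define VL54F 1 "1 (YES) Yes" 2 "2 (NO) No"
label define spicesoils 5 "5 (Yes) Yes"
label define amount 1 "None" 2 "Some" 3 "Don't know"
label values w1hc001 VL54F
label values w1hc002s5 spicesoils
label values w1fs010 amount

* Step 1: names of value labels with at least one entry containing "yes"
preserve
uselabel, clear
quietly keep if strpos(strlower(label), "yes")
local ynlabels
if (_N > 0) quietly levelsof lname, local(ynlabels) clean
restore

* Step 2: variables that carry one of those labels
local ynvars
foreach var of varlist * {
    local lab : value label `var'
    if ("`lab'" == "") continue
    local isyn : list lab in ynlabels
    if (`isyn') local ynvars `ynvars' `var'
}
display "`ynvars'"
* keep `ynvars'   // uncomment to drop all other variables
```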



      • #4
        Originally posted by David Kapaon
        However, I still can't seem to run the findname command. I've deleted, and re-downloaded the package you specified, but I get the 'invalid syntax' error whenever I run the command.
        That is weird. Can you set up your example data and post the output of

        Code:
        set trace on
        findname, vallabeltext(*YES* *NO*) insensitive local(VALUES)
        Best
        Daniel



        • #5
          Sorry, I just wanted to check before I made a fool of myself: Stata spat out a TON of output. It's actually too big to include in a single post. It looks like this:

          - version 9
          - syntax [varlist] [if] [in] [, INSEnsitive LOCal(str) NOT PLACEholder(str) Alpha Detail INDENT(int 0) Skip(int 2) Varwidth(int 12) Type(str) all(str
          > asis) any(str asis) Format(str) VARLabel VARLabeltext(str asis) VALLabel VALLabelname(str) VALLABELText(str asis) Char Charname(str) CHARText(str asis
          > ) ]
          - quietly if `"`if'`in'"' != "" {
          = quietly if `""' != "" {
          marksample touse, novarlist
          count if `touse'
          if r(N) == 0 error 2000
          local if if `touse'
          local andif & `touse'
          }

          Do you want me to post the full version of this?
          If so, I'll try putting it in two separate posts.
          I used the same 8 variables as in the dataex example in the first post.
          David



          • #6
            Yeah, that could be a lot of output, sorry. Try scrolling through it and locate the part where the "invalid syntax" error pops up. Post a couple of lines above and below that.

            Best
            Daniel



            • #7
              No worries! As a test, I shut down Stata (and my computer), restarted, and checked for Stata updates. Interestingly, the findname command now runs on my small 8-variable dataset above, but it still does not run on my entire ~5000-variable dataset. The output below shows the 'invalid syntax' error when I run it on the big dataset.
              Here's part of the output:


              - foreach l of local levels {
              - local txt : label `lbl' `l', strict
              = local txt : label _vl187 .d, strict
              - mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
              = mata : find_match(`""', `"*YES* *NO*"', 1, "__000000")
              - if `found' {
              = if __000000 {
              local vlist `vlist' `v'
              continue, break
              }
              - }
              - local txt : label `lbl' `l', strict
              = local txt : label _vl187 .r, strict
              - mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
              = mata : find_match(`""', `"*YES* *NO*"', 1, "__000000")
              - if `found' {
              = if __000000 {
              local vlist `vlist' `v'
              continue, break
              }
              - }
              - local txt : label `lbl' `l', strict
              = local txt : label _vl187 1, strict
              - mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
              = mata : find_match(`"1 FLHC002[1]"', `"*YES* *NO*"', 1, "__000000")
              - if `found' {
              = if __000000 {
              local vlist `vlist' `v'
              continue, break
              }
              - }
              - local txt : label `lbl' `l', strict
              = local txt : label _vl187 1-2, strict
              invalid syntax
              mata : find_match(`"`txt'"', `"`vallabeltext'"', `inse', "`found'")
              if `found' {
              local vlist `vlist' `v'
              continue, break
              }
              }
              }
              }
              local varlist `vlist'
              }
              --------------------------------------------------------------------------------------------------------------------------------------- end findname ---


              David



              • #8
                There seems to be something wrong with one of the variables that have value label _vl187 attached. One of the levels (i.e., values) in that variable appears to be 1-2, which is not possible for a numeric variable. That points to a string variable, yet string variables cannot have value labels attached in Stata; we have, however, seen this happen when data are imported from another source.

                Could you run the following and post the output (if feasible)?

                Code:
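                foreach var of varlist * {
                    local valuelabel : value label `var'
                    // skip variables that do not have _vl187 attached
                    if ("`valuelabel'" != "_vl187") continue
                    // show storage type, label, and values of the rest
                    codebook `var'
                }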
                Best
                Daniel
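Building on the diagnosis above, the minimal sketch below lists every string variable that carries a value label. Such variables should not normally exist in Stata, but, as noted above, they can turn up in imported data. The sketch relies on `confirm string variable` and the `: value label` extended macro function. The made-up test data contain only ordinary variables, so the list comes out empty there. On the problem dataset it would be expected to report the variable(s) that trip up `findname`.

```stata
* Made-up example data: one labeled numeric and one plain string variable
clear
input byte w1hc001 str5 w1hc010
1 "1-2"
end
label define VL54F 1 "Yes" 2 "No"
label values w1hc001 VL54F

local strlabeled
foreach var of varlist * {
    * Only string variables are of interest here
    capture confirm string variable `var'
    if _rc continue
    local lab : value label `var'
    if ("`lab'" != "") {
        display as text "`var' is a string variable with value label `lab'"
        local strlabeled `strlabeled' `var'
    }
}
```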
