Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create matrix of variable value code + labels

    Hello I am attempting to create a metadata data file that includes a column that lists out all the value codes + label for each variable.
    I am a SAS user primarily and haven't used looping in STATA in quite a while. I have been able to loop over the variables but can't remember how to save the results in a matrix that I can get into a dataset.

    I want to end up with this:
    lname values
    var1 -5: Don't know, -3 Missing, 1: Yes, 2: No
    var2 -5: Don't know, -3 Missing, 1: Often, 2: Sometimes, 3: Never
    Here are the steps that I have gone through so far:

    1. After loading data, 'uselabel' to create a dataset with all the format information.

    lname value label
    var1 -5 -5: Don't know
    var1 -3 -3: Missing
    var1 1 1: Yes
    var1 2 2: No
    var2 -5 -5: Don't know
    var2 -3 -3: Missing
    var2 1 1: Often
    var2 2 2: Sometimes
    var2 3 3: Never
    2. read through this post: https://www.stata.com/meeting/oceani...a17_Vidmar.pdf and ran the following code

    gen recnum=_n
    levelsof lname, local(levels)

    foreach x of local levels {
    local fullab
    qui su recnum if lname=="`x'"
    local j=r(min)
    local k=r(max)
    forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
    }
    local lenlab=strlen("`fullab'")-2
    local fullab=substr("`fullab'",1,`lenlab')
    }

    3. looked at the contents of the fullab macro variable

    di "`fullab'"
    -5, -5: Don't know | -3, -3: Missing | 1, 1: Often | 2, 2: Sometimes | 3, 3: Never

    4. Looks like it iterated and contains something close enough to what I need. How do I capture the information from each loop in a matrix and save it to a dataset?

    Thanks for any help!


  • #2
    The better place to post this question would have been the General forum.

    I think you are making this more complicated than it is. uselabel already creates a dataset with all the information you want; why create locals and a matrix to then create another dataset?

    Try this:

    Code:
    // example dataset
    clear
    sysuse nlsw88
    
    // create dataset of value labels
    uselabel , clear
    
    // combine values and labels into one (string) variable
    generate value_label = string(value) + ": " + label
    
    // keep the relevant variables
    keep lname value_label
    
    // add an auxiliary variable for -reshape-
    bysort lname : generate j = _n
    
    // maxium number of integer-to-text mappings
    summarize j , meanonly
    local max = r(max)
    
    // -reshape-
    reshape wide value_label , i(lname) j(j)
    
    // combine into one variable
    generate value_label = value_label1
    forvalues i = 2/`max' {
        replace value_label = value_label ///
            + cond(value_label`i' == "", "", ", " + value_label`i')
    }
    
    // reduce to relevant variables
    keep lname value_label

    Because you typically use SAS, and because you choose var1, var2, and so on as levels/values of lname, you might be under wrong impression that uselabel saves variable names. It does not. It saves value label names. In Stata, multiple variables can share the same value label. Please read [U] 12.6.3 Value labels to make sure you understand the concept.

    Comment


    • #3
      Here is an extended code that might be helpful. It adds the value label information from #2 to the dataset that describe creates.

      Code:
      // example dataset
      clear
      sysuse nlsw88
      
      // create dataset of value labels
      uselabel , clear
      
      // combine values and labels into one (string) variable
      generate value_label = string(value) + ": " + label
      
      // keep the relevant variables
      keep lname value_label
      
      // add an auxiliary variable for -reshape-
      bysort lname : generate j = _n
      
      // maxium number of integer-to-text mappings
      summarize j , meanonly
      local max = r(max)
      
      // -reshape-
      reshape wide value_label , i(lname) j(j)
      
      // combine into one variable
      generate value_label = value_label1
      forvalues i = 2/`max' {
          replace value_label = value_label ///
              + cond(value_label`i' == "", "", ", " + value_label`i')
      }
      
      // reduce to relevant variables
      keep lname value_label
      
      // rename for -merge-
      rename lname vallab
      
      // save as temporary file
      tempfile tmp
      save "`tmp'"
      
      // load the original dataset
      sysuse nlsw88 , clear
      
      // get metadata
      describe , replace
      
      // add value label mappings
      merge m:1 vallab using "`tmp'" , nogenerate
      
      // cosmetics
      label variable value_label "Value label mappings/codings"


      Edit:

      Value labels can hold many integer-to-text mappings; the resulting string variables can become quite long. Personally, I might prefer a format closer to uselabel:

      Code:
      clear
      sysuse nlsw88
      
      // create dataset of value labels
      uselabel , clear
      
      // keep the relevant variables
      keep lname value label
      
      // rename for before combining
      rename lname vallab
      
      // save as temporary file
      tempfile tmp
      save "`tmp'"
      
      // load the original dataset
      sysuse nlsw88 , clear
      
      // get metadata
      describe , replace
      
      // add value label mappings
      joinby vallab using "`tmp'"
      Last edited by daniel klein; 03 Oct 2022, 02:42.

      Comment

      Working...
      X