  • Decode for numeric variables

    Hello,

    I am running a loop to remove all value labels, converting each value-labeled variable into a string variable that holds its label text.
    Code:
    ds, has(vallabel)
    foreach v of varlist `r(varlist)' {
        decode `v', gen(temp)
        replace temp = trim(itrim(lower(temp)))
        order temp, after(`v')
        rename `v' `v'_old
        rename temp `v'
        drop `v'_old
    }
    label drop _all
    However, this causes problems because some of the variables being "cleaned" are genuinely numeric (weight, for example), and their only labeled value is -1, the code for missing. For these variables every unlabeled value decodes to an empty string, so the numeric values are lost. Is there a better way to perform this cleaning?

  • #2
    I don't understand what the problem is. If -1 is a code for missing value, then it seems that coming up with a missing string value is what you would want. What am I missing here?

    • #3
      Hello Clyde,

      The data look like this:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(weight fruit)
        1 1
        5 2
      325 3
        2 4
      231 5
       -1 6
      end
      label values weight test
      label def test -1 "missing", modify
      label values fruit test2
      label def test2 1 "apple", modify
      label def test2 2 "banana", modify
      label def test2 3 "orange", modify
      label def test2 4 "melon", modify
      label def test2 5 "pear", modify
      label def test2 6 "kiwi", modify
      If I run the loop above, I get this:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str7 weight str6 fruit
      ""        "apple"
      ""        "banana"
      ""        "orange"
      ""        "melon"
      ""        "pear"  
      "missing" "kiwi"      
      end
      The numeric values of weight are lost and I am left with only strings (empty except for "missing"). I'd like to keep those numeric values.
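The same behaviour can be reproduced with a smaller example. This is a sketch with a single variable; the name weight_str is chosen here only for illustration. Only the value that appears in the value label survives -decode-; every other value becomes an empty string.

```stata
* Minimal reproduction: -decode- returns "" for any value that has no label
clear
input float weight
  1
 -1
end
label define test -1 "missing"
label values weight test
* weight_str is a hypothetical name for the decoded copy
decode weight, gen(weight_str)
list
```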

      • #4
        I see. In fact, the problem is more serious than that: in any value-labeled variable, every value that has no entry in the value label becomes a missing (empty) string when you apply -decode-. So perhaps you want to "patch" the labels before you -decode-:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(weight fruit)
          1 1
          5 2
        325 3
          2 4
        231 5
         -1 6
        end
        label values weight test
        label def test -1 "missing", modify
        label values fruit test2
        label def test2 1 "apple", modify
        label def test2 2 "banana", modify
        label def test2 3 "orange", modify
        label def test2 4 "melon", modify
        label def test2 5 "pear", modify
        label def test2 6 "kiwi", modify
        
        ds, has(vallabel)
        foreach v of varlist `r(varlist)'{
            // CHECK THAT ALL VALUES OF `v' HAVE A LABEL
            local this_label: value label `v'
            levelsof `v', local(levels)
            foreach l of local levels {
                // IF THERE IS AN UNLABELED VALUE, ADD IT TO THE
                // LABEL AS ITS OWN VALUE
                if `"`:label `this_label' `l''"' == "`l'" {
                    label define `this_label' `l' `"`l'"', modify
                }
            }
            decode `v', gen(temp)
            replace temp = trim(itrim(lower(temp)))
            order temp, after(`v')
            rename `v' `v'_old
            rename temp `v'
            drop `v'_old
        }
        label drop _all
        des
        list
        The tricky part is that the macro extended function :label returns the numeric value itself, rather than "", when it is asked for the label of a value that has no label. Once that is taken into account, patching the label is straightforward.
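For comparison, the same extended function accepts a strict option that returns an empty string instead of the number when there is no label (see the extended macro functions under -help macro-; it is worth confirming your Stata version supports it). The sketch below contrasts the two forms on a throwaway label named test. Keep in mind that value labels can be attached only to integers (and extended missing values), so any label-patching approach cannot cover non-integer values such as 0.01.

```stata
* Compare the -label- extended macro function with and without -strict-
clear
set obs 1
label define test -1 "missing"
local plain      : label test 5
local lab_strict : label test 5, strict
local known      : label test -1
* plain should hold "5", lab_strict should be empty, known should be "missing"
display `"plain: [`plain']  strict: [`lab_strict']  labelled: [`known']"'
```

With strict, the test inside the patching loop could be written as a check for an empty string rather than a comparison against the value itself.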

        • #5
          Alternatively, use Roger Newson's sdecode program from SSC (ssc install sdecode). [Note that the most current version requires Stata 13. If you are using an older version of Stata, use findit sdecode and find a download appropriate for your Stata version.]

          It will do what you want:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(weight fruit)
            1 1
            5 2
          325 3
            2 4
          231 5
           -1 6
          end
          label values weight test
          label def test -1 "missing", modify
          label values fruit test2
          label def test2 1 "apple", modify
          label def test2 2 "banana", modify
          label def test2 3 "orange", modify
          label def test2 4 "melon", modify
          label def test2 5 "pear", modify
          label def test2 6 "kiwi", modify
          
          ds, has(vallabel)
          foreach v of varlist `r(varlist)' {
              sdecode `v', gen(temp)
              replace temp = trim(itrim(lower(temp)))
              order temp, after(`v')
              rename `v' `v'_old
              rename temp `v'
              drop `v'_old
          }
          label drop _all
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1

          • #6
            Thank you for your help. I'm getting the error "may not label .01". I suspect Stata does not handle value labels on decimal values, and my data set contains a lot of them. Is there some way to distinguish value-labeled variables that are genuinely numeric (like weight) from those that really hold categories (like fruit), so that the loop runs only on the latter and the numeric variables keep their values when the labels are removed?

            Edit: I responded before I saw Carole's response; I will check shortly whether that solves the problem I am having. Thanks again.

            Edit 2: sdecode seems to have worked great! Thank you both.
            Last edited by Andrew Castro; 25 Apr 2016, 19:42.
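One way to make the distinction asked about above, sketched under an assumed rule that does not come from this thread: treat a value-labeled variable as genuinely numeric if decoding leaves at least one nonmissing value without a label. Such variables keep their numbers and simply lose the label; fully labeled variables are decoded as before. Because nothing is added to a value label, non-integer values cause no error. Note that -1 in weight stays -1; if it should become system missing, something like mvdecode weight, mv(-1) can be run separately.

```stata
* Sketch under an assumed rule: a value-labeled variable counts as
* "really numeric" if any nonmissing value has no label. Such variables
* keep their numbers and just lose the label; the rest are decoded.
clear
input float(weight fruit)
  1 1
325 3
 -1 6
end
label define test -1 "missing"
label values weight test
label define test2 1 "apple" 3 "orange" 6 "kiwi"
label values fruit test2

ds, has(vallabel)
foreach v of varlist `r(varlist)' {
    decode `v', gen(temp)
    quietly count if temp == "" & !missing(`v')
    if r(N) > 0 {
        // at least one unlabeled value: keep the numeric variable as is
        drop temp
        label values `v'
    }
    else {
        // every value is labeled: replace the variable with its labels
        replace temp = trim(itrim(lower(temp)))
        order temp, after(`v')
        drop `v'
        rename temp `v'
    }
}
label drop _all
describe
list
```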

            • #7
              Maybe I'm missing something, but if a labelled variable decodes to an empty string, all you need to do is convert its numeric value to a string in those observations. Something like:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(weight fruit)
                1 1
                5 2
              325 3
                2 4
              231 5
               -1 6
              end
              label values weight test
              label def test -1 "missing", modify
              label values fruit test2
              label def test2 1 "apple", modify
              label def test2 2 "banana", modify
              label def test2 3 "orange", modify
              label def test2 4 "melon", modify
              label def test2 5 "pear", modify
              label def test2 6 "kiwi", modify
              
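              ds, has(vallabel)
              foreach v of varlist `r(varlist)' {
                  rename `v' `v'_old
                  decode `v'_old, gen(`v')
                  // unlabeled values decode to ""; fall back to the number itself
                  replace `v' = string(`v'_old) if mi(`v') & !mi(`v'_old)
                  replace `v' = trim(itrim(lower(`v')))
                  order `v', after(`v'_old)
                  drop `v'_old
              }
              label drop _all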
