Keeping consistent values across variables with -encode-

Jacob Levine

Join Date: Aug 2017

Posts: 13
#1

22 Feb 2018, 08:47

Hey all again, trying to see if there's a clever solution to my goal. Given the length of the strings and potentially sensitive information, I'm omitting -dataex- and making a brief example to explain.

I have 3 string variables I'd like to encode. The strings are all drawn from the same list; however, not every variable contains every possible value, leading to a dataset like this:

Code:

clear
input str1 (var1 var2 var3)
"a" "a" "a"
"b" "a" "b"
"c" "c" "b"
"a" "d" "b"
end

Now, I'd like to use -encode- so that a = 1, b = 2, c = 3, and so on for all variables. However, as each variable contains a different set of values, running -encode- separately on each variable will not give a consistent encoding across variables. Is there a solution to ensure consistency? I'm going to experiment with -reshape- to solve this, but I'm curious to see if anyone else has had this issue.
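
To make the inconsistency concrete, here is a minimal sketch using the example data above. The new variable names n1 and n2 are made up for illustration; by default -encode- names each value label after the new variable.

```stata
* Rebuild the example data
clear
input str1 (var1 var2 var3)
"a" "a" "a"
"b" "a" "b"
"c" "c" "b"
"a" "d" "b"
end

* Encode two of the variables separately; each gets its own value label
encode var1, generate(n1)
encode var2, generate(n2)

* n1 maps a=1 b=2 c=3, while n2 maps a=1 c=2 d=3,
* so "c" is 3 in n1 but 2 in n2
label list n1 n2
```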

Thanks!

Last edited by Jacob Levine; 22 Feb 2018, 09:33.
Jacob Levine

Join Date: Aug 2017

Posts: 13
#2

22 Feb 2018, 08:57

I believe I've written my own solution using -reshape-. Sharing in case anyone else runs into this issue.

Code:

reshape long var, i(id) j(number)
encode var, g(newvar)
drop var
reshape wide newvar, i(id) j(number)

"id" is a unique observation identifier, number is an arbitrary variable created in -reshape-

Hope this helps someone in the future, and if you have a different way, feel free to share!
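
For anyone who wants to run the -reshape- approach end to end on the example data from #1, a self-contained sketch might look like the following. It assumes no identifier exists yet, so id is created from the observation number; id, number, and newvar are just illustrative names.

```stata
* Rebuild the example data
clear
input str1 (var1 var2 var3)
"a" "a" "a"
"b" "a" "b"
"c" "c" "b"
"a" "d" "b"
end

* Assumption: no unique identifier exists yet, so create one
generate id = _n

* Stack var1-var3 into one long variable, encode once, then go back to wide
reshape long var, i(id) j(number)
encode var, generate(newvar)
drop var
reshape wide newvar, i(id) j(number)

* newvar1-newvar3 share the single value label newvar: 1 a, 2 b, 3 c, 4 d
label list newvar
```

Because -encode- sees all values at once in the long layout, the codes follow the alphabetical order of the pooled strings.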

daniel klein

Join Date: Mar 2014
Posts: 3820
#3

22 Feb 2018, 09:08

Fine to omit dataex, for the reasons that you give, but please provide examples that work next time you post. You

(1) cannot use string as a variable type with input
(2) cannot omit double quotes around string values with input when strings contain embedded spaces; see #5 below
(3) cannot omit the end statement after input

That said, see multencode (Cox, SSC) to solve the problem

Code:

clear
input str1 (var1 var2 var3)
"a" "a" "a"
"b" "a" "b"
"c" "c" "b"
"a" "d" "b"
end

multencode var1-var3 , generate(enc_var1-enc_var3)

list
label list

gives

Code:

. input str1 (var1 var2 var3)

          var1       var2       var3
  1. "a" "a" "a"
  2. "b" "a" "b"
  3. "c" "c" "b"
  4. "a" "d" "b"
  5. end

.
. multencode var1-var3 , generate(enc_var1-enc_var3)

.
. list

     +-----------------------------------------------------+
     | var1   var2   var3   enc_var1   enc_var2   enc_var3 |
     |-----------------------------------------------------|
  1. |    a      a      a          a          a          a |
  2. |    b      a      b          b          a          b |
  3. |    c      c      b          c          c          b |
  4. |    a      d      b          a          d          b |
     +-----------------------------------------------------+

. label list
var1:
           1 a
           2 b
           3 c
           4 d
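
For comparison, a similar result can be sketched with official -encode- alone by pointing every call at one shared value label through the label() option. This is only an illustration, not part of multencode; the label name shared and the variable names n1-n3 are made up here.

```stata
* Rebuild the example data
clear
input str1 (var1 var2 var3)
"a" "a" "a"
"b" "a" "b"
"c" "c" "b"
"a" "d" "b"
end

* The first call creates the label; later calls reuse it and
* append codes only for strings not yet in the label
encode var1, generate(n1) label(shared)
encode var2, generate(n2) label(shared)
encode var3, generate(n3) label(shared)

* One label for all three variables: 1 a, 2 b, 3 c, 4 d
label list shared
```

One caveat with this approach: codes are assigned in the order strings are first encountered across calls, so the result is consistent across variables but not necessarily alphabetical. Defining the label up front with -label define- and adding the noextend option to -encode- would fix the order and raise an error on unexpected strings.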

Best
Daniel

Last edited by daniel klein; 22 Feb 2018, 09:54.

Jacob Levine

Join Date: Aug 2017

Posts: 13
#4

22 Feb 2018, 09:34

Example edited. Thanks for the one-command solution.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

22 Feb 2018, 09:40

Albeit reluctantly, I want to point out, in regard to post #3, that double quotes around string values are optional with input for those strings that contain no embedded blanks or "special characters" (which the Stata Data Management Reference Manual PDF apparently does not clarify further in the documentation for input). Note the second example in the output of help input, where the three values for name are surrounded by double quotes, but two of the values for sex are not, and one is, even though it contains no embedded space.
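
As a quick illustration of that point, the example data from this thread can be entered without any double quotes, since none of the strings contain blanks:

```stata
* Same data as in post #1, with the quotes omitted
clear
input str1 (var1 var2 var3)
a a a
b a b
c c b
a d b
end

list
```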

The best way to prepare made-up data for sharing on Statalist is to use Stata's Data Editor window to create the made-up data in Stata's memory, and then in Stata's Command window apply dataex to create the listing to be copied and pasted into the Statalist post.
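
A minimal sketch of that workflow, using -input- in place of the Data Editor so it can be run as a script. It assumes dataex is available (it ships with recent Stata releases and is otherwise installable from SSC).

```stata
* Create made-up data in memory
clear
input str1 (var1 var2 var3)
a a a
b a b
c c b
a d b
end

* Produce a listing of the three variables, ready to paste into a post
dataex var1 var2 var3
```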
daniel klein

Join Date: Mar 2014

Posts: 3820
#6

22 Feb 2018, 09:55

Originally posted by William Lisowski

I want to point out [...] that double quotes around string values are optional with input for those strings that contain no embedded blanks or "special characters"

I was not aware of this. Thanks for pointing it out. I have edited my earlier post.

Best
Daniel