  • Recode string var lists into separate binary vars (dummy vars)

    I have a string var that lists a number of services separated by commas. I'd like each service to become its own var with 1/0 values. An example using colors is below.

    Note that there are 45 unique values within the string var I'm using. The list is always in alphabetical order in the original format. There are also spaces and/or dashes in some of the service names.

    I've searched online a bit, but I don't think I'm wording the question well enough to find answers out there! Thank you!

    Current data format:
    ID Colors
    1 Bluish Green, Green, Orange, Purple
    2 Green, Purple, Red, Yellow
    3 Blue, Green, Orange, Purple, Red, Yellow
    4 Orange, Purple, Red, Yellow

    Ideal format:
    ID BluishGreen Blue Green Orange Purple Red Yellow
    1 1 0 1 1 1 0 0
    2 0 0 1 0 1 1 1
    3 0 1 1 1 1 1 1
    4 0 0 0 1 1 1 1
    Last edited by Jenna Khan; 14 Oct 2015, 10:44. Reason: changed to add a color with a space in the name for the example

  • #2
    Try this:
    Code:
    foreach s in Blue Green Orange Purple Red Yellow {
        gen `s' = strpos(Colors, "`s'") > 0
    }

    • #3
      Hi Christian,
      Thanks for that code. I apologize for updating my post after you replied: I realized that some of my values contain spaces, so I added the color "Bluish Green". I'm stumped on how to add that to the foreach code. I tried putting the values with spaces in quotation marks in foreach, i.e. "Bluish Green", but I get the error "too many variables specified". Thoughts?
      Thanks!

      • #4
        tabulate with the generate() option does this.

        • #5
          Thanks for the suggestion, Nick. Tabulate with generate() did not quite give the results I'm looking for. Using the example above, I'd like to keep the actual name of the color as the var name, rather than Colors_1, Colors_2, etc. I'd also like, for example, 'Blue' to be '1' if it appears anywhere in the list of colors for that ID. Tab with gen instead creates one indicator for each exact list of colors. The code Christian posted would work great if I didn't have spaces in some of the names. Do you have suggestions on how to modify Christian's code above to include a value with a space (i.e. "Bluish Green")? If I use Christian's code with quotes around "Bluish Green" I get the error "too many variables specified". Thanks!

          • #6
            You are correct. I didn't read #1 carefully enough.

            Code:
            foreach s in "Bluish Green" Blue Green Orange Purple Red Yellow {
                * strtoname() turns "Bluish Green" into the legal variable name Bluish_Green
                local S = strtoname("`s'")
                gen `S' = strpos(Colors, "`s'") > 0
            }
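
            One caveat with a plain strpos() search: "Green" is also a substring of "Bluish Green", so an ID listing only "Bluish Green" would be flagged as Green too. The sketch below is one possible way around that, not part of the original answer. It assumes the items are always separated by exactly a comma and one space, as in the example in #1; the variable name padded is made up for illustration, and the input block just recreates the toy data.

            ```stata
            * Toy data recreated from #1 (for illustration only)
            clear
            input byte ID str60 Colors
            1 "Bluish Green, Green, Orange, Purple"
            2 "Green, Purple, Red, Yellow"
            3 "Blue, Green, Orange, Purple, Red, Yellow"
            4 "Orange, Purple, Red, Yellow"
            end

            * Wrap the list so that every item appears as ", item,"
            * (assumes the separator is always a comma followed by one space)
            gen padded = ", " + Colors + ","

            foreach s in "Bluish Green" Blue Green Orange Purple Red Yellow {
                * strtoname() replaces spaces and dashes with underscores
                local S = strtoname("`s'")
                * Search for the whole delimited item, not just the bare text
                gen byte `S' = strpos(padded, ", `s',") > 0
            }

            drop padded
            list, noobs
            ```

            Searching for ", Green," rather than "Green" means the match has to start right after a separator, so "Bluish Green" no longer counts as "Green". If the real data use a different separator or irregular spacing, the padding and the search string would need to change to match.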

            • #7
              Great! Thank you Nick! That worked!
