
  • Help: creating dummy variables with tab where the new var name is the string value being identified

    Hi All,

    I am working on a project and need to create a large number of dummy variables from string variables, e.g. type_of_contract, which has the values "cost plus", "fixed fee", "labor hours", and "time and materials". I know how to create a set of dummy variables using tabulate, but I would like each one named after its type of contract rather than type_of_contract1-4. Is there any way of baking this into the tab command, or am I going to have to rename each by hand? Thanks for your help.

    -Enrique

  • #2
    Well, bear in mind that not every string is a legal variable name, so what you ask may not actually be possible. But you can generate new variables whose names are very close to the original strings, but cleaned up to be legal variable names:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 my_string_variable
    "abc" 
    "def" 
    "gh/i"
    "jk.l"
    end
    
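    * Convert each string to a legal Stata variable name (e.g. "gh/i" becomes "gh_i")
    gen my_variable_name = strtoname(my_string_variable)
    
    * Create one 0/1 indicator per distinct cleaned name
    levelsof my_variable_name, local(names)
    foreach n of local names {
        gen byte `n' = (my_variable_name == "`n'")
    }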
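
    Applied to the contract types from the original question, the same approach might look like the sketch below. The toy data and the helper variable contract_name are assumptions for illustration only; the four dummies take their names from the cleaned strings, with spaces replaced by underscores.

```stata
clear
input str18 type_of_contract
"cost plus"
"fixed fee"
"labor hours"
"time and materials"
"fixed fee"
end

* contract_name is a hypothetical helper variable holding the cleaned names
gen contract_name = strtoname(type_of_contract)

* One 0/1 dummy per distinct cleaned name: cost_plus, fixed_fee, ...
levelsof contract_name, local(names)
foreach n of local names {
    gen byte `n' = (contract_name == "`n'")
}

list type_of_contract cost_plus fixed_fee labor_hours time_and_materials
```

    Note that two different strings that clean up to the same name would share a single dummy, so it is worth checking the levelsof output against the original values.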
    That said, why do you want to do this? In modern Stata there is very seldom any real need to create dummy variables for anything. Dummy variables are most often used in regression commands, and there factor-variable notation eliminates the need to create them. Factor-variable notation doesn't accept string variables, but -encode- creates a labeled numeric variable that does work with it, and the regression command will then show the labels in the output. So consider

    Code:
    encode my_string_variable, gen(my_numeric_variable)
    regression_command outcome_variable i.my_numeric_variable other_variables
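
    As a concrete version of that template, here is a minimal sketch on made-up data. The outcome variable cost, its values, and the name contract_type are assumptions for illustration, and -regress- simply stands in for whatever regression command is actually needed.

```stata
clear
input str18 type_of_contract float cost
"cost plus" 10
"fixed fee" 12
"labor hours" 9
"time and materials" 15
"cost plus" 11
"fixed fee" 13
"labor hours" 8
"time and materials" 14
end

* encode numbers the strings in alphabetical order and attaches them as value labels
encode type_of_contract, gen(contract_type)

* i. expands contract_type into indicators on the fly; no dummies are created
regress cost i.contract_type
```

    The coefficient table labels each row with the contract type rather than a numeric code, and the first category ("cost plus") is omitted as the base level.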



    • #3
      Clyde gives excellent advice. Factor-variable notation makes this kind of exercise less necessary than it was when tabulate was first written. Nevertheless, also check out dummieslab (SSC).
