
  • How to convert a string variable to numeric and customize the label encoding?

    I have a dataset for transportation research that I have imported into Stata 17. It contains a string variable called age (type str5) with five categories: <18, 18-29, 30-45, 46-60 and >60. I need to convert it to numeric so that I can use it in a regression. Most of the posts I found on the Stata forum recommend encode for this. After converting, however, I found that the numeric codes do not follow the intended order: 18-29 became 1, 30-45 became 2, 46-60 became 3, <18 became 4 and >60 became 5.

    Can anyone help me with code (reusable for any string variable) that converts the variable to numeric, so that it can be used in a regression, and relabels it as <18: 0, 18-29: 1, 30-45: 2, 46-60: 3 and >60: 4?

    Goal: convert the variable to numeric and replace the existing value labels with the desired custom value labels.

    Thanks in advance.

  • #2
    Unsurprisingly there is no intelligence inside encode to recognise your desired order. As you didn't specify a set of value labels, encode just worked on the sort order of characters in string values and all numeric characters come before < and >.

    Code:
    label def age 1 "<18" 2 "18-29" 3 "30-45" 4 "46-60" 5 ">60"
    encode age, gen(betterage) label(age)
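
    If the 0 to 4 coding asked for in #1 is wanted rather than 1 to 5, the same approach should work with a value label defined from 0. What follows is only a minimal sketch, assuming the string variable is named age as in #1. The names agegrp and age_num are made up for illustration. The noextend option is added so that encode stops with an error if age contains a string that is not listed in the label (for example, one with stray spaces), rather than quietly adding new codes.

```stata
* Define the value label in the desired order, starting at 0
* (agegrp is a hypothetical label name)
label define agegrp 0 "<18" 1 "18-29" 2 "30-45" 3 "46-60" 4 ">60"

* Encode using that label; noextend refuses strings missing from agegrp
* (age_num is a hypothetical name for the new numeric variable)
encode age, generate(age_num) label(agegrp) noextend

* Check the mapping: once with labels shown, once with the underlying codes
tabulate age_num
tabulate age_num, nolabel
```

    In a regression, i.age_num would then treat the variable as categorical, and by default the lowest code (here <18) serves as the base level.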


    • #3
      Thank you, Nick Cox 👏. This is a wonderful solution. With the do-file editor, I can now encode all variables at once.
