  • replacing values of newly generated string variable if old string variable contains certain characters

    Hello,

    I have a string variable called relationship that describes the relationship between 2 people. Each observation is a person. I am trying to generate a new variable that consolidates the various spellings into one spelling. In the dataex example below, I would want to generate a new variable called relationship_cleaned with the 4 values shown below equal to "Acquaintance". The only way I know how to do this is:

    generate relationship_cleaned = ""
    replace relationship_cleaned = "Acquaintance" if relationship == "Acquaintance" | relationship == "Acquaintance - former roommate" | ....

    Could someone please tell me of a way to do the same thing but rather than writing out all the different spellings, changing the value of the new variable if the variable relationship starts with the characters "Acquaintance"? Thank you very much for your time and help!

    input str30(relationship)
    "Acquaintance"
    "Acquaintance - former roommate"
    "Acquaintance - classmate"
    "Acquaintances"
    end

  • #2
    Code:
    generate relationship_cleaned = ""
    replace relationship_cleaned = "Acquaintance" if substr(relationship,1,12) == "Acquaintance"


    • #3
      (Crossed in the ether with William Lisowski's posting, but I'll post it anyway as I add a little relevant material.)

      What you ask for could be done as:
      Code:
      replace relationship_cleaned = "Acquaintance" if strpos(lower(relationship), "acquaintance") == 1
      Comments:
      1. Checking for an exact spelling is a relatively brittle strategy. I only minimally softened that by going to lower case. You might want to check for something simpler like just "Acq".
      2. Stata has a nice and pretty straightforward collection of string functions. See -help string functions- to learn about them.
      3. You'd be better off with a numeric variable for relationship. That saves space ("Acquaintance" takes 12 times as much space as a single-byte numeric code), but more importantly, the string version of the variable won't be convenient to use in other syntax and can't be included in most statistical commands.
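
      As a rough sketch of comments 1 and 2 combined, and not the only way to do it: the code below matches on the shorter prefix "acq", ignores case, and trims stray leading or trailing blanks first. It assumes relationship is a string variable as in the example data and that relationship_cleaned does not exist yet. strtrim() needs Stata 14 or later; older versions use trim().

      ```stata
      * Sketch only: assumes relationship is a string variable, as in the example data
      generate relationship_cleaned = ""

      * strtrim() drops leading/trailing blanks and lower() ignores case;
      * strpos() returns 1 when the trimmed, lower-cased text starts with "acq"
      replace relationship_cleaned = "Acquaintance" if strpos(lower(strtrim(relationship)), "acq") == 1

      * Cross-tabulate old against new to see which spellings were consolidated
      tabulate relationship relationship_cleaned, missing
      ```

      The tabulate at the end is a sanity check: a short prefix like "acq" could catch values you did not intend, so it is worth looking at what was actually matched.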


      • #4
        Thank you very much, William Lisowski and Mike Lacy! Once I consolidate all the spellings, I will define labels, replace the string values with numeric values, label them, then destring the variable. There is probably a faster way to do this as well.
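
        One possible shortcut, offered as a sketch: -encode- creates a labeled numeric variable from a string variable in one step, which may replace the define/replace/label sequence described above. (-destring- is meant for strings that contain numbers, so it would not apply to text categories like these.) The names relationship_num and rel_lbl are hypothetical, and the code assumes relationship_cleaned already holds the consolidated spellings.

        ```stata
        * Sketch only: assumes relationship_cleaned holds the consolidated strings.
        * relationship_num and rel_lbl are hypothetical names chosen for illustration.
        encode relationship_cleaned, generate(relationship_num) label(rel_lbl)

        * Inspect the mapping between numeric codes and their labels
        label list rel_lbl
        tabulate relationship_num
        ```

        By default -encode- assigns codes 1, 2, ... in alphabetical order of the string values; if a specific coding is needed, define the value label first with -label define- and pass its name in label().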
