  • Extracting substrings from a string variable with a hierarchy rule

    I am trying to write Stata code that extracts a single item from a string variable according to a hierarchy rule.
    For example, suppose the hierarchy, in order of decreasing importance, is the following colours: red, green, blue.

    If I have the following entries under variable name "colour":

    red
    red; green
    green; blue
    blue; red
    blue

    I would like the output variable to be the following:

    red
    red
    green
    red
    blue


    Many thanks in advance for your help

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str11 wehave
    "red"        
    "red; green"
    "green; blue"
    "blue; red"  
    "blue"      
    end
    
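    * strpos() returns 0 (false) when the substring is absent, so the nested
    * cond() calls test the colours in order of priority: red, green, blue
    gen wanted = cond(strpos(wehave, "red"), "red", cond(strpos(wehave, "green"), "green", cond(strpos(wehave, "blue"), "blue", "")))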
    
    list
    
         +----------------------+
         |      wehave   wanted |
         |----------------------|
      1. |         red      red |
      2. |  red; green      red |
      3. | green; blue    green |
      4. |   blue; red      red |
      5. |        blue     blue |
         +----------------------+
    
    .

    For your real problem, this may work better:


    Code:
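    gen WANTED = ""
    
    * loop over the colours in order of priority; a value is assigned only
    * if no higher-priority colour has already been found
    foreach c in red green blue {
        replace WANTED = "`c'" if strpos(wehave, "`c'") & WANTED == ""
    }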
    Last edited by Nick Cox; 22 Mar 2022, 07:54.
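
    One caveat about the loop above: strpos() looks for substrings, so "red" would also be found inside an entry such as "darkred". If that matters, a possible variant is sketched below. It assumes the data from #2 are in memory and that items are always separated by exactly "; ". The names padded and WANTED2 are made up for illustration.

    ```stata
    * Assumption: items are separated by "; " (semicolon plus one space)
    * Wrap every item in semicolons, e.g. "red; green" becomes ";red;green;"
    gen padded = ";" + subinstr(wehave, "; ", ";", .) + ";"

    gen WANTED2 = ""

    * Match whole items only, in order of priority
    foreach c in red green blue {
        replace WANTED2 = "`c'" if strpos(padded, ";`c';") & WANTED2 == ""
    }

    drop padded
    ```

    Because each search string includes the surrounding semicolons, only complete items match, at the cost of depending on a consistent separator.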



    • #3
      Nick Cox Super elegant solution. I used the second one and it works perfectly, thanks! A quick question, since my real data do not contain only single words such as green and red: sometimes an entry is a combination of two or more words separated by a space, e.g. "hello bye". How can I tell Stata to treat "hello bye" as a single unit rather than as two separate options?



      • #4
        Please show a more realistic data example.
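
        In the meantime, here is a sketch under stated assumptions. In a foreach list, an element wrapped in double quotes is treated as a single element, so a multi-word phrase can be searched for as one unit. The example data and the hierarchy ("hello bye" first, then the colours) are hypothetical, since the real data have not been shown.

        ```stata
        * Hypothetical example data: "hello bye" is one phrase, not two words
        clear
        input str20 wehave
        "hello bye"
        "red; hello bye"
        "green; blue"
        end

        gen WANTED = ""

        * Hypothetical hierarchy: "hello bye" first, then red, green, blue
        * The double quotes keep "hello bye" together as one list element
        foreach c in "hello bye" red green blue {
            replace WANTED = "`c'" if strpos(wehave, "`c'") & WANTED == ""
        }

        list
        ```

        strpos() itself searches for the whole phrase, spaces included, so only the foreach list needs the quotes.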
