  • Split after different string

    Hello,
    I would like to split a string variable right after a certain word appears, but the word is not the same for each observation. The word is recorded in the variable "parcing". The new variable should look like this:
    "hola adios"
    "ciao adios"
    "yes bye ciao adios hola"

    Thank you

    clear
    input strL content str12 parcing
    "hello yes bye ciao hola adios" "ciao"
    "hello hola yes ehy bye ciao adios" "bye"
    "hello yes bye ciao adios hola" "hello"
    end

  • #2
    I'm not sure what you're after, but you'd likely use substr() with strpos(): strpos() marks where the word starts, and adding the length of the word gives the position right after which to keep the text.
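A minimal sketch of this approach on a single string held in local macros. The string and word are taken from the first observation in #1, and using substr() to extract the remainder (plus strtrim() to drop the leading space, Stata 14 or later) is an assumption about what was intended:

```stata
* Locate the word, then keep everything after it
local s "hello yes bye ciao hola adios"
local w "ciao"
* strpos() returns the position where the word starts (0 if absent)
local pos = strpos("`s'", "`w'")
* add the word's length to get the first character after it; trim the space
local rest = strtrim(substr("`s'", `pos' + strlen("`w'"), .))
display `pos'
display "`rest'"
```

Here `pos` is 15 and `rest` is "hola adios", the first value asked for in #1.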



    • #3
      Thank you, George. I thought about that, but I was hoping there was a way to do it in one line of code, like "split content, p(parsing)", which I know doesn't work that way, but hope springs eternal.
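For comparison, a minimal sketch of what split does when the parse string is typed into the command. Its parse() option takes literal strings, so it cannot pick up a different word from a variable for each observation. The two observations are from #1; the str40 storage type is an assumption made for this example:

```stata
clear
input str40 content
"hello yes bye ciao hola adios"
"hello hola yes ehy bye ciao adios"
end
* "ciao" is fixed for every observation; it is not read from a variable
split content, parse("ciao")
list, sep(0)
```

This creates content1 (the text before "ciao") and content2 (the text after it). Whether the pieces keep their surrounding spaces is worth checking with list before relying on them.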



      • #4
        This is more or less what split does: look for the parsing string and pick out what comes before it and what comes after it.

        I added an example where the parsing string is at the end and one where it doesn't occur. You might want different rules for that last case.

        Also consider whether you want to trim any leading and trailing spaces.

        Code:
        clear
        input strL content str12 parsing
        "hello yes bye ciao hola adios" "ciao"
        "hello hola yes ehy bye ciao adios" "bye"
        "hello yes bye ciao adios hola" "hello"
        "newt toad frog" "frog"
        "dinosaur newt toad" "frog"
        end
        
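        * position where the parsing string first occurs (0 if it does not occur)
        gen where = strpos(content, parsing)
        * text before that position; the whole string if there is no match
        gen before = substr(content, 1, cond(where == 0, ., where - 1))
        * text after the parsing string; left missing (empty) if there is no match
        gen after = substr(content, where + strlen(parsing), .) if where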
        
        list, sep(0)
        
             +------------------------------------------------------------------------------------------------------+
             |                           content   parsing   where                before                      after |
             |------------------------------------------------------------------------------------------------------|
          1. |     hello yes bye ciao hola adios      ciao      15        hello yes bye                  hola adios |
          2. | hello hola yes ehy bye ciao adios       bye      20   hello hola yes ehy                  ciao adios |
          3. |     hello yes bye ciao adios hola     hello       1                          yes bye ciao adios hola |
          4. |                    newt toad frog      frog      11            newt toad                             |
          5. |                dinosaur newt toad      frog       0    dinosaur newt toad                            |
             +------------------------------------------------------------------------------------------------------+
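To get closer to the one-line call hoped for in #3, the code above could be wrapped in a program. This is only a sketch: splitafter is a hypothetical name, not an official Stata command, and it assumes that variables called before and after do not already exist. The trim option applies strtrim() (Stata 14 or later) to both results:

```stata
* Hypothetical wrapper: splitafter is NOT an official Stata command
capture program drop splitafter
program splitafter
    version 14
    * varname: string variable to split; parse(): variable holding the word
    syntax varname, Parse(varname) [TRIM]
    tempvar where
    * position of the first occurrence of the word (0 if absent)
    quietly gen `where' = strpos(`varlist', `parse')
    * text before the word; the whole string if the word is absent
    gen before = substr(`varlist', 1, cond(`where' == 0, ., `where' - 1))
    * text after the word; empty if the word is absent
    gen after = substr(`varlist', `where' + strlen(`parse'), .) if `where'
    * optionally drop leading and trailing spaces
    if "`trim'" != "" {
        quietly replace before = strtrim(before)
        quietly replace after = strtrim(after)
    }
end

clear
input strL content str12 parsing
"hello yes bye ciao hola adios" "ciao"
"hello hola yes ehy bye ciao adios" "bye"
"hello yes bye ciao adios hola" "hello"
"newt toad frog" "frog"
"dinosaur newt toad" "frog"
end

splitafter content, p(parsing) trim
list, sep(0)
```

Because the option is declared as Parse(), it can be abbreviated to p(), which matches the syntax sketched in #3.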



        • #5
          Two additional notes:

          1. The code takes the problem in #1 literally and splits on the first occurrence of the parsing string. Unlike split, it will not produce three or more variables if the parsing string occurs two or more times.

          2. split was added in Stata 8. As documented in the manual entry, it builds on my work and also on earlier work done jointly with Michael Blasnik. The idea that the parsing string might differ across observations and need to be recorded in a variable didn't arise at the time, or since, as far as I can recall. Be that as it may, a full generalization of split might build on the existing code, but I am not volunteering.
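Regarding note 1, a sketch for spotting observations where the parsing string occurs more than once. Removing every occurrence with subinstr() shortens the string by strlen(parsing) per occurrence, so the difference in lengths divided by strlen(parsing) counts them. The first observation below is made up for illustration. strrpos(), which finds the last occurrence, needs Stata 14 or later:

```stata
clear
input str40 content str12 parsing
"ciao hola ciao adios" "ciao"
"hello yes bye ciao hola adios" "ciao"
end
* number of (non-overlapping) occurrences of the parsing string
gen count = (strlen(content) - strlen(subinstr(content, parsing, "", .))) / strlen(parsing)
* position of the last occurrence, if splitting there is preferred
gen last = strrpos(content, parsing)
list if count > 1
```

Replacing strpos() with strrpos() in the code in #4 would split on the last occurrence instead of the first.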

