
  • Split command with multiple separators

    I am using the split command and would like to parse a string using different separators, creating new variables with different prefixes according to the parsing separator. For example, the string might be "a:text1 b:text2 a:text3". I'd like to convert this to three different variables: a1 (containing text1), b1 (containing text2), and a2 (containing text3). Is this possible? My original string could contain any number of the prefixes a and b, interspersed with each other and differing from observation to observation.

  • #2
    Ron,

    I'd recommend you post concrete examples of what your initial and final datasets look like (see the FAQ on this). I, at least, find your example difficult to understand.
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



    • #3
      I'm sorry this wasn't clear. See the illustration below. Var1 is the original variable, with a1, b1 and a2 the new variables I would like to create. I was hoping split would be able to handle this.



      • #4
        Ron: You ignored most of Roberto's excellent advice, specifically that also given in http://www.statalist.org/forums/help#stata.

        Photo attachments are rarely clear. Showing a data example that can be copied and pasted makes matters easier for those who reply. Here is some technique.

        Code:
        . clear
        
        . input str27 text
        
                                    text
          1. "a:text1 b:text2 a:text3"
          2. "a:stuff1 a:stuff2 b:stuff3"
          3. end
        
        . split text, p(" ") gen(work)
        variables created as string:
        work1  work2  work3
        
        . gen id = _n
        
        . reshape long work, i(id)
        (note: j = 1 2 3)
        
        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                        2   ->       6
        Number of variables                   5   ->       4
        j variable (3 values)                     ->   _j
        xij variables:
                              work1 work2 work3   ->   work
        -----------------------------------------------------------------------------
        
        . list, sepby(id)
        
             +-------------------------------------------------+
             | id   _j                         text       work |
             |-------------------------------------------------|
          1. |  1    1      a:text1 b:text2 a:text3    a:text1 |
          2. |  1    2      a:text1 b:text2 a:text3    b:text2 |
          3. |  1    3      a:text1 b:text2 a:text3    a:text3 |
             |-------------------------------------------------|
          4. |  2    1   a:stuff1 a:stuff2 b:stuff3   a:stuff1 |
          5. |  2    2   a:stuff1 a:stuff2 b:stuff3   a:stuff2 |
          6. |  2    3   a:stuff1 a:stuff2 b:stuff3   b:stuff3 |
             +-------------------------------------------------+
        
        . split work, p(:)
        variables created as string:
        work1  work2
        
        . list, sepby(id)  
        
             +------------------------------------------------------------------+
             | id   _j                         text       work   work1    work2 |
             |------------------------------------------------------------------|
          1. |  1    1      a:text1 b:text2 a:text3    a:text1       a    text1 |
          2. |  1    2      a:text1 b:text2 a:text3    b:text2       b    text2 |
          3. |  1    3      a:text1 b:text2 a:text3    a:text3       a    text3 |
             |------------------------------------------------------------------|
          4. |  2    1   a:stuff1 a:stuff2 b:stuff3   a:stuff1       a   stuff1 |
          5. |  2    2   a:stuff1 a:stuff2 b:stuff3   a:stuff2       a   stuff2 |
          6. |  2    3   a:stuff1 a:stuff2 b:stuff3   b:stuff3       b   stuff3 |
             +------------------------------------------------------------------+
        
        . bysort id work1 (_j) : gen jid = work1 + string(_n)
        
        . drop _j text work work1
        
        . rename work2 work
        
        . reshape wide work , i(id) j(jid) string
        (note: j = a1 a2 b1)
        
        Data                               long   ->   wide
        -----------------------------------------------------------------------------
        Number of obs.                        6   ->       2
        Number of variables                   3   ->       4
        j variable (3 values)               jid   ->   (dropped)
        xij variables:
                                           work   ->   worka1 worka2 workb1
        -----------------------------------------------------------------------------
        
        . renpfix work
        
        . l
        
             +-------------------------------+
             | id       a1       a2       b1 |
             |-------------------------------|
          1. |  1    text1    text3    text2 |
          2. |  2   stuff1   stuff2   stuff3 |
             +-------------------------------+
        Here is the code in one block.

        Code:
        clear
        input str27 text
        "a:text1 b:text2 a:text3"
        "a:stuff1 a:stuff2 b:stuff3"
        end
        split text, p(" ") gen(work)
        gen id = _n
        reshape long work, i(id)
        list, sepby(id)
        split work, p(:)
        list, sepby(id)  
        bysort id work1 (_j) : gen jid = work1 + string(_n)
        drop _j text work work1
        rename work2 work
        reshape wide work , i(id) j(jid) string
        renpfix work
        l
        If your real problem has complications you didn't yet explain, please modify the code above to create an example reproducing them. Or show your own code working equivalently.
        Last edited by Nick Cox; 31 Oct 2015, 04:18.
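
The step in #4 that does most of the work is the bysort line that builds jid, so it may help to see it in isolation. The sketch below is only an illustration: the three-row dataset is typed in by hand to mimic the first observation after the second split, and it is not part of the original code. Within each combination of id and work1, the rows are ordered by _j, so _n counts the first, second, ... occurrence of each prefix, and pasting that count onto the prefix gives the new variable names.

```stata
clear
* hand-typed stand-in for observation 1 after "split work, p(:)"
input id str1 work1 _j
1 "a" 1
1 "b" 2
1 "a" 3
end

* within each id and prefix, number the occurrences in their original order
bysort id work1 (_j) : gen jid = work1 + string(_n)

* restore the original order of the pieces for inspection
sort id _j
list, sepby(id)
* jid is a1, b1, a2 for _j = 1, 2, 3
```

Because the counter restarts for every prefix within every id, the same line works however the a and b items are interleaved in a given observation.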
