Split command with multiple separators

RonD McDowell

Join Date: Apr 2015

Posts: 44
#1

30 Oct 2015, 14:03

I am using the split command and wish to parse a string on different separators, creating new variables whose prefixes depend on the separator. For example, the string might be "a:text1 b:text2: a:text3". I'd like to convert this to three variables: a1 (containing text1), b1 (containing text2) and a2 (containing text3). Is this possible? My original string could contain any number of the prefixes a and b, interspersed with each other and differing from observation to observation.
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#2

30 Oct 2015, 14:10

Ron,

I'd recommend you post concrete examples of what your initial and final datasets look like (see the FAQ on this). I, at least, find your example difficult to understand.

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
RonD McDowell

Join Date: Apr 2015

Posts: 44
#3

30 Oct 2015, 14:22

I'm sorry this wasn't clear. See the illustration below. Var1 is the original variable, with a1, b1 and a2 the new variables I would like to create. I was hoping split would be able to handle this.

[1 photo attachment]

Nick Cox

Join Date: Mar 2014
Posts: 35451

31 Oct 2015, 04:10

Ron: You ignored most of Roberto's excellent advice, specifically the advice also given at http://www.statalist.org/forums/help#stata.

Photo attachments are rarely clear. Showing a data example that can be copied and pasted makes matters easier for those who reply. Here is some technique.

Code:

. clear

. input str27 text

                            text
  1. "a:text1 b:text2 a:text3"
  2. "a:stuff1 a:stuff2 b:stuff3"
  3. end

. split text, p(" ") gen(work)
variables created as string:
work1  work2  work3

. gen id = _n

. reshape long work, i(id)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       6
Number of variables                   5   ->       4
j variable (3 values)                     ->   _j
xij variables:
                      work1 work2 work3   ->   work
-----------------------------------------------------------------------------

. list, sepby(id)

     +-------------------------------------------------+
     | id   _j                         text       work |
     |-------------------------------------------------|
  1. |  1    1      a:text1 b:text2 a:text3    a:text1 |
  2. |  1    2      a:text1 b:text2 a:text3    b:text2 |
  3. |  1    3      a:text1 b:text2 a:text3    a:text3 |
     |-------------------------------------------------|
  4. |  2    1   a:stuff1 a:stuff2 b:stuff3   a:stuff1 |
  5. |  2    2   a:stuff1 a:stuff2 b:stuff3   a:stuff2 |
  6. |  2    3   a:stuff1 a:stuff2 b:stuff3   b:stuff3 |
     +-------------------------------------------------+

. split work, p(:)
variables created as string:
work1  work2

. list, sepby(id)  

     +------------------------------------------------------------------+
     | id   _j                         text       work   work1    work2 |
     |------------------------------------------------------------------|
  1. |  1    1      a:text1 b:text2 a:text3    a:text1       a    text1 |
  2. |  1    2      a:text1 b:text2 a:text3    b:text2       b    text2 |
  3. |  1    3      a:text1 b:text2 a:text3    a:text3       a    text3 |
     |------------------------------------------------------------------|
  4. |  2    1   a:stuff1 a:stuff2 b:stuff3   a:stuff1       a   stuff1 |
  5. |  2    2   a:stuff1 a:stuff2 b:stuff3   a:stuff2       a   stuff2 |
  6. |  2    3   a:stuff1 a:stuff2 b:stuff3   b:stuff3       b   stuff3 |
     +------------------------------------------------------------------+

. bysort id work1 (_j) : gen jid = work1 + string(_n)

. drop _j text work work1

. rename work2 work

. reshape wide work , i(id) j(jid) string
(note: j = a1 a2 b1)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        6   ->       2
Number of variables                   3   ->       4
j variable (3 values)               jid   ->   (dropped)
xij variables:
                                   work   ->   worka1 worka2 workb1
-----------------------------------------------------------------------------

. renpfix work

. l

     +-------------------------------+
     | id       a1       a2       b1 |
     |-------------------------------|
  1. |  1    text1    text3    text2 |
  2. |  2   stuff1   stuff2   stuff3 |
     +-------------------------------+

Here is the code in one block.

Code:

clear
input str27 text
"a:text1 b:text2 a:text3"
"a:stuff1 a:stuff2 b:stuff3"
end
split text, p(" ") gen(work)
gen id = _n
reshape long work, i(id)
list, sepby(id)
split work, p(:)
list, sepby(id)  
bysort id work1 (_j) : gen jid = work1 + string(_n)
drop _j text work work1
rename work2 work
reshape wide work , i(id) j(jid) string
renpfix work
l

If your real problem has complications you didn't yet explain, please modify the code above to create an example reproducing them. Or show your own code working equivalently.
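One complication mentioned in the original question is that the number of a and b items may differ from observation to observation. The following is only a sketch of how the code above might be adapted for that case, not a tested solution for the real data. The third observation ("b:x1 b:x2") is hypothetical, as are the names seq, prefix and value. It assumes every item contains a colon, and that Stata 12 or later is available for the wildcard form of rename. Empty pieces left by split are dropped before the prefix is extracted, and the prefix is taken as everything before the first colon.

```stata
clear
input str27 text
"a:text1 b:text2 a:text3"
"a:stuff1 a:stuff2 b:stuff3"
"b:x1 b:x2"
end
gen id = _n

* one piece per blank-separated item, then one row per piece
split text, p(" ") gen(work)
reshape long work, i(id) j(seq)

* observations with fewer items leave empty pieces behind
drop if missing(work)

* assumption: every remaining item has the form prefix:value
assert strpos(work, ":") > 0
gen prefix = substr(work, 1, strpos(work, ":") - 1)
gen value  = substr(work, strpos(work, ":") + 1, .)

* number the items within each id and prefix, in their original order
bysort id prefix (seq) : gen jid = prefix + string(_n)

keep id jid value
reshape wide value, i(id) j(jid) string

* strip the stub so that the variables are named a1, a2, b1, b2
rename value* *
list
```

With these data the result should have variables a1, a2, b1 and b2, with empty strings wherever an observation has no such item.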

Last edited by Nick Cox; 31 Oct 2015, 04:18.
