split string+numeric

River Huang

Join Date: Mar 2016

Posts: 1903
#1

split string+numeric

22 Apr 2019, 16:35

Dear all, How can I split x into x1 and x2 below?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 x str10 x1 float x2
"AA1234" "AA"  1234
"B56"    "B"   56
"CCC987" "CCC" 987
end

Ho-Chuan (River) Huang
Stata 17.0, MP(4)
Budu Gulo

Join Date: Feb 2018

Posts: 238
#2

22 Apr 2019, 16:45

The following code does the job!

Code:

generate str x1 = ustrregexra(x,"\d","")
generate float x2 = real(ustrregexra(x,"\D",""))
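
For anyone who wants to confirm this on the example from #1, here is a minimal sketch. It assumes the wanted results are stored under the hypothetical names want1 and want2 (in #1 they are called x1 and x2), so that the new variables can be compared against them. The storage type is left for Stata to choose.

```stata
clear
input str12 x str10 want1 float want2
"AA1234" "AA"  1234
"B56"    "B"   56
"CCC987" "CCC" 987
end

* remove every digit, leaving the text part
generate x1 = ustrregexra(x, "\d", "")
* remove every non-digit, then convert what is left to a number
generate x2 = real(ustrregexra(x, "\D", ""))

* want1 and want2 are hypothetical names for the expected results;
* each assert stops with an error if any observation differs
assert x1 == want1
assert x2 == want2
```

Note that this approach suits only the simple case of letters followed by digits: stripping every non-digit would also remove a decimal point.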
River Huang

Join Date: Mar 2016

Posts: 1903
#3

22 Apr 2019, 16:52

Dear Budu, Many thanks for the helpful reply. How about the following case?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 var1
"AA1,520,228.3"
"B391,875.1"
"CCC347,574.0"
end

Ho-Chuan (River) Huang
Stata 17.0, MP(4)

William Lisowski

Join Date: Dec 2014
Posts: 10150
#4

22 Apr 2019, 17:03

It is not clear what you expect the results to be in this case. Here's my guess.

Code:

. split var1, parse(,) generate(x) destring
variables born as string: 
x1  x2  x3
x1: contains nonnumeric characters; no replace
x2: all characters numeric; replaced as double
x3: all characters numeric; replaced as double
(2 missing values generated)

. describe

Contains data
  obs:             3                          
 vars:             4                          
 size:           120                          
------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------
var1            str18   %18s                  
x1              str6    %9s                   
x2              double  %10.0g                
x3              double  %10.0g                
------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. list, clean noobs

             var1       x1      x2      x3  
    AA1,520,228.3      AA1     520   228.3  
       B391,875.1     B391   875.1       .  
     CCC347,574.0   CCC347     574       .


Budu Gulo

Join Date: Feb 2018
Posts: 238
#5

22 Apr 2019, 17:15

How about this?

Code:

generate str x2 = ustrregexra(var1,"[\d\.]","")
generate double x3 = real(ustrregexra(var1,"[^\d\.]",""))
split x2, parse(,)
drop x2 x22

Code:

. list

     +---------------------------------+
     |          var1          x3   x21 |
     |---------------------------------|
  1. | AA1,520,228.3   1520228.3    AA |
  2. |    B391,875.1    391875.1     B |
  3. |  CCC347,574.0      347574   CCC |
     +---------------------------------+


River Huang

Join Date: Mar 2016
Posts: 1903
#6

22 Apr 2019, 17:19

Dear William, My bad. It is the same question as above: how can I split x into x1 and x2?

Code:

// 
* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 x str8 x1 double x2
"AA1,520,228.3" AA  1520228.3
"B391,875.1"    B    391875.1   
"CCC347,574.0"  CCC  347574.0 
end
dataex

Ho-Chuan (River) Huang
Stata 17.0, MP(4)


Bjarte Aagnes

Join Date: Apr 2014

Posts: 783
#7

23 Apr 2019, 07:27

One solution is:

Code:

gen x1 = ustrregexs(1) if ustrregexm(x, "^(\p{L}+)")
gen double x2 = real(subinstr(subinstr(x,x1,"",1),",","",.))

If you know the characters are restricted to (uppercase) ASCII, the following will run much faster:

Code:

gen x1 = regexs(1) if regexm(x, "^([A-Z]+)")
gen double x2 = real(subinstr(subinstr(x,x1,"",1),",","",.))
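
A minimal sketch for checking the Unicode version against the example in #6. The names want1 and want2 are hypothetical stand-ins for the wanted x1 and x2, and the tolerance in the last line is an arbitrary choice, not part of the original solution.

```stata
clear
input str18 x str8 want1 double want2
"AA1,520,228.3" "AA"  1520228.3
"B391,875.1"    "B"   391875.1
"CCC347,574.0"  "CCC" 347574.0
end

* leading run of letters (any script) becomes the text part
gen x1 = ustrregexs(1) if ustrregexm(x, "^(\p{L}+)")
* drop that prefix once, drop all commas, convert the rest to a number
gen double x2 = real(subinstr(subinstr(x, x1, "", 1), ",", "", .))

* want1 and want2 are hypothetical names for the expected results
assert x1 == want1
* compare with a small relative tolerance rather than exact equality
assert reldif(x2, want2) < 1e-12
```

The same check should apply to the ASCII version by swapping in the regexm()/regexs() line.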

The numbers used in #6 seem to be the same as in a related thread, "Divide the string including chinese into two columns".

References:
https://www.regular-expressions.info/unicode.html
http://userguide.icu-project.org/strings/regexp

Last edited by Bjarte Aagnes; 23 Apr 2019, 07:30.
River Huang

Join Date: Mar 2016

Posts: 1903
#8

23 Apr 2019, 18:45

Dear Bjarte, Thanks for the reply. Yes, they are the same question.

Ho-Chuan (River) Huang
Stata 17.0, MP(4)