  • split string+numeric

    Dear all, How can I split x into x1 and x2 below?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 x str10 x1 float x2
    "AA1234" "AA"  1234
    "B56"    "B"     56
    "CCC987" "CCC"  987
    end
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    The following code does the job!
    Code:
    generate str x1 = ustrregexra(x,"\d","")
    generate float x2 = real(ustrregexra(x,"\D",""))
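A minimal sketch that checks this approach against the example data in #1. In the regular expressions, `\d` matches any digit and `\D` any non-digit, so the first call strips the digits and the second strips everything else. The names y1 and y2 are hypothetical, used here only so the results can be compared with the x1 and x2 supplied in the example.

```stata
clear
input str12 x str10 x1 float x2
"AA1234" "AA"  1234
"B56"    "B"     56
"CCC987" "CCC"  987
end

* remove every digit, leaving the letter prefix
generate y1 = ustrregexra(x, "\d", "")
* remove every non-digit, then convert the remaining digits to a number
generate double y2 = real(ustrregexra(x, "\D", ""))

* both should reproduce the wanted variables
assert y1 == x1
assert y2 == x2
```

If either assert fails, Stata stops with an error, so a silent run means the split matches the target.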


    • #3
      Dear Budu, Many thanks for the helpful reply. How about the following case?
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str18 var1
      "AA1,520,228.3"
      "B391,875.1"   
      "CCC347,574.0" 
      end
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)


      • #4
        It is not clear what you expect the results to be in this case. Here's my guess.
        Code:
        . split var1, parse(,) generate(x) destring
        variables born as string: 
        x1  x2  x3
        x1: contains nonnumeric characters; no replace
        x2: all characters numeric; replaced as double
        x3: all characters numeric; replaced as double
        (2 missing values generated)
        
        . describe
        
        Contains data
          obs:             3                          
         vars:             4                          
         size:           120                          
        ------------------------------------------------------------------------------------------------
                      storage   display    value
        variable name   type    format     label      variable label
        ------------------------------------------------------------------------------------------------
        var1            str18   %18s                  
        x1              str6    %9s                   
        x2              double  %10.0g                
        x3              double  %10.0g                
        ------------------------------------------------------------------------------------------------
        Sorted by: 
             Note: Dataset has changed since last saved.
        
        . list, clean noobs
        
                     var1       x1      x2      x3  
            AA1,520,228.3      AA1     520   228.3  
               B391,875.1     B391   875.1       .  
             CCC347,574.0   CCC347     574       .


        • #5
          How about this?
          Code:
          generate str x2 = ustrregexra(var1,"[\d\.]","")
          generate double x3 = real(ustrregexra(var1,"[^\d\.]",""))
          split x2, parse(,)
          drop x2 x22
          Code:
          . list
          
               +---------------------------------+
               |          var1          x3   x21 |
               |---------------------------------|
            1. | AA1,520,228.3   1520228.3    AA |
            2. |    B391,875.1    391875.1     B |
            3. |  CCC347,574.0      347574   CCC |
               +---------------------------------+
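A possible variant, not the code posted above: if the prefix is assumed to be whatever remains once digits, commas, and periods are removed, the split and drop steps may not be needed. The names prefix and value are hypothetical.

```stata
clear
input str18 var1
"AA1,520,228.3"
"B391,875.1"
"CCC347,574.0"
end

* assumption: the prefix contains no digits, commas, or periods
generate prefix = ustrregexra(var1, "[\d.,]", "")
* keep only digits and the decimal point, then convert to a number
generate double value = real(ustrregexra(var1, "[^\d.]", ""))

assert prefix == "AA"  in 1
assert prefix == "B"   in 2
assert prefix == "CCC" in 3
assert value == 1520228.3 in 1
assert value == 391875.1  in 2
assert value == 347574    in 3
```

This sketch would give a wrong prefix if the letters themselves could contain a period or comma, which the example data do not show.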


          • #6
            Dear William, My bad. Same question as above: how can I split x into x1 and x2?
            Code:
            // 
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str18 x str8 x1 double x2
            "AA1,520,228.3" AA  1520228.3
            "B391,875.1"    B    391875.1   
            "CCC347,574.0"  CCC  347574.0 
            end
            dataex
            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)


            • #7

              One solution is:
              Code:
              gen x1 = ustrregexs(1) if ustrregexm(x, "^(\p{L}+)")
              gen double x2 = real(subinstr(subinstr(x,x1,"",1),",","",.))
              If you know the characters are restricted to (uppercase) ASCII, the following will run much faster:
              Code:
              gen x1 = regexs(1) if regexm(x, "^([A-Z]+)")
              gen double x2 = real(subinstr(subinstr(x,x1,"",1),",","",.))
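A minimal sketch that checks the Unicode-aware version against the example data in #6. The names y1 and y2 are hypothetical, used only so the results can be compared with the x1 and x2 supplied in that example. The pattern `^(\p{L}+)` captures the leading run of letters. The nested subinstr() calls then remove that prefix once and strip every comma before real() converts the remainder.

```stata
clear
input str18 x str8 x1 double x2
"AA1,520,228.3" "AA"  1520228.3
"B391,875.1"    "B"    391875.1
"CCC347,574.0"  "CCC"  347574.0
end

* capture the leading run of letters (any script, via \p{L})
generate y1 = ustrregexs(1) if ustrregexm(x, "^(\p{L}+)")
* drop the prefix once, strip all commas, convert to a number
generate double y2 = real(subinstr(subinstr(x, y1, "", 1), ",", "", .))

* both should reproduce the wanted variables
assert y1 == x1
assert y2 == x2
```

Storing the number as double matters here: a float cannot hold 1520228.3 exactly, so the comparison with x2 would not be reliable.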
              The numbers used in #6 seem to be the same as in a related thread, "Divide the string including chinese into two columns".

              References:
              https://www.regular-expressions.info/unicode.html
              http://userguide.icu-project.org/strings/regexp
              Last edited by Bjarte Aagnes; 23 Apr 2019, 07:30.


              • #8
                Dear Bjarte, Thanks for the reply. Yes, they are the same question.
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)
