  • Split a string variable into character and numeric parts

    Hi,
    I have a string variable (profile) in my dataset that consists of a prefix (H, W, or B) followed by a number from 1 to 1000 (H1, H2, H3, ..., H1000, W1, W2, W3, ..., W1000, etc.). How can I split the prefix character and the numeric suffix into two separate variables? I have tried the code below, which extracts the numeric suffix but not the character part.
    Thanks,
    NM

    gen iteration = regexs(0) if regexm(profile, "[0-9]*$")
    destring iteration, replace

  • #2
    If all of your prefixes are just one character long, then there is no need to resort to regular expressions. You can just do:

    Code:
    gen prefix = substr(profile, 1, 1)
    gen numeric_part = substr(profile, 2, .)
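
    If some prefixes could be longer than one character (an assumption; the question only mentions H, W and B), a regular expression with two capture groups might serve as a more general sketch. The variable names prefix2 and number2 and the toy data below are illustrative only.

    ```stata
    clear
    input str6 profile
    "H1"
    "W25"
    "B1000"
    "HW7"
    end

    * group 1 = leading letters, group 2 = trailing digits
    gen prefix2 = regexs(1) if regexm(profile, "^([A-Za-z]+)([0-9]+)$")
    * real() converts the captured digits to a numeric value
    gen number2 = real(regexs(2)) if regexm(profile, "^([A-Za-z]+)([0-9]+)$")
    list
    ```

    Observations that do not fit the letters-then-digits pattern are left missing in both new variables, which makes unexpected values easy to spot.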


    • #3
      You can install the strkeep package (ssc install strkeep) and use it as follows:

      strkeep strvar, gen(numpart) numeric
      strkeep strvar, gen(strpart) alpha

      Regards,
      Rasool Bux


      • #4
        Hi Clyde,

        I want to split a string variable st_name into its string and numeric parts. The variable is currently set up in the following manner:

        ABC 01
        ABCDE 02
        AB CDE 03
        ABCDE FGHI 04

        The last two characters are always numeric, but the length and format of the preceding text vary. The closest I came to achieving this was the following code:

        Code:
        strkeep st_name, gen(st_name1) alpha
        However, it removes all spaces, including those that belong in a variable holding names. For example, AB CDE 03 became ABCDE instead of the desired output AB CDE.


        • #5
          #4 should yield to

          Code:
          gen wanted1 = trim(substr(st_name, 1, length(st_name) - 2))
          gen wanted2 = substr(st_name, -2, 2)
          but I had more fun using moss from SSC


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str13 abc01
          "ABCDE 02"     
          "AB CDE 03"    
          "ABCDE FGHI 04"
          end
          
          . moss abc01, match("([0-9]+)") regex
          
          . gen stuff = subinstr(abc01, _match1, "", .)
          
          . l
          
               +--------------------------------------------------------+
               |         abc01   _count   _match1   _pos1         stuff |
               |--------------------------------------------------------|
            1. |      ABCDE 02        1        02       7        ABCDE  |
            2. |     AB CDE 03        1        03       8       AB CDE  |
            3. | ABCDE FGHI 04        1        04      12   ABCDE FGHI  |
               +--------------------------------------------------------+
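
          In the listing above, stuff still carries the space that preceded the digits, and subinstr() with . removes every occurrence of the matched digits, not only the trailing one. A possible alternative using only built-in functions is sketched below; the variable names name and code are illustrative, and the pattern assumes every value ends in digits preceded by at least one non-digit character.

          ```stata
          clear
          input str13 abc01
          "ABCDE 02"
          "AB CDE 03"
          "ABCDE FGHI 04"
          end

          * group 1 = everything before the trailing digits, group 2 = the digits
          * trim() drops the space left between the name and the digits
          gen name = trim(regexs(1)) if regexm(abc01, "^(.*[^0-9])([0-9]+)$")
          * kept as a string so the leading zero is preserved
          gen code = regexs(2) if regexm(abc01, "^(.*[^0-9])([0-9]+)$")
          list
          ```

          Internal spaces such as the one in AB CDE are kept because only leading and trailing blanks are trimmed.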
