  • How do I split up every character of a long string?

    I have a dataset of long strings where different positions represent different variables. Thus, I want to separate each character of the string out into separate variables.

    A trimmed-down version of the data looks like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str39 A
    "350AK001046 921  01100210              "
    end

    I tried to split it up using the following code:
    Code:
    split A, p("A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" " ")
    However, this just generated a lot of blank variables.

    Any help here would be much appreciated!

    Note: I ultimately want each blank space in the string to become a separate variable too.

  • #2
    There is probably some very cool command to do that. I'd just use substr here:

    Code:
    clear
    input str39 A
    "350AK001046 921  01100210              "
    end
    
    gen len_A = length(A)
    quietly summarize len_A
    * use the longest string length, r(max), not the total of all lengths, r(sum)
    scalar max_len = r(max)
    
    forvalues x = 1/`=max_len'{
        gen varname`x' = substr(A, `x', 1)
    }


    • #3
      Thanks!


      • #4
        FWIW, I thought long and hard about including this in split when I first wrote it (after writing something similar with Michael Blasnik), but decided that it was inconsistent with a main idea, that split is based on separators, and that the syntax and documentation would just be made more complicated. StataCorp took split over as an official command and hasn't changed this aspect.

        I don't know a method other than that of Ken Chui. In principle it could be bundled into a command, which may have been done.
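
        As a rough illustration of what such a bundled command might look like, here is a minimal sketch. The program name charsplit and its generate() option are hypothetical, made up for this example; this is not an existing official or community-contributed command. It uses the same loop idea as #2, but swaps in ustrlen() and usubstr() (Stata 14 or later) so that non-ASCII characters would be counted as single characters, which is an assumption beyond what the question needs.

        ```stata
        * Hypothetical helper: charsplit is an illustrative name, not an existing command
        capture program drop charsplit
        program define charsplit
            version 14
            syntax varname(string) [, Generate(name)]
            * default stub for the new variables is the source variable's name
            if "`generate'" == "" local generate `varlist'
            * length of the longest string, counted in Unicode characters
            tempvar len
            quietly generate long `len' = ustrlen(`varlist')
            quietly summarize `len', meanonly
            * nothing to do if the dataset has no observations
            if r(N) == 0 {
                exit
            }
            local maxlen = r(max)
            * one new string variable per character position
            forvalues i = 1/`maxlen' {
                generate `generate'`i' = usubstr(`varlist', `i', 1)
            }
        end
        ```

        With the example data, typing charsplit A, generate(c) should create c1 through c39, with blanks kept as their own variables.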


        • #5
          I can't resist a challenge (more fun than the paper I'm editing).
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str39 A
          "350AK001046 921  01100210              "
          end
          generate B = ustrregexra(A,"(.)","$1!")
          split B, parse("!")
          list A B, noobs
          list B1-B39, noobs
          Code:
          . generate B = ustrregexra(A,"(.)","$1!")
          
          . split B, parse("!")
          variables created as string: 
          B1   B4   B7   B10  B13  B16  B19  B22  B25  B28  B31  B34  B37
          B2   B5   B8   B11  B14  B17  B20  B23  B26  B29  B32  B35  B38
          B3   B6   B9   B12  B15  B18  B21  B24  B27  B30  B33  B36  B39
          
          . list A B, noobs
          
            +--------------------------------------------------------------------------------+
            |                                                          A                     |
            |                    350AK001046 921  01100210                                   |
            |--------------------------------------------------------------------------------|
            |                                                                              B |
            | 3!5!0!A!K!0!0!1!0!4!6! !9!2!1! ! !0!1!1!0!0!2!1!0! ! ! ! ! ! ! ! ! ! ! ! ! ! ! |
            +--------------------------------------------------------------------------------+
          
          . list B1-B39, noobs
          
            +--------------------------------------------------------------------------------------------+
            | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 | B11 | B12 | B13 | B14 | B15 | B16 | B17 |
            |  3 |  5 |  0 |  A |  K |  0 |  0 |  1 |  0 |   4 |   6 |     |   9 |   2 |   1 |     |     |
            |-----------------------------+--------------------------------------------------------------|
            | B18 | B19 | B20 | B21 | B22 | B23 | B24 | B25 | B26 | B27 | B28 | B29 | B30  | B31  | B32  |
            |   0 |   1 |   1 |   0 |   0 |   2 |   1 |   0 |     |     |     |     |      |      |      |
            |------------------------------------------------------------------------------+-------------|
            |    B33     |    B34     |    B35     |    B36     |    B37     |     B38     |     B39     |
            |            |            |            |            |            |             |             |
            +--------------------------------------------------------------------------------------------+
