
  • Tokens: different behaviour depending on the parsechar specified: bug? feature?

    Hello,
    Does anyone know why tokens("0.1.0", ".") or tokens("0.1.0", char(46)) produces a 5-element rowvector, whereas tokens("0 1 0", " ") produces only 3 elements?

    This is what I get in Stata 14.2:

    . mata: tokens("0.1.0", ".")
               1   2   3   4   5
            +---------------------+
          1 |  0   .   1   .   0  |
            +---------------------+

    . mata: tokens("0 1 0", " ")
               1   2   3
            +-------------+
          1 |  0   1   0  |
            +-------------+

  • #2
    It looks like unexpected behavior to me, i.e., a bug.



    • #3
      I agree that this behavior of tokens() is unexpected, and help mf_tokens does not suggest it is intended. However, I'm guessing that tokens() is built on top of tokenget(), and help mf_tokenget has much to say. tokenget() distinguishes between "parsing characters" and "white-space characters", and the behavior we anticipated from tokens("0.1.0", ".") would appear to require treating the second argument as a "white-space character" rather than a "parsing character". It should in theory be possible to use tokeninit(), tokenset(), and tokenget() to obtain the desired results, but that's well beyond my competence to implement.

      Consider also that the behavior is consistent with the tokenize command.
      Code:
      . tokenize "0.1.0", parse(".")
      
      . di "1=|`1'|, 2=|`2'|, 3=|`3'|, 4=|`4'|, 5=|`5'|, 6=|`6'|"
      1=|0|, 2=|.|, 3=|1|, 4=|.|, 5=|0|, 6=||
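
      The tokeninit() route mentioned in #3 can be sketched roughly as follows. This is a minimal sketch, not a tested replacement for tokens(): it assumes that the first argument of tokeninit() sets the white-space characters and that the parsing characters default to none, as help mf_tokenget describes. The function name tokens_ws() is hypothetical, chosen only for illustration.

      ```stata
      mata:
      // Hypothetical helper (not a built-in): split s on the characters in
      // delims and discard the delimiters instead of returning them as tokens.
      string rowvector tokens_ws(string scalar s, string scalar delims)
      {
          transmorphic t

          // Assumption: the first argument of tokeninit() is the set of
          // white-space characters; parsing characters keep their default.
          t = tokeninit(delims)
          tokenset(t, s)

          // tokengetall() returns all remaining tokens as a rowvector.
          return(tokengetall(t))
      }

      tokens_ws("0.1.0", ".")
      end
      ```

      If those assumptions hold, the call should return a 3-element rowvector, in line with what tokens("0 1 0", " ") produces. Because the delimiter is treated as white space, consecutive delimiters would presumably collapse into one, so this is not suitable where empty fields matter.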



      • #4
        Behavior that is not documented and thus unexpected definitely qualifies as a bug. StataCorp should take note!
