  • Replace using regex not working

    I have a dataset with a string variable, rdm_code, that needs to match a certain pattern. To check the pattern, I created a dummy variable called bad_scanning, which equals one if the pattern is not matched. However, if I run my code all together, the replace commands with the regex do not work. If I run it line by line, it works fine. I already tried adding the sleep command, but that does not work either. Does anyone know how to fix this?



    Code:
    gen bad_scanning = 1
    replace bad_scanning = 0 if regexm(rdm_code, "s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]")
    replace bad_scanning = 0 if regexm(rdm_code, "s[1-2]c[0-9]{2}\-[0-9]{3}|[0-9]{2}T\-[0-1]")


    Here is the correct result, when I run the code line by line:
    Code:
    rdm_code        bad_scanning
    s4c064-136-0       0
    s30c110-114-0      0
    s34c252-11T-1      0
    s2c63-149-1        0
    s2c57-070-0        0
    123456             1



    Here is the wrong result, when I run the whole do-file:

    Code:
    rdm_code        bad_scanning
    s4c064-136-0       1
    s30c110-114-0      1
    s34c252-11T-1      1
    s2c63-149-1        1
    s2c57-070-0        1
    123456             1


    Thank you!!

  • #2
    Try:
    Code:
    gen bad_scanning = 1
    replace bad_scanning = 0 if ustrregexm(rdm_code, "s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]")
    replace bad_scanning = 0 if ustrregexm(rdm_code, "s[1-2]c[0-9]{2}\-[0-9]{3}|[0-9]{2}T\-[0-1]")
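    I guess your do-file contains a version statement somewhere and is run under a version less than 18, which causes regexm to behave differently. That would also explain why it works line by line: the selected lines are run without that version statement.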

    "If user version is less than 18.0 string functions regexm(s,re), regexr(s1,re,s2), and regexs(n), and their Mata equivalents use the old implementation, which is based on Henry Spencer's NFA algorithm and is nearly identical to the POSIX.2 standard. The new implementation in Stata 18 is based on the Boost regular expression library and has more features." See -help version-.
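
    To confirm that a version statement is the cause, one option is to print the version settings from inside the do-file, just before the regex lines. The following is a minimal sketch, assuming Stata 18 is installed; an alternative to switching to ustrregexm is to run only the regex commands under the version prefix.

    ```stata
    * Show which version settings are in effect at this point of the do-file
    display "Stata executable: " c(stata_version)
    display "version:          " c(version)
    display "user version:     " c(userversion)   // per the help text, this one governs regexm()

    * Alternative fix: run only the regex commands under version 18,
    * leaving the rest of the do-file under its declared version
    gen bad_scanning = 1
    version 18: replace bad_scanning = 0 if regexm(rdm_code, "s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]")
    version 18: replace bad_scanning = 0 if regexm(rdm_code, "s[1-2]c[0-9]{2}\-[0-9]{3}|[0-9]{2}T\-[0-1]")
    ```

    If the printed user version is below 18, regexm() is using the old implementation in that do-file.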

    Code:
    . version 17: di regexm("s34c252-11T-1", "s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]")
    0
    
    . version 18: di regexm("s34c252-11T-1", "s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]")
    1
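
    A separate point about the second pattern: the alternation in "\-[0-9]{3}|[0-9]{2}T\-" is not grouped, so the expression matches either everything to the left of | or everything to the right of it. A value such as "99T-1" would therefore pass the check. The sketch below groups the alternation the same way the first pattern does. The ^ and $ anchors are my assumption that rdm_code should contain nothing but the code; drop them if extra characters around the code are acceptable.

    ```stata
    * Sketch only: grouped alternation plus anchors (the anchors are an assumption)
    * ustrregexm() is used so the result does not depend on the version setting
    gen byte bad_scanning = 1
    replace bad_scanning = 0 if ustrregexm(rdm_code, "^s([0-9]{2}|[0-9]{3}|[1-4])c[0-9]{3}\-([0-9]{3}|[0-9]{2}T)\-[0-1]$")
    replace bad_scanning = 0 if ustrregexm(rdm_code, "^s[1-2]c[0-9]{2}\-([0-9]{3}|[0-9]{2}T)\-[0-1]$")

    * Inspect the rows that are still flagged
    list rdm_code if bad_scanning == 1
    ```

    The five valid codes shown in the question still match one of these two patterns, and 123456 remains flagged.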
    Last edited by Hua Peng (StataCorp); 04 Jul 2024, 10:52.



    • #3
      Thank you so much, it worked perfectly.
