  • Dealing with plus/minus signs within strings

    Dear Statalist,

    Please consider the following dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(width height x y space) str126 text str27 font_name double font_size long(page obs_number)
     78 22  89  89 1 "xxxx"   "BAAAAA+Arial-BoldMT" 20 1  1
     94 22 174  89 1 "xxxx"   "BAAAAA+Arial-BoldMT" 20 1  2
     27 22 273  89 1 "xxxx"         "BAAAAA+Arial-BoldMT" 20 1  3
     32 22 307  89 1 "XXXXX"        "BAAAAA+Arial-BoldMT" 20 1  4
     25 22 345  89 1 "XXXXX"         "BAAAAA+Arial-BoldMT" 20 1  5
    123 22 376  89 0 "XXX" "BAAAAA+Arial-BoldMT" 20 1  6
     44 22 255 123 1 "xxx"       "BAAAAA+Arial-BoldMT" 20 1  7
      6 22 305 123 1 "xxx"          "BAAAAA+Arial-BoldMT" 20 1  8
     22 22 317 123 0 "xx"         "BAAAAA+Arial-BoldMT" 20 1  9
     68 15 196 156 1 "xxxxx"  "CAAAAA+ArialMT"      14 1 10
    end
    I want to clean the data based on font characteristics, but the plus/minus signs in the font names cause trouble. If I run:

    Code:
    local pattern_var_2017 "^[A-Z][A-Z0-9_]*[A-Z0-9]*$"
    local font_var_2017 "BAAAAA+Arial-BoldMT"
    local size_var_2017 12
    
    levelsof obs_number if font_name == "`font_var_2017'" & font_size == `size_var_2017' & ustrregexm(text, "`pattern_var_2017'") == 1
    I end up with the error message "Arial not found" r(111). I tried putting compound quotes everywhere, just to try my luck, but without success. Can somebody explain what I did wrong? That would help me a lot.

    Regards,
    Adam
    Last edited by Adam Sadi; 20 Dec 2023, 08:30.

  • #2
    I cannot replicate your problem in my setup. The code you show runs without error messages on the example data and produces the correct answer, which is nothing, because no observations in the data meet all three conditions.



    • #3
      Clyde: Strange. I anonymized the contents of the text variable at random, without checking whether any observations would still match. Even so, after pasting my dataex example and my code into the console again, I still get the same error message.

      Thank you anyway for your help. Does anybody else have a clue? I am using Stata 18, by the way.

      Edit: If I replace "levelsof obs_number if" with "keep if", the code works, so the problem seems to be specific to levelsof, in case that helps.
      Last edited by Adam Sadi; 21 Dec 2023, 01:36.
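
      Since the same condition works with "keep if", one possible workaround is to evaluate it into an indicator variable first, so that levelsof only sees a plain numeric variable in its if qualifier. This is a minimal sketch, not a diagnosis of the underlying cause. The names match_2017 and obs_2017 are hypothetical, the three-row dataset is a cut-down stand-in for the dataex example, and the font size is set to 20 rather than 12 (an assumption, made so that at least one row matches).

      ```stata
      * Sketch only: match_2017 and obs_2017 are hypothetical names, not from the thread.
      * Cut-down stand-in for the dataex example above.
      clear
      input str27 font_name double font_size str10 text long obs_number
      "BAAAAA+Arial-BoldMT" 20 "XXXXX"  4
      "BAAAAA+Arial-BoldMT" 20 "xxx"    7
      "CAAAAA+ArialMT"      14 "xxxxx" 10
      end

      local pattern_var_2017 "^[A-Z][A-Z0-9_]*[A-Z0-9]*$"
      local font_var_2017 "BAAAAA+Arial-BoldMT"
      * Assumption: 20 instead of the original 12, so that a row matches.
      local size_var_2017 20

      * Evaluate the full condition with -generate- and store it as a 0/1 flag.
      generate byte match_2017 = font_name == "`font_var_2017'" & font_size == `size_var_2017' & ustrregexm(text, "`pattern_var_2017'")

      * The if qualifier now contains no quoted string with + or - in it.
      levelsof obs_number if match_2017, local(obs_2017)
      display "`obs_2017'"
      ```

      The flag can be dropped afterwards with "drop match_2017". Whether this avoids the r(111) error on the affected installation is untested here.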
