Dealing with plus/minus signs within strings

Adam Sadi

Join Date: Jul 2022
Posts: 68

20 Dec 2023, 08:25

Dear Statalist,

Please consider the following dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long(width height x y space) str126 text str27 font_name double font_size long(page obs_number)
 78 22  89  89 1 "xxxx"   "BAAAAA+Arial-BoldMT" 20 1  1
 94 22 174  89 1 "xxxx"   "BAAAAA+Arial-BoldMT" 20 1  2
 27 22 273  89 1 "xxxx"         "BAAAAA+Arial-BoldMT" 20 1  3
 32 22 307  89 1 "XXXXX"        "BAAAAA+Arial-BoldMT" 20 1  4
 25 22 345  89 1 "XXXXX"         "BAAAAA+Arial-BoldMT" 20 1  5
123 22 376  89 0 "XXX" "BAAAAA+Arial-BoldMT" 20 1  6
 44 22 255 123 1 "xxx"       "BAAAAA+Arial-BoldMT" 20 1  7
  6 22 305 123 1 "xxx"          "BAAAAA+Arial-BoldMT" 20 1  8
 22 22 317 123 0 "xx"         "BAAAAA+Arial-BoldMT" 20 1  9
 68 15 196 156 1 "xxxxx"  "CAAAAA+ArialMT"      14 1 10
end

I want to clean the data based on font characteristics. However, the plus and minus signs in the font names cause problems along the way. If I run:

Code:

local pattern_var_2017 "^[A-Z][A-Z0-9_]*[A-Z0-9]*$"
local font_var_2017 "BAAAAA+Arial-BoldMT"
local size_var_2017 12

levelsof obs_number if font_name == "`font_var_2017'" & font_size == `size_var_2017' & ustrregexm(text, "`pattern_var_2017'") == 1

I end up with the error message "Arial not found" r(111). I tried putting compound quotes everywhere just to try my luck, but that didn't help. Can somebody explain what I did wrong? That would help me a lot.

Regards,
Adam

Last edited by Adam Sadi; 20 Dec 2023, 08:30.

Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

20 Dec 2023, 08:46

I cannot replicate your problem in my setup. The code you show runs without error messages on the example data and produces the correct answer (nothing, since no observations in the data meet all three conditions).

Adam Sadi

Join Date: Jul 2022

Posts: 68
#3

21 Dec 2023, 01:21

Clyde: Strange. I anonymized the contents of the text variable at random, without really caring whether there would be any matches. But even after pasting my dataex example and my code into the console again, I still get the same error message.

Thank you anyway for your help. Does anybody else have a clue? I am using Stata 18, by the way.

Edit: If I replace "levelsof obs_number if" with "keep if", the code works, so for some reason the problem seems to be related to levelsof, in case that helps.

Last edited by Adam Sadi; 21 Dec 2023, 01:36.
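The observation in #3 that "keep if" works suggests one possible workaround, offered here only as a sketch: evaluate the whole condition with generate first, then give levelsof a plain indicator variable in its if qualifier. This assumes the error arises while levelsof processes the if expression, which the thread does not confirm. The names is_match and matched_obs are hypothetical and do not come from the original posts.

```stata
* Sketch only: assumes the -dataex- example from #1 is already in memory.
* is_match and matched_obs are hypothetical names, not from the thread.
local pattern_var_2017 "^[A-Z][A-Z0-9_]*[A-Z0-9]*$"
local font_var_2017 "BAAAAA+Arial-BoldMT"
local size_var_2017 12

* Evaluate the full condition with -generate-, a built-in command like -keep-.
generate byte is_match = font_name == "`font_var_2017'" & font_size == `size_var_2017' & ustrregexm(text, "`pattern_var_2017'") == 1

* -levelsof- now sees only a variable name in its if qualifier,
* so the font name string never appears in the expression it receives.
levelsof obs_number if is_match, local(matched_obs)
display "`matched_obs'"
```

With the example data as posted, no observation has a font size of 12, so the resulting list is empty, consistent with the result reported in #2.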