Extract rows containing "[]" with regular expressions

Saunok Chakrabarty

Join Date: Aug 2019
Posts: 43


15 Feb 2024, 20:24

I have a dataset with four variables, one of which is a string variable. Some observations of this variable end with a bracketed code such as [D], [E], or [F]. I want to identify the rows where these codes occur. An example of my dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float line str180 categories str18 b str1 mainsector
13 "Cereals and bakery products [D]" ""       "."
14 "Mean"                            "712"    "."
15 "SE"                              "15.3"   "."
16 "RSE"                             "2.15"   "."
17 "Percent Reporting"               "67.5"   "."
18 ""                                ""       "."
19 "Cereals and cereal products [D]" ""       "."
20 "Mean"                            "214.98" "."
21 "SE"                              "6.08"   "."
22 "RSE"                             "2.83"   "."
23 "Percent Reporting"               "40.66"  "."
end

Rows 13 and 19 are two examples of such rows. I have tried these fixes from ChatGPT:

Code:

gen uppercat = regexm(categories, "\\[.*\\]")
gen uppercat = regexm(categories, "\\[.\\]")

Neither of these works. This might be a very simple question, but I cannot find the answer.


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

15 Feb 2024, 20:52

Code:

gen uppercat = ustrregexm(categories, "\[.*\]$")

Note, I am following your lead in allowing anything at all to appear within the square brackets. But if you want to restrict it to a single uppercase letter, use

Code:

gen uppercat = ustrregexm(categories, "\[[A-Z]\]$")

instead.

By the way, both of these patterns pick up the [] expression only if it occurs at the very end of the string, which, in #1, is what you said you want.
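For anyone who wants to try this end to end, here is a minimal sketch on a trimmed-down copy of the example data from #1. The variable names any_tag and letter_tag are made up for illustration, and the sketch assumes Stata 14 or later, where the ustrregex functions are available.

```stata
* Trimmed-down copy of the example data from #1
clear
input float line str180 categories
13 "Cereals and bakery products [D]"
14 "Mean"
15 "SE"
19 "Cereals and cereal products [D]"
20 "Mean"
end

* Flag rows ending in [anything]; any_tag is a hypothetical name
gen byte any_tag = ustrregexm(categories, "\[.*\]$")

* Flag rows ending in one uppercase letter inside square brackets
gen byte letter_tag = ustrregexm(categories, "\[[A-Z]\]$")

* Show only the flagged rows (lines 13 and 19 in this sample)
list line categories if letter_tag
```

If only the flagged rows are needed afterwards, something like keep if letter_tag would drop the rest.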

Last edited by Clyde Schechter; 15 Feb 2024, 20:54.
Saunok Chakrabarty

Join Date: Aug 2019

Posts: 43
#3

17 Feb 2024, 13:40

Hi Clyde,

Apologies for the late reply. These work perfectly - and the codes do appear at the end of the expressions (the $ sign?). Thank you!

Regards,
Saunok
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#4

17 Feb 2024, 14:49

Right, the $ is how you anchor a match to the end of the string in regular expressions.
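A quick way to see the effect of the anchor is to test the pattern on literal strings with display. This is only a sketch; the first test string is made up so that the bracketed code sits in the middle rather than at the end.

```stata
* Bracketed code in the middle of a made-up string, no anchor: matches
display ustrregexm("Cereals [D] and more", "\[[A-Z]\]")

* Same string, pattern anchored with $: no match, because [D] is not at the end
display ustrregexm("Cereals [D] and more", "\[[A-Z]\]$")

* Bracketed code at the very end, as in the data from #1: matches
display ustrregexm("Cereals and bakery products [D]", "\[[A-Z]\]$")
```

These should display 1, 0 and 1 respectively. Note that a trailing blank after the closing bracket would also defeat the anchored pattern, so it may be worth trimming the variable first if the data are not clean.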
Saunok Chakrabarty

Join Date: Aug 2019

Posts: 43
#5

17 Feb 2024, 18:28

Thanks a lot Clyde!

Regards,
Saunok