  • Extract rows containing "[]" with regular expressions

    I have a dataset with four variables, one of which is a string. Some observations of this variable end with a bracketed code: [D], [E], [F], and so on. I want to identify the rows where these codes occur. An example of my dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float line str180 categories str18 b str1 mainsector
    13 "Cereals and bakery products [D]" ""       "."
    14 "Mean"                            "712"    "."
    15 "SE"                              "15.3"   "."
    16 "RSE"                             "2.15"   "."
    17 "Percent Reporting"               "67.5"   "."
    18 ""                                ""       "."
    19 "Cereals and cereal products [D]" ""       "."
    20 "Mean"                            "214.98" "."
    21 "SE"                              "6.08"   "."
    22 "RSE"                             "2.83"   "."
    23 "Percent Reporting"               "40.66"  "."
    end

    Rows 13 and 19 are two examples of such rows. I have tried these fixes from ChatGPT:
    gen uppercat = regexm(categories, "\\[.*\\]")
    gen uppercat = regexm(categories, "\\[.\\]")

    Neither of these works. This might be a very simple question, but I cannot find the answer.

  • #2
    Code:
    gen uppercat = ustrregexm(categories, "\[.*\]$")
    Note, I am following your lead in allowing anything at all to appear within the square brackets. But if you want to restrict it to a single uppercase letter, use
    Code:
    gen uppercat = ustrregexm(categories, "\[[A-Z]\]$")
    instead.

    By the way, both patterns will only pick up the [] expressions if they occur at the end of the string, which, in #1, is what you said you want.
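
    As a quick check, here is a minimal sketch that applies both patterns to a few rows taken from the -dataex- in #1. The variable names any_end and letter_end are only illustrative, and ustrregexm() assumes Stata 14 or later. One likely reason the attempts in #1 fail is that Stata does not treat the backslash as an escape character in ordinary string literals, so the doubled backslash reaches the regex engine as a literal backslash rather than as an escaped bracket.

    ```stata
    clear
    input float line str40 categories
    13 "Cereals and bakery products [D]"
    14 "Mean"
    15 "SE"
    19 "Cereals and cereal products [D]"
    20 "Mean"
    end

    * 1 if the string ends in [anything], 0 otherwise
    gen byte any_end = ustrregexm(categories, "\[.*\]$")

    * 1 only if the string ends in a single uppercase letter in brackets
    gen byte letter_end = ustrregexm(categories, "\[[A-Z]\]$")

    * lines 13 and 19 should be the only rows flagged
    list line categories if any_end
    ```

    If the strings might carry trailing blanks, the $ anchor would not match; wrapping the variable in strtrim() first is one option.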
    Last edited by Clyde Schechter; 15 Feb 2024, 20:54.

    • #3
      Hi Clyde,

      Apologies for the late reply. These work perfectly, and the codes do appear at the end of the strings (is that what the $ sign is for?). Thank you!

      Regards,
      Saunok

      • #4
        Right, in regular expressions the $ anchors the match to the end of the string.
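
        To see what the $ changes, here is a small sketch comparing the same pattern with and without it. The second row is a hypothetical string, not taken from the data in #1, and the variable names anywhere and at_end are illustrative.

        ```stata
        clear
        * the second row is made up to show a bracketed code in mid-string
        input str40 categories
        "Cereals and bakery products [D]"
        "Bread [D] and rolls"
        end

        * without $, the bracketed code may appear anywhere in the string
        gen byte anywhere = ustrregexm(categories, "\[[A-Z]\]")

        * with $, the bracketed code must be the last thing in the string
        gen byte at_end = ustrregexm(categories, "\[[A-Z]\]$")

        list
        ```

        Both rows should be flagged by anywhere, but only the first by at_end.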

        • #5
          Thanks a lot Clyde!

          Regards,
          Saunok
