  • Flag words in a var with all characters uppercase

    I wonder if someone knows a quick and easy way to select words in a string where all the characters are uppercase.
    I want to exclude numbers, symbols, spaces and everything else. I only need to flag words with all upper case.

    Something like this:
    mystring   flag
    BMW        1
    BMW9       0
    XS.9       0
    XA         1
    xA         0
    xA.        0


  • #2
    Don't quite get your use of "exclude" here, but I understand that you want to flag any string that contains only upper case alphabetic characters. Strings containing any other character are to be identified as "not all upper."
    Code:
    clear
    input str20 mystring
    "BMW"
    "BMW9 "
    "XS.9"
    "XA1"
    "XA"
    "xA.0"
    end
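    // flag strings made up only of the letters A to Z
    // length of each string (in bytes) and the longest one
    gen slength = strlen(mystring)
    quiet summ slength
    local maxlen = r(max)
    // start by assuming every string is all upper case
    gen byte allupper = 1
    gen str next = ""
    // look at one character position at a time; any character outside A-Z clears the flag
    forval i = 1/`maxlen' {
      replace next = substr(mystring, `i', 1) if (`i' <= slength)
      replace allupper = 0 if (allupper == 1) & !inrange(next, "A", "Z")
    }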
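The same character-based idea can also be turned around: instead of looping over character positions, delete every ASCII capital letter with subinstr() and flag the strings where nothing is left over. This is a sketch of an alternative, not code from the thread; it loops over the 26 letters returned by c(ALPHA), and the variable names rest and expected are hypothetical. Like the loop above, it treats only A to Z as upper case.

```stata
clear
input str20 mystring byte expected
"BMW"  1
"BMW9" 0
"XS.9" 0
"XA"   1
"xA"   0
"xA."  0
end

// copy the strings, then delete every ASCII capital letter from the copy
gen rest = mystring
foreach letter in `c(ALPHA)' {
    quietly replace rest = subinstr(rest, "`letter'", "", .)
}

// all upper case = non-empty and nothing left after removing A to Z
gen byte allupper = (mystring != "") & (rest == "")
drop rest

list mystring allupper expected, noobs
```

The mystring != "" condition matters here because an empty string has nothing left after the deletions either.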

    • #3
      To flag words in all upper case, regular expressions may be used.

      If you have only characters in the ASCII range:
      Code:
      gen byte flag = regexm(mystring,"^[A-Z]+$")
      If there are characters outside the ASCII range:
      Code:
      gen byte flag = ustrregexm(mystring,"^[\w]+$") & !ustrregexm(mystring,"[\d_]+") & mystring == ustrupper(mystring)
      The latter might work or might need adaptation; the Stata regex documentation could be better.
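The two expressions from #3 can be tried side by side on the strings from the question plus one non-ASCII string. This is a sketch; the extra string "äÖ" and the variable names expected, wordonly, flag_ustr and flag_upper are made up for illustration. Stata's upper() converts only ASCII letters, so a lowercase "ä" passes through it unchanged; the case comparison only rejects "äÖ" when it is made with ustrupper().

```stata
clear
input str20 mystring byte expected
"BMW"  1
"BMW9" 0
"XS.9" 0
"XA"   1
"xA"   0
"xA."  0
"äÖ"   0
end

// ASCII-only pattern: one or more of A-Z and nothing else
gen byte flag_ascii = regexm(mystring, "^[\w]+$" == "" ? "" : "^[A-Z]+$") if 0
replace flag_ascii = regexm(mystring, "^[A-Z]+$")

// Unicode part: only word characters, and no digits or underscores
gen byte wordonly = ustrregexm(mystring, "^[\w]+$") & !ustrregexm(mystring, "[\d_]+")

// case test with ustrupper(), which also converts non-ASCII letters
gen byte flag_ustr = wordonly & (mystring == ustrupper(mystring))
// the same test with upper(), which leaves non-ASCII letters unchanged
gen byte flag_upper = wordonly & (mystring == upper(mystring))

list mystring expected flag_ascii flag_ustr flag_upper, noobs
```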

      • #4
        Thank you both! You saved my day...

        • #5
          PS. The Unicode regex in #3 should be replaced with:
          Code:
          gen byte flag = ustrregexm(mystring,"^[\p{Uppercase_Letter}]+$")
          or, even shorter:
          Code:
          gen byte flag = ustrregexm(mystring,"^[\p{Lu}]+$")
          See http://www.regular-expressions.info/unicode.html for the Unicode property names.
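To see what the Unicode property adds over the ASCII pattern from #3, the sketch below applies both to a few strings that include accented capitals; the data and the variable names expected, flag_ascii and flag_lu are made up for illustration. regexm() matches byte by byte, so "ÄÖÜ" fails [A-Z], while \p{Lu} (Uppercase_Letter) accepts any uppercase letter.

```stata
clear
input str20 mystring byte expected
"BMW"  1
"BMW9" 0
"XA"   1
"xA."  0
"ÄÖÜ"  1
"äÖ"   0
end

// ASCII-only pattern: rejects accented capitals
gen byte flag_ascii = regexm(mystring, "^[A-Z]+$")
// Unicode property Lu: one or more uppercase letters in any script, nothing else
gen byte flag_lu = ustrregexm(mystring, "^[\p{Lu}]+$")

list mystring expected flag_ascii flag_lu, noobs
```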
