
  • Find character position of first ALPHA occurrence

    This is such a simple question, but I cannot get my code to work. As the post title says, I want to find the character position of first ALPHA occurrence.

    If I type
    Code:
    display strpos("1847CANAL_6_N001", "C")
    It correctly outputs 5.

    However, for any alpha character, I am trying
    Code:
    display strpos("1847CANAL_6_N001", "([a-zA-Z])")
    And it returns 0. Note: I have tried a variety of patterns, with and without ( ) ^ + . etc.

    Sincerely frustrated, Laura

  • #2
    -strpos()- is not a regular expression function, so it cannot interpret regular expression syntax. I can think of a number of ways to get what you want. Staying with regular expressions, you can describe the string pattern, keep everything up to and including the first non-digit character, and use the -length()- function to get the position of that character. Note that \d+ must sit outside square brackets: [\d+] would match a single character that is either a digit or a plus sign.

    Code:
    di length(ustrregexra("1847CANAL_6_N001", "(\d+[^\d])(\w+)", "$1"))
    Result:

    Code:
    . di length(ustrregexra("1847CANAL_6_N001", "(\d+[^\d])(\w+)", "$1"))
    5
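
    For the position itself, one possible route is to let -ustrregexm()- find the first letter and then pass the matched character to -ustrpos()- as a literal. This is only a sketch: it assumes Stata 14 or later for the Unicode regex functions, and the local macro name s is made up for illustration.

    ```stata
    * Sketch: position of the first letter via a regex match plus ustrpos()
    * (the local macro name s is illustrative, not from the thread)
    local s "1847CANAL_6_N001"

    * strpos() searches for the literal text "[A-Za-z]", which is not in the string
    display strpos("`s'", "[A-Za-z]")

    * ustrregexm() returns 1 on a match; ustrregexs(0) then holds the matched letter
    if ustrregexm("`s'", "[A-Za-z]") {
        display ustrpos("`s'", ustrregexs(0))
    }
    ```

    The matched character is by construction the first letter in the string, so its first literal occurrence is the position being asked for. If the string contains no letter, nothing is displayed by the second step.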
    Last edited by Andrew Musau; 22 Mar 2021, 13:36.



    • #3
      The second argument of strpos() is always taken to be a literal string or the name of a string scalar or string variable.


      This example uses moss from SSC. I am imagining that although you gave us a nice simple specific example, your underlying problem is about processing string variables.


      Code:
      . clear
      
      . set obs 1
      number of observations (_N) was 0, now 1
      
      . gen test = "1847CANAL_6_N001"
      
      . moss test, match("([A-Za-z])") regex max(1)
      
      . l
      
           +---------------------------------------------+
           |             test   _count   _match1   _pos1 |
           |---------------------------------------------|
        1. | 1847CANAL_6_N001        1         C       5 |
           +---------------------------------------------+


      • #4
        Thanks Nick! Indeed, I want to find the first instance of alpha, then grab everything from that position in the string, until the end.

        That works perfectly now with your -moss- line of code included:

        Code:
        clear
        set obs 4
        gen text ="-3.6133AVENAL_6_GN002" in 1
        replace text = "-3.6133AVENAL_6_GN003" in 2
        replace text = "-30.87814EAGLEMTN_2_N002" in 3
        replace text = "-61.2921DEVERS_1_N081" in 4
        moss text, match("([A-Za-z])") regex max(1)
        generate node = usubstr(text, _pos1, .)
        Last edited by Laura Grant; 22 Mar 2021, 15:36.
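
        For readers who cannot install -moss-, the same two steps (position, then substring) can probably be done with built-in functions alone. The following is a sketch, assuming Stata 14 or later; the variable names pos and node_alt are made up for illustration.

        ```stata
        clear
        input str24 text
        "-3.6133AVENAL_6_GN002"
        "-61.2921DEVERS_1_N081"
        end

        * ustrregexs(0) holds the first letter matched by ustrregexm();
        * ustrpos() then gives the position of that letter in the string
        generate pos = ustrpos(text, ustrregexs(0)) if ustrregexm(text, "[A-Za-z]")

        * keep everything from the first letter to the end of the string
        generate node_alt = usubstr(text, pos, .) if !missing(pos)

        list
        ```

        Observations with no letter at all are left with a missing pos and an empty node_alt, rather than being passed to -usubstr()- with a missing start position.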



        • #5
          A regex solution to your example is the following:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str24 text
          "-3.6133AVENAL_6_GN002"  
          "-3.6133AVENAL_6_GN003"  
          "-30.87814EAGLEMTN_2_N002"
          "-61.2921DEVERS_1_N081"  
          end
          
          gen wanted= ustrregexra(text, "([^a-zA-Z]+)([a-zA-Z]{1}.*$)", "$2")
          The syntax takes getting used to, but it is not as difficult as it looks.

          Code:
          . gen wanted= ustrregexra(text, "([^a-zA-Z]+)([a-zA-Z]{1}.*$)", "$2")
          
          . l
          
               +--------------------------------------------+
               |                     text           wanted  |
               |--------------------------------------------|
            1. |    -3.6133AVENAL_6_GN002    AVENAL_6_GN002 |
            2. |    -3.6133AVENAL_6_GN003    AVENAL_6_GN003 |
            3. | -30.87814EAGLEMTN_2_N002   EAGLEMTN_2_N002 |
            4. |    -61.2921DEVERS_1_N081     DEVERS_1_N081 |
               +--------------------------------------------+
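
          One caveat with the pattern above: it is not anchored to the start of the string, so a value that already begins with a letter (say "AVENAL_6_GN002") should still match further along, at "_6_GN002", and lose its middle. A possible alternative, sketched here with an invented variable name wanted2 and assuming Stata 14 or later, is to strip only the leading run of non-letters:

          ```stata
          clear
          input str24 text
          "-3.6133AVENAL_6_GN002"
          "AVENAL_6_GN002"
          end

          * ^ anchors the match at the start, so only leading non-letters are removed;
          * ustrregexrf() replaces the first match and leaves non-matching strings as they are
          generate wanted2 = ustrregexrf(text, "^[^a-zA-Z]+", "")

          list
          ```

          With this version a string that starts with a letter is returned unchanged, and a string with no letters at all comes back empty.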
