Find character position of first ALPHA occurrence

Laura Grant

Join Date: Mar 2020

Posts: 6
#1

22 Mar 2021, 12:55

This is such a simple question, but I cannot get my code to work. As the post title says, I want to find the character position of the first alphabetic character in a string.

If I type

Code:

display strpos("1847CANAL_6_N001", "C")

It correctly outputs 5.

However, to match any alphabetic character, I am trying

Code:

display strpos("1847CANAL_6_N001", "([a-zA-Z])")

And it returns 0. Note: I have tried a variety of patterns, with and without ( ) ^ + and so on.

Sincerely frustrated, Laura
Andrew Musau

Join Date: Oct 2014

Posts: 10294
#2

22 Mar 2021, 13:32

-strpos()- is not a regular expression string function, so you cannot use regular expression syntax with it. I can think of a number of ways to get what you want. Keeping with regular expressions, you can look at the pattern of the string, specify it, and use the -length()- function to identify the position of the first non-digit character.

Code:

di length(ustrregexra("1847CANAL_6_N001", "([\d+][^\d])(\w+)", "$1"))

Res.:

Code:

. di length(ustrregexra("1847CANAL_6_N001", "([\d+][^\d])(\w+)", "$1"))
5
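
The pattern above relies on this particular string starting with a run of digits. As a hedged alternative sketch (not part of the original reply), one could delete everything from the first letter onward with -ustrregexrf()- and add 1 to the length of what remains. This assumes the string contains at least one ASCII letter; if it does not, the result is one more than the string length rather than 0.

Code:

* Sketch: position of the first ASCII letter, assuming one exists
* Everything from the first letter to the end is removed, leaving the prefix
di ustrlen(ustrregexrf("1847CANAL_6_N001", "[A-Za-z].*$", "")) + 1

For the example string, the part before the first letter is "1847", so this should display 5.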

Last edited by Andrew Musau; 22 Mar 2021, 13:36.

Nick Cox

Join Date: Mar 2014
Posts: 35804

22 Mar 2021, 13:39

The second argument of strpos() is always taken to be a literal string or the name of a string scalar or string variable.
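
A small sketch of what that means in practice (the first test string is made up for illustration): strpos() reports a position only when the exact characters of the second argument occur in the first.

Code:

* The text "([a-zA-Z])" occurs literally here, starting at character 2
di strpos("a([a-zA-Z])b", "([a-zA-Z])")

* No literal "([a-zA-Z])" in this string, so the result is 0
di strpos("1847CANAL_6_N001", "([a-zA-Z])")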

This example uses moss from SSC. I am imagining that although you gave us a nice simple specific example, your underlying problem is about processing string variables.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. gen test = "1847CANAL_6_N001"

. moss test, match("([A-Za-z])") regex max(1)

. l

     +---------------------------------------------+
     |             test   _count   _match1   _pos1 |
     |---------------------------------------------|
  1. | 1847CANAL_6_N001        1         C       5 |
     +---------------------------------------------+

Laura Grant

Join Date: Mar 2020

Posts: 6
#4

22 Mar 2021, 15:20

Thanks Nick! Indeed, I want to find the first instance of alpha, then grab everything from that position in the string, until the end.

Which works perfectly now, with your -moss- line of code included!

Code:

clear
set obs 4
gen text = "-3.6133AVENAL_6_GN002" in 1
replace text = "-3.6133AVENAL_6_GN003" in 2
replace text = "-30.87814EAGLEMTN_2_N002" in 3
replace text = "-61.2921DEVERS_1_N081" in 4
moss text, match("([A-Za-z])") regex max(1)
generate str1 node = ""
replace node = usubstr(text, _pos1, .)

Last edited by Laura Grant; 22 Mar 2021, 15:36.

Andrew Musau

Join Date: Oct 2014
Posts: 10294

22 Mar 2021, 16:17

A regex solution to your example is the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str24 text
"-3.6133AVENAL_6_GN002"  
"-3.6133AVENAL_6_GN003"  
"-30.87814EAGLEMTN_2_N002"
"-61.2921DEVERS_1_N081"  
end

gen wanted= ustrregexra(text, "([^a-zA-Z]+)([a-zA-Z]{1}.*$)", "$2")

The syntax takes getting used to, but it is not as difficult as it looks.

Code:

. gen wanted= ustrregexra(text, "([^a-zA-Z]+)([a-zA-Z]{1}.*$)", "$2")

. l

     +--------------------------------------------+
     |                     text           wanted  |
     |--------------------------------------------|
  1. |    -3.6133AVENAL_6_GN002    AVENAL_6_GN002 |
  2. |    -3.6133AVENAL_6_GN003    AVENAL_6_GN003 |
  3. | -30.87814EAGLEMTN_2_N002   EAGLEMTN_2_N002 |
  4. |    -61.2921DEVERS_1_N081     DEVERS_1_N081 |
     +--------------------------------------------+
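
Another way to write the same extraction, offered as a sketch rather than as part of the original answer, is to match from the first letter to the end of the string with -ustrregexm()- and pull out the whole match with -ustrregexs(0)-. The variable name wanted2 is hypothetical.

Code:

* wanted2 is a hypothetical name; ustrregexs(0) returns the entire matched text
gen wanted2 = ustrregexs(0) if ustrregexm(text, "[a-zA-Z].*$")

One difference in behavior: observations containing no letter are left as empty strings here, whereas the -ustrregexra()- version returns such values unchanged.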
