How to find a particular word in a string in Stata

Tsvetan Georgiev

Join Date: Jun 2015

Posts: 11
#1

04 Jun 2015, 10:20

Hello everyone,
Is there a command in Stata that searches a string variable for a particular word and returns only the observations that contain it as a whole word? For example, when I search for "INC" with -regexm-, -strpos- or -strmatch-, Stata returns every observation that contains "INC", including ones with INCOME and the like, but I need only the observations in which "INC" stands alone as a word.
Thanks in advance
Tags: Extraction, stata, string

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

04 Jun 2015, 10:42

How about

Code:

list stringvar if strops(stringvar, " INC ") | substr(stringvar, 1, 4) == "INC " | substr(stringvar, -4, 4) == " INC"

Comment

Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#3

04 Jun 2015, 11:09

The question is not completely clear to me. Is "INC" short for "incorporated"? If so, will it sometimes be immediately followed by a period? If it is, Clyde's code will not catch those cases. If there is sometimes a period and sometimes not, I would just add conditions to Clyde's code that include the period.
Comment
Tsvetan Georgiev

Join Date: Jun 2015

Posts: 11
#4

04 Jun 2015, 11:18

Yes, INC is short for incorporated, and sometimes there is a period at the end and sometimes not. Can I use the same code for whole words like "TEAM", for example?
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#5

04 Jun 2015, 11:22

Yes. If Clyde's code is not clear to you, check the help files for the functions he uses, e.g., strpos (his code has a typo: "strops" should be "strpos") and substr.
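
To illustrate, here is a minimal sketch of the same three conditions applied to "TEAM", with strpos spelled correctly. The sample data and the local macro names (word, n) are made up for the example, the variable name stringvar is taken from Clyde's code, and a fourth condition is added as an assumption to catch strings that consist of the word alone. Run it from a do-file, since it uses /// line continuation.

```stata
* Sketch only: sample data and macro names are hypothetical
clear
input str20 stringvar
"TEAM LEADER"
"STEAM ENGINE"
"A TEAM EFFORT"
"THE BEST TEAM"
"TEAMS"
end

local word "TEAM"
local n = strlen("`word'") + 1   // word length plus one adjoining space
generate byte found = strpos(stringvar, " `word' ") > 0   /// word in the middle
    | substr(stringvar, 1, `n') == "`word' "              /// word at the start
    | substr(stringvar, -`n', `n') == " `word'"           /// word at the end
    | stringvar == "`word'"                               // word is the whole string
list, clean
```

Changing the contents of the local macro word is all that should be needed to search for a different word.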
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#6

04 Jun 2015, 11:34

Note that all of the above assumes that your text really is all capitals. If it is not, you can either write a more complicated condition or, as I suggest, apply the upper() function to the variable before using the suggested code.
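
A minimal sketch of that suggestion, assuming mixed-case text: make an upper-case copy first and test the copy. The sample data and the new variable names (s, found) are made up for the example; the conditions are Clyde's, with strpos spelled correctly.

```stata
* Sketch only: sample data and the names s and found are hypothetical
clear
input str20 stringvar
"Acme Inc"
"income tax"
"INC holdings"
"Zinc Works"
end

generate s = upper(stringvar)   // upper-case copy; the original stays untouched
generate byte found = strpos(s, " INC ") > 0 | substr(s, 1, 4) == "INC " | substr(s, -4, 4) == " INC"
list stringvar found, clean
```

Working on a copy keeps the original spelling available for listing or export.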
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#7

04 Jun 2015, 12:02

When searching for text on whole word boundaries, I usually avoid the start and end of string corner cases by adding a space at each end. Something like

Code:

gen s = " " + stringvar + " "
list if strpos(s," INC ")

I also find listsome (from SSC) useful for this type of work.
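
If the word is sometimes followed by a period, as discussed earlier in the thread, the padded copy needs only one more strpos() condition. A rough sketch, with made-up sample data and a hypothetical indicator variable found:

```stata
* Sketch only: sample data and the name found are hypothetical
clear
input str20 stringvar
"ACME INC"
"ACME INC."
"INC. OF OHIO"
"INCOME FUND"
"ZINC INCORPORATED"
end

generate s = " " + stringvar + " "   // pad both ends so every word has a space on each side
generate byte found = strpos(s, " INC ") > 0 | strpos(s, " INC. ") > 0
list stringvar found, clean
```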
Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

04 Jun 2015, 12:04

Not to give up too soon on regular expressions.

Code:

clear
input str10 corp
"INC       "
"INC.      "
"INCOME    "
" INC      "
" INC.     "
"ZINC      "
" INCOME   "
"       INC"
"      INC."
"      ZINC"
end
generate m = regexm(corp,"^INC[. ]| INC[. ]| INC[.]?$")
list, clean

Code:

             corp   m  
  1.   INC          1  
  2.   INC.         1  
  3.   INCOME       0  
  4.    INC         1  
  5.    INC.        1  
  6.   ZINC         0  
  7.    INCOME      0  
  8.          INC   1  
  9.         INC.   1  
 10.         ZINC   0
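
The two ideas in this thread can also be combined: if the string is padded with a space at each end, as suggested above, the regular expression no longer needs separate branches for the start and the end of the string. This is only a sketch under that assumption. It rebuilds the same example data so that it runs on its own, and m2 is a hypothetical variable name.

```stata
* Sketch only: m2 is a hypothetical name; the data repeat the example above
clear
input str10 corp
"INC       "
"INC.      "
"INCOME    "
" INC      "
" INC.     "
"ZINC      "
" INCOME   "
"       INC"
"      INC."
"      ZINC"
end
generate m = regexm(corp,"^INC[. ]| INC[. ]| INC[.]?$")
* pad both ends, then require: space, INC, optional period, space
generate m2 = regexm(" " + corp + " ", " INC[.]? ")
list, clean
```

On these ten observations m2 should agree with m.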

Comment

Tsvetan Georgiev

Join Date: Jun 2015

Posts: 11
#9

04 Jun 2015, 13:20

Thanks a lot all of you. Both -regexm- and -strpos- work perfectly.
Comment
