Dear Stata users,
My question is about string functions in Stata. Suppose we have a string variable composed of several words joined by a punctuation mark ("、"), and I want to split it and extract the words. For example, from "Merc.、Bobcat、XR-7" I want to extract the three words "Merc.", "Bobcat" and "XR-7" and store them in three new variables, say var1, var2 and var3. My method is to find the position of the punctuation mark ("、") with the -strpos()- function and then extract the substring with the -substr()- function. However, -strpos()- returns only the position at which the substring is first found. So I have to replace the original string and locate "、" again for each word, which makes my code complicated and bloated. Is there an easier and more efficient function or solution? Thank you. By the way, the -noccur()- function I use below is from -egenmore- (SSC).
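To make the limitation concrete, here is a minimal sketch on a single string rather than a variable. The local macro names s, p, first and rest are only for illustration. It uses -strlen("、")- instead of a fixed offset, on the assumption that the number of bytes "、" occupies may differ between encodings.

```stata
* Illustration only: one string in a local macro (hypothetical names)
local s "Merc.、Bobcat、XR-7"

* -strpos()- reports the FIRST "、" only; later ones are not visible to it
local p = strpos("`s'", "、")

* text before the first "、"
local first = substr("`s'", 1, `p' - 1)

* remainder after the first "、"; skip the delimiter by its own length
local rest = substr("`s'", `p' + strlen("、"), .)

display `p'          // 6
display "`first'"    // Merc.
display "`rest'"     // Bobcat、XR-7
```

To reach the second "、", -strpos()- has to be applied again to the remainder, which is the repetition that the code below carries out by hand.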
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 make
"Datsun、210"
"Cad.、Eldorado"
"VW、Scirocco、XR-7"
"Merc.、Bobcat、XR-7"
"Buick"
"VW、Diesel、Regal"
"Pont.、Phoenix"
"Merc.、Zephyr"
"Olds、Delta、88"
"Buick、LeSabre"
"Olds、Starfire、XR-7"
"Dodge、Magnum"
"BMW"
"AMC、Concord、320i"
"Plym.、Horizon"
"Datsun、510"
"Toyota、Celica"
"Chev.、Monte、Carlo"
"Plym.、Arrow"
"Datsun、200"
"Mazda、GLC"
"Merc.、XR-7"
end
Code:
gen neomake=make
egen punctuation=noccur(neomake), string("、") // number of occurrences of "、" in neomake
gen make1pos=strpos(neomake,"、") // position at which "、" is first found
gen make1=substr(neomake,1,make1pos-1) if punctuation>=1 // first word
replace make1=neomake if punctuation==0 // no "、", so the whole string is the only word
replace neomake=substr(neomake,make1pos+2,.) if punctuation>=1 // drop the first word and its "、" for the next step
gen make2pos=strpos(neomake,"、") if punctuation>=1 // position of the next "、" (the second one in make)
gen make2=substr(neomake,1,make2pos-1) if punctuation>=1 // second word
replace make2=neomake if punctuation==1 // only one "、", so the remainder is the second word
replace neomake=substr(neomake,make2pos+2,.) if punctuation>=2 // drop the second word and its "、"
gen make3=neomake if punctuation>=2 // what remains is the third word
order neomake make1 make2 make3 punctuation make1pos make2pos, after(make)