Hi folks,
First, I'm sorry that I cannot use dataex to show my data. One of my variables is a long string, so dataex reports that it is too large; the observations are also in Chinese, so an excerpt might not make much sense to most readers.
My data contains two variables: 1) content (long strings holding paragraphs of text that describe court cases) and 2) def_name (a string variable that contains the name of the defendant). I am trying to use the regular expression functions (regexm() and regexs()) to create a new variable that contains a portion of content. The part I want runs from the first appearance of the defendant's name to the end of the string. Basically, I want to remove everything that comes before the defendant's name in content.
My silly way of doing this is to write a loop that goes through every value of def_name.
Code:
* start with a placeholder value in every observation
gen extract = "."
* collect the distinct defendant names
levelsof def_name, local(X)
* one pass over the data for each name
foreach i of local X {
    quietly replace extract = regexs(0) if regexm(content, "(`i').*") & def_name == "`i'"
}
The problem is that this method is very slow, and it gets worse as I switch to a larger dataset.
My question is: is there a more efficient way to do this? Can I ask Stata to use the value of another variable in the regular expression function?
Thank you in advance!
Adam
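One possible direction, offered as a sketch rather than a tested solution for this dataset: Stata's string functions take expressions as arguments, so def_name can be passed directly where the loop passes a literal, and the whole extraction can be done in one pass without levelsof or foreach. The version below uses strpos() and substr() instead of a regular expression, which also avoids any characters in a name being read as regex metacharacters. It assumes the name appears in content exactly as it is stored in def_name; pos is a hypothetical helper variable introduced only for illustration.

```stata
* Sketch only: assumes def_name occurs in content exactly as stored
* (same characters, no extra spaces). pos is a hypothetical helper.

* Position of the first occurrence of the name, 0 if it is not found.
gen long pos = strpos(content, def_name)

* Keep everything from that position to the end of the string.
* A missing third argument tells substr() to return the rest of the string.
gen strL extract = substr(content, pos, .) if pos > 0

drop pos
```

Unlike the loop, this leaves extract empty rather than "." when the name is not found. strpos() and substr() both count in bytes, so they should agree with each other on Chinese text; ustrpos() and usubstr() are the character-based equivalents if character positions are needed elsewhere.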