Code:
input str100 result str100 f
"PR: L63P A71T V77I" "L63P A71T V77I"
"RT: A98S K104R E122K I135V D177E T200A Q207E R211K L214F V245M" "A98S K104R E122K I135V D177E T200A Q207E R211K"
"PR: E35ED S37N R41K I72L" "E35ED S37N R41K I72L"
"ATV Mutations: A71T" "A71T"
"ATV/r Mutations: L63P A71T V77I" "L63P A71T V77I"
"DRV/r Mutations: L63P A71T V77I" "L63P A71T V77I"
"AMP Mutations: A71T" "A71T"
"AMP/r Mutations: A71T" "A71T"
"IDV Mutations: A71T V77I" "A71T V77I"
"IDV/r Mutations: A71T V77I" "A71T V77I"
"LPV/r Mutations: L63P A71T" "L63P A71T"
"NFV Mutations: A71T" "A71T"
"SQV/r Mutations: A71T V77I" "A71T V77I"
"Protease: L63P A71T/A" "L63P A71T"
"PR: L63P V77I" "L63P V77I"
"RT: E122K D123E I178L G196E T200I L214F V245E" "E122K D123E I178L G196E T200I L214F V245E"
end

gen f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))
replace f = regexs(0) if (regexm(result, "([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+) ([A-Z][0-9]+[A-Z]+)"))

I am facing two issues.
I have a pattern ([A-Z][0-9]+[A-Z]+) and another one ([A-Z][0-9]+[A-Z]+/[A-Z]+)
I do not know whether regexm or moss can match more than one pattern at a time and return all instances of each.
The series of regexm calls I used is a very inefficient way to extract all instances of only ONE pattern. In addition, I believe there is a limit: my real data contain up to 13 such sequences in one observation, while regexm may not go beyond 10? The error returned is "regexp: too many ()"
In observation #14 I would like the command to return both L63P and A71T/A; in other words, the command needs to look for two patterns at once.
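
To make the goal concrete, here is a minimal sketch of one possible approach, assuming Stata 14 or later so that the Unicode regex functions ustrregexm(), ustrregexs() and ustrregexrf() are available. It folds both patterns into a single expression by making the "/[A-Z]+" suffix optional, then peels off one match per pass instead of listing every repetition in the regex. The names work and allmut are made up for illustration.

```stata
* Sketch only: assumes Stata 14+ for the ustrregex*() functions.
* work and allmut are illustrative names, not part of the original data.
local pat "[A-Z][0-9]+[A-Z]+(/[A-Z]+)?"

generate work = result          // working copy that gets consumed
generate str1 allmut = ""       // replace promotes the string type as needed

* Loop while at least one observation still contains a match
count if ustrregexm(work, "`pat'")
local n = r(N)
while `n' > 0 {
    * append the first remaining match to the running list
    replace allmut = allmut + " " + ustrregexs(0) if ustrregexm(work, "`pat'")
    * delete that first match from the working copy
    replace work = ustrregexrf(work, "`pat'", "")
    count if ustrregexm(work, "`pat'")
    local n = r(N)
}

replace allmut = strtrim(allmut)
drop work
```

Because each pass extracts a single match, the number of passes equals the largest number of mutations in any one observation, and the pattern never needs more than one pair of parentheses.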