Hi Statalisters,
I would just like to share a workaround for inlist's limit of 10 string arguments, which I have not seen anywhere else here on Statalist.
There are many topics complaining about the 10-argument limit of inlist, for example here, here, here, here and here. Different solutions have been proposed, and there is also inlist2 from SSC. The latter creates a dummy variable, though, and doesn't allow commas in the strings.
My solution builds on Andrew Musau's and William Lisowski's suggestions in the topics above to use regexm, which works nicely in most cases but has a limit as well. ustrregexm, however, appears to be limited only by the maximum length of a local macro.
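To make the idea concrete, here is a minimal sketch of the kind of pattern involved, written out by hand. The toy dataset and the variable name country are assumptions made purely for illustration. The anchors ^ and $ force a whole-string match, so "USA" is not picked up by the alternative "US".

```stata
* Toy data (hypothetical; stands in for your own dataset)
clear
input str3 country
"US"
"BE"
"JP"
"DE"
"USA"
end

* Alternation inside anchors: match exactly US, BE or JP and nothing longer
keep if ustrregexm(country, "^(US|BE|JP)$")
list
```

This should leave only the observations US, BE and JP. Note that list elements are treated as regular expression syntax, so values containing characters such as . or ( would need escaping.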
The following program converts a list to a regular expression:
Code:
program list_to_regex
    args list
    // Split the list into tokens, join them with "|", and anchor the
    // pattern so that only whole-string matches count
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    // Hand the pattern back to the caller in the local macro regex
    c_local regex "`regex'"
end
An example of usage is:
Code:
list_to_regex "US BE JP"
keep if ustrregexm(country, "`regex'")
It also works for arguments with spaces:
Code:
sysuse auto, clear
list_to_regex `"`"Audi 5000"' `"Audi Fox"'"'
keep if ustrregexm(make, "`regex'")
Here's an example to show that it scales to at least 17576 (26^3) elements, which should be more than enough for any inlist application.
Code:
clear
set obs `=26^3 + 1'
gen foo = ""
local i 0
// Fill the first 26^3 observations with every three-letter combination
// and collect the same values in the local macro list
foreach a in `c(alpha)' {
    foreach b in `c(alpha)' {
        foreach c in `c(alpha)' {
            local ++i
            replace foo = "`a'`b'`c'" in `i'
            local list `list' `a'`b'`c'
        }
    }
}
// The one remaining observation gets a value that is not in the list
replace foo = "abcd" if missing(foo)
list_to_regex "`list'"
drop if ustrregexm(foo, "`regex'")
list
I hope this can help some of you get around the (slightly annoying) limit of 10 string elements in inlist.
P.S. I'm not advocating the use of large lists for filtering; there's probably a better way to do this. I just like the idea of using something for which scaling is not a problem. And sometimes it is easier to use a list than to create a different dataset and merge.
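For comparison, the dataset-and-merge route mentioned in the P.S. could look roughly like the sketch below. The toy master data, the variable name country and the tempfile names master and keepers are all assumptions for illustration, not part of the solution above.

```stata
* Toy master data (hypothetical; stands in for your own dataset)
clear
input str3 country
"US"
"BE"
"JP"
"DE"
"USA"
end
tempfile master
save `master'

* Lookup dataset with one observation per value to keep
clear
input str3 country
"US"
"BE"
"JP"
end
tempfile keepers
save `keepers'

* Keep only master observations whose country appears in the lookup
use `master', clear
merge m:1 country using `keepers', keep(match) nogenerate
list
```

This matches on exact values, so no regular expression escaping is needed, at the cost of building and saving a second dataset.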