Solution for inlist - expression too long

Wouter Wakker

Join Date: Nov 2018

Posts: 621
#1


16 Sep 2021, 05:32

Hi Statalisters,

I would just like to share a workaround for the limit of 10 string arguments in inlist, which I have not seen anywhere else here on Statalist.

There are many topics complaining about the 10-argument limit of inlist. Different solutions have been proposed, and there is also inlist2 from SSC, although the latter creates a dummy variable and doesn't allow commas in the strings.
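For anyone who has not run into the limit yet, here is a minimal sketch using hypothetical toy data (the variable country and its values are made up). It assumes the 10-string-argument cap described above; because the exact message and return code may vary across Stata versions, the sketch only stores the return code instead of checking for a specific value. The second count shows the common workaround of chaining several inlist() calls with |.

```stata
* Hypothetical toy data
clear
input str2 country
"US"
"BE"
"NL"
end

* Eleven string choices: more than inlist() accepts, so an error is expected
capture noisily count if inlist(country, "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k")
local rc = _rc
display "return code: `rc'"

* The same choices split over two chained inlist() calls stay within the limit
count if inlist(country, "a", "b", "c", "d", "e") | inlist(country, "f", "g", "h", "i", "j", "k")
```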

My solution builds on Andrew Musau's and William Lisowski's suggestions in the topics above to use regexm, which works nicely in most cases but has a limit of its own. ustrregexm, however, appears to be limited only by the maximum length of a local macro.

The following program converts a space-separated list into an anchored regular expression:

Code:

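capture program drop list_to_regex
program list_to_regex
    // First (and only) argument: the space-separated list of strings
    args list
    // Split the list into tokens, join them with a pipe and anchor the
    // alternation at both ends so that only whole-string matches count
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    // Pass the pattern back to the calling do-file or program as local regex
    c_local regex "`regex'"
end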

An example of usage is:

Code:

list_to_regex "US BE JP"
keep if ustrregexm(country, "`regex'")
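A self-contained version of that usage, with hypothetical toy data (country and its values are made up for illustration). The program is repeated so the block runs on its own. The assert checks that the pattern gives the same answer as the equivalent inlist() call; "USA" is not matched because the ^ and $ anchors require the whole string to equal one of the listed elements, just as inlist() does.

```stata
capture program drop list_to_regex
program list_to_regex
    args list
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    c_local regex "`regex'"
end

* Hypothetical toy data
clear
input str3 country
"US"
"USA"
"BE"
"JP"
"NL"
end

list_to_regex "US BE JP"
display "`regex'"

* The regex and inlist() should agree observation by observation
assert ustrregexm(country, "`regex'") == inlist(country, "US", "BE", "JP")
keep if ustrregexm(country, "`regex'")
list
```

One caveat: the elements are pasted into the pattern as they are, so characters with a special meaning in regular expressions (such as . + ( ) or |) are interpreted as regex syntax rather than literally.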

It also works for elements that contain spaces, as long as each one is wrapped in compound double quotes:

Code:

sysuse auto, clear
list_to_regex `"`"Audi 5000"' `"Audi Fox"'"'
keep if ustrregexm(make, "`regex'")

Here's an example showing that it scales to at least 17576 (26^3) elements, which should be more than enough for any inlist application.

Code:

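clear
set obs `=26^3 + 1'
gen foo = ""
local i 0
* Fill foo with every three-letter combination aaa ... zzz and collect them in a list
foreach a in `c(alpha)' {
    foreach b in `c(alpha)' {
        foreach c in `c(alpha)' {
            local ++i
            replace foo = "`a'`b'`c'" in `i'
            local list `list' `a'`b'`c'
        }
    }
}
* The one remaining observation gets a value that is not in the list
replace foo = "abcd" if missing(foo)
list_to_regex "`list'"
drop if ustrregexm(foo, "`regex'")
* Only abcd should be left
list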

I hope this helps some of you get around the (slightly annoying) limit of 10 string arguments in inlist.

P.S. I'm not advocating using large lists for filtering; there's probably a better way to do that. I just like the idea of using something for which scaling is not a problem. And sometimes it is easier to use a list than to create a separate dataset and merge.
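For comparison, a sketch of the merge route mentioned in the P.S., using hypothetical toy data and a tempfile named keepers (both made up for illustration): the values to keep live in their own small dataset, and merge keeps only the matched observations.

```stata
* Hypothetical lookup dataset holding the values to keep
clear
input str2 country
"US"
"BE"
"JP"
end
tempfile keepers
save `keepers'

* Hypothetical main dataset
clear
input str3 country
"US"
"USA"
"BE"
"JP"
"NL"
end

* Keep only observations whose country appears in the lookup dataset
merge m:1 country using `keepers', keep(match) nogenerate
list
```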
Matteo Pinna

Join Date: Oct 2020

Posts: 11
#2

01 Feb 2022, 07:16

Hi Wouter! Nice program! The issue I see is that here too, when you want strings with spaces, the code becomes tedious to write, so one might as well chain several inlist() calls with "|". For both your program and mine, I think a workaround in that case is to replace the spaces (here) or the commas (in inlist2) with a symbol that no string uses, e.g. "@", and then change it back, but that takes even more lines. If the strings contain a mix of spaces and commas, I guess neither of our programs works.
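A minimal sketch of the "@" placeholder idea described above, applied to list_to_regex (repeated so the block runs on its own). It assumes that "@" occurs in none of the strings: the pattern is built from the placeholder version of the list, and the placeholders are turned back into spaces afterwards with the subinstr macro function.

```stata
capture program drop list_to_regex
program list_to_regex
    args list
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    c_local regex "`regex'"
end

sysuse auto, clear
* Write @ instead of each space so that no quoting is needed
list_to_regex "Audi@5000 Audi@Fox BMW@320i"
* Turn the placeholders back into spaces inside the finished pattern
local regex : subinstr local regex "@" " ", all
keep if ustrregexm(make, "`regex'")
list make
```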
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#3

01 Feb 2022, 09:56

Hi Matteo, indeed writing out a long list of strings with spaces surrounded by double quotes is not ideal, although it can be done programmatically if the list is constructed from some other source, which is more what I had in mind.
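As an example of building the list programmatically, a sketch (assuming the auto data and the program from the first post, repeated here) that takes the list from levelsof, which wraps each string level in compound double quotes, so elements with spaces pass through intact. The variable name listed is just an illustrative choice. Because make is unique in the auto data, the result should coincide with foreign.

```stata
capture program drop list_to_regex
program list_to_regex
    args list
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    c_local regex "`regex'"
end

sysuse auto, clear
* Collect the makes of all foreign cars; levelsof quotes each level
levelsof make if foreign == 1, local(foreign_makes)
list_to_regex `"`foreign_makes'"'
* Flag the observations whose make is in that list
generate byte listed = ustrregexm(make, "`regex'")
assert listed == foreign
```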

I guess the ideal solution would be for StataCorp to relax the limit of 10 elements, which seems a bit low to me. But there are probably issues under the hood that I'm not aware of.
Matteo Pinna

Join Date: Oct 2020

Posts: 11
#4

04 Feb 2022, 03:42

That would indeed be ideal; hopefully those mysterious issues will be solved at some point!