  • Solution for inlist - expression too long

    Hi Statalisters,

    I would like to share a workaround for inlist's limit of 10 string arguments, one that I have not seen anywhere else on Statalist.

    There are many topics complaining about the 10-argument limit of inlist, for example here, here, here, here and here. Different solutions have been proposed, and there is also inlist2 from SSC. The latter creates a dummy variable, though, and doesn't allow commas in the strings.

    My solution builds on Andrew Musau's and William Lisowski's suggestions in the topics above to use regexm, which works nicely in most cases but has a limit as well. ustrregexm, however, appears to be limited only by the maximum length of a local macro.

    The following program converts a list to a regular expression:
    Code:
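    program list_to_regex
        * the first argument is the space-separated list of values
        args list
        * split the list into tokens, join them with "|", and anchor the pattern
        mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
        * hand the pattern back to the caller as the local macro regex
        c_local regex "`regex'"
    end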
    An example of usage is:
    Code:
    list_to_regex "US BE JP"
    keep if ustrregexm(country, "`regex'")
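    To make it clearer what the program hands back, the generated pattern can be displayed after the call. This is only a minimal sketch, assuming list_to_regex has been defined as above and that the list elements contain no spaces and no characters with a special meaning in regular expressions (such as . or parentheses), since those would be interpreted by the regex engine rather than matched literally:

    ```stata
    list_to_regex "US BE JP"
    * the elements are joined with "|" and wrapped in ^( ... )$
    display "`regex'"
    ```

    For this list the pattern should read ^(US|BE|JP)$. Because it is anchored at both ends, only exact matches qualify, so a value such as "USA" would not be kept.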
    It also works for arguments with spaces:
    Code:
    sysuse auto, clear
    list_to_regex `"`"Audi 5000"' `"Audi Fox"'"'
    keep if ustrregexm(make, "`regex'")
    Here's an example showing that it scales to at least 17,576 (26^3) elements, which should be more than enough for any inlist application.
    Code:
    clear
    set obs `=26^3 + 1'
    gen foo = ""
    local i 0
    foreach a in `c(alpha)' {
        foreach b in `c(alpha)' {
            foreach c in `c(alpha)' {
                local ++i
                replace foo = "`a'`b'`c'" in `i'
                local list `list' `a'`b'`c'
            }
        }
    }
    
    replace foo = "abcd" if missing(foo)
    list_to_regex "`list'"
    drop if ustrregexm(foo, "`regex'")
    list
    I hope this can help some get around the (slightly annoying) 10 string elements limit of inlist.

    P.S. I'm not advocating the use of large lists for filtering; there's probably a better way to do that. I just like having a tool for which scaling is not a problem, and sometimes it is easier to use a list than to create a separate dataset and merge.

  • #2
    Hi Wouter! Nice program!! The issue I see is that here as well, when you want strings with spaces, the code becomes too tedious to write, so one might as well chain several inlist calls with "|". For both your program and mine, I think a solution in that case is to replace all commas (in inlist2) or spaces (here) with a symbol not used in any string, e.g. "@", and then change it back, but that takes even more lines. If there is a mix of spaces and commas, neither of our programs works, I guess.



    • #3
      Hi Matteo, indeed writing out a long list of strings with spaces surrounded by double quotes is not ideal, although it can be done programmatically if the list is constructed from some other source, which is more what I had in mind.
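      As a rough sketch of that programmatic route, the list could be collected with levelsof and passed straight to the program. The toy dataset and the selection rule (country < "G") are purely hypothetical stand-ins for "some other source", and the clean option assumes the values contain no embedded spaces:

      ```stata
      clear
      input str2 country
      "US"
      "BE"
      "JP"
      "FR"
      "DE"
      end
      * hypothetical source of the list: the codes that sort before "G"
      levelsof country if country < "G", clean local(codes)
      * build the pattern from the collected codes and filter with it
      list_to_regex "`codes'"
      keep if ustrregexm(country, "`regex'")
      ```

      The same pattern would apply if the values came from another dataset or a file, as long as they end up in a single local macro.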

      I guess the ideal solution would be for StataCorp to relax the strict limit of 10 elements, which seems a bit on the low side to me. But there are probably some issues with that under the hood that I'm not aware of.



      • #4
        That would indeed be ideal. Hopefully those mysterious issues will be solved at some point!
