
  • inlist(): limit of 10 arguments for strings

    I'm curious to know why the inlist() function has a limit of 10 arguments for strings. Note that the number of arguments can be up to 255 for reals.

    If I have a list of more than 10 strings, what are my options, apart from:
    - breaking the list into two or more sub-lists and using two or more inlist() calls joined by an "or" condition;
    - writing a very long list of "or" conditions;
    - looping over the elements of the list?
    Last edited by Andrea Discacciati; 09 Feb 2016, 09:43.
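
For readers following along, here is a minimal sketch of the first and third workarounds listed in the question. The variable name s, the data, and the list of twelve strings are all made up for illustration, and the loop as written assumes none of the strings contain spaces.

```stata
* Hypothetical data: one string variable s
clear
input str8 s
"gnxl"
"gnx"
"foo"
"barbara"
end

* Workaround 1: split the list across two inlist() calls joined by |
generate byte m1 = inlist(s, "a", "b", "c", "d", "e", "foo") ///
    | inlist(s, "g", "h", "i", "j", "k", "gnxl")

* Workaround 3: loop over the list (hypothetical local macro wanted)
local wanted a b c d e foo g h i j k gnxl
generate byte m2 = 0
foreach w of local wanted {
    replace m2 = 1 if s == "`w'"
}

* Both indicators should flag the same observations
assert m1 == m2
```

Keeping the list in a local macro means it only has to be typed once, which is probably the main attraction of the loop when the list is long.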

  • #2
    You can map from distinct strings to a numeric variable using egen, group() and then use its values. See also http://www.stata.com/support/faqs/da...s-for-subsets/
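
A rough sketch of what this mapping might look like, assuming a string variable named s and a new variable named sid (both names are hypothetical, as is the data). The numeric codes passed to inlist() have to be read off the value label first; the ones below are placeholders for this toy dataset.

```stata
* Hypothetical data: one string variable s
clear
input str8 s
"gnxl"
"gnx"
"foo"
"barbara"
end

* Map each distinct string to an integer code, assigned in sorted order
egen long sid = group(s), label

* Inspect which code goes with which string
label list sid

* inlist() with numeric arguments is not subject to the string limit
generate byte m = inlist(sid, 2, 3)
list
```

The codes depend on the sort order of the strings actually present, so they need to be checked again whenever the data change.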

    • #3
      Thank you, Nick.

      I remain puzzled by the limit of only 10 strings, however.

      • #4
        I cannot pass up an opportunity to demonstrate an approach using regular expressions. For what it's worth, I do agree that 10 seems like a low limit.
        Code:
        . input str8 s
        
                     s
          1. gnxl
          2. gnx
          3. foo
          4. barbara
          5. end
        
        . generate m = regexm(s,"^(gnxl|foo|bar)$")
        
        . list
        
             +-------------+
             |       s   m |
             |-------------|
          1. |    gnxl   1 |
          2. |     gnx   0 |
          3. |     foo   1 |
          4. | barbara   0 |
             +-------------+
        Last edited by William Lisowski; 09 Feb 2016, 11:13. Reason: Improved the example.
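
If the list is long, the pattern could also be assembled from a local macro rather than typed by hand. This is only a sketch: the macro names wanted and pattern are hypothetical, and it assumes the strings are separated by single spaces and contain no spaces or regular-expression metacharacters themselves.

```stata
* Hypothetical data: one string variable s
clear
input str8 s
"gnxl"
"gnx"
"foo"
"barbara"
end

* Hypothetical list of strings to match, separated by single spaces
local wanted gnxl foo bar

* Turn the spaces into | to build the alternation
local pattern : subinstr local wanted " " "|", all

* Anchor with ^ and $ so only whole-string matches count
generate byte m = regexm(s, "^(`pattern')$")
list
```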

        • #5
          Why only 10 strings?

          Only those who wrote the code at StataCorp can answer, but

          * as strings can be very, very long in Stata, some limit is not that surprising

          * perhaps this is a deliberate signal that you should be thinking about another approach if you want to specify more than a short list.

          • #6
            I would suspect it is a combination of performance and memory management. With numeric values a sort can help to select the records fairly efficiently (e.g., selecting from two ordered lists isn't terribly demanding), but doing the same with strings (each of which requires a varying amount of memory) may not be as efficient.

            • #7
              William Lisowski Thank you very much!