  • Alternative to inlist() - expression too long

    Hi,

    Relatively basic question: I am looking for a concise (1 line) alternative to inlist that would accept more arguments. Is there an "ssc install" that would fix this inlist() limitation?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey str2 linktype
    "001000" "NU"
    "001000" "NU"
    "001000" "LU"
    "001001" "NU"
    "001001" "LU"
    "001002" "NR"
    "001002" "NR"
    "001002" "NR"
    "001002" "LC"
    "001003" "NU"
    "001003" "NU"
    "001003" "LU"
    "001004" "NU"
    "001004" "NU"
    "001004" "LU"
    "001005" "NU"
    "001005" "LU"
    "001007" "LU"
    "001007" "LU"
    "001007" "NU"
    "001008" "NR"
    "001008" "LC"
    "001009" "NR"
    "001009" "NR"
    "001009" "NR"
    "001009" "LC"
    "001010" "LU"
    "001010" "LU"
    "001010" "NU"
    "001011" "NR"
    "001011" "NR"
    "001011" "LC"
    "001012" "NU"
    "001012" "LU"
    "001012" "NU"
    "001013" "NU"
    "001013" "LU"
    "001015" "NU"
    "001015" "LU"
    "001015" "NU"
    "001016" "NR"
    "001016" "NR"
    "001016" "LC"
    "001017" "NR"
    "001017" "NR"
    "001017" "LC"
    "001018" "LU"
    "001018" "NU"
    "001018" "LU"
    "001018" "NU"
    "001019" "NR"
    "001019" "NR"
    "001019" "NU"
    "001019" "LC"
    "001020" "NR"
    "001020" "NR"
    "001020" "NR"
    "001020" "LC"
    "001020" "NR"
    "001021" "NU"
    "001021" "NU"
    "001021" "LU"
    "001022" "LU"
    "001022" "NU"
    "001022" "LU"
    "001023" "NU"
    "001023" "NU"
    "001023" "LU"
    "001024" "NU"
    "001024" "LU"
    "001025" "NU"
    "001025" "LU"
    "001026" "NU"
    "001026" "NU"
    "001026" "LU"
    "001027" "NU"
    "001027" "LU"
    "001028" "NU"
    "001028" "LU"
    "001029" "NU"
    "001029" "LU"
    "001030" "NU"
    "001030" "LU"
    "001031" "LU"
    "001031" "LU"
    "001034" "NR"
    "001034" "LC"
    "001036" "NU"
    "001036" "NU"
    "001036" "LC"
    "001036" "NR"
    "001036" "LX"
    "001037" "NU"
    "001037" "NU"
    "001037" "LU"
    "001038" "NU"
    "001038" "LU"
    "001038" "NU"
    "001039" "NR"
    "001039" "NR"
    end
    Code:
    keep if inlist(linktype,"LU","LC","LU", "LC", "LD", "LF", "LN", "LO", "LS", "LX")

  • #2
    I understand I could break it down as follows, but I am wondering if there is a better solution for long lists of arguments:
    Code:
    keep if (inlist(linktype,"LU","LC","LU", "LC", "LD") | inlist(linktype, "LF", "LN", "LO", "LS", "LX"))


    • #3
      In one line? Doubtful. With a small list, there's nothing wrong with stringing together a few inlist() conditions, and the result is quite readable, which is also important. If you have an unmanageable number of values to type by hand, then this stops being efficient from a programming perspective.


      • #4
        Holding categorical variables as string values is generally not a good idea in Stata, and leads only to pain and suffering.

        You can firstly generate a nicely labelled categorical variable with numerical values

        Code:
        .  encode linktype , generate(numlink) label(name)
        
        . list in 1/5
        
             +-----------------------------+
             |  gvkey   linktype   numlink |
             |-----------------------------|
          1. | 001000         NU        NU |
          2. | 001000         NU        NU |
          3. | 001000         LU        LU |
          4. | 001001         NU        NU |
          5. | 001001         LU        LU |
             +-----------------------------+
        
        . list in 1/5, nol
        
             +-----------------------------+
             |  gvkey   linktype   numlink |
             |-----------------------------|
          1. | 001000         NU         5 |
          2. | 001000         NU         5 |
          3. | 001000         LU         2 |
          4. | 001001         NU         5 |
          5. | 001001         LU         2 |
             +-----------------------------+
        and then you relax the -inlist- limit from 10 arguments (for strings) to 250 (for numbers).


        • #5
          Once you switch to numerical values of the linktype, you can also switch from -inlist- to -inrange-, which is convenient if the values you want to keep are adjacent numbers (which they seem to be: everything you are putting in the inlist starts with L, and -encode- numbers the labels in alphabetical order).
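
          A minimal sketch of that idea, assuming the -encode- step from #4 (variable numlink, value label name) and assuming that every code to be kept sorts between "LC" and "LX", as it does in the example data:

          Code:
          * encode numbers the labels alphabetically, so the L codes end up contiguous
          encode linktype, generate(numlink) label(name)
          * "LC":name and "LX":name look up the numeric values behind those labels
          keep if inrange(numlink, "LC":name, "LX":name)

          If an unwanted code ever sorts between the two endpoints, -inrange- will keep it too, so it is worth checking the mapping first with -label list name-.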


          • #6
            Interesting Joro! Can you still manage (keep or drop) numlink observations using their label name instead of their actual value? It would help keeping the code readable. Thanks!


            • #7
              Yes, there is a way to refer to values by their labels. Whether this will make your life any easier, I don't know:

              Code:
              . list if numlink == "LX":name
              
                   +-----------------------------+
                   |  gvkey   linktype   numlink |
                   |-----------------------------|
               40. | 001036         LX        LX |
                   +-----------------------------+
              
              . list if inlist(numlink, "LX":name, "LC":name)
              
                   +-----------------------------+
                   |  gvkey   linktype   numlink |
                   |-----------------------------|
                1. | 001009         LC        LC |
                2. | 001002         LC        LC |
                3. | 001016         LC        LC |
                4. | 001034         LC        LC |
                5. | 001036         LC        LC |
                   |-----------------------------|
                6. | 001017         LC        LC |
                7. | 001011         LC        LC |
                8. | 001008         LC        LC |
                9. | 001020         LC        LC |
               10. | 001019         LC        LC |
                   |-----------------------------|
               40. | 001036         LX        LX |
                   +-----------------------------+
              I am curious myself: if you refer to numeric values by their labels, which limit of -inlist- would apply? The 10 for strings, or the 250 for numbers?

              If you ever try what I showed you above, do report back whether we managed to extend the -inlist- limit from 10 to 250.

              Originally posted by Francois Durant
              Interesting Joro! Can you still manage (keep or drop) numlink observations using their label name instead of their actual value? It would help keeping the code readable. Thanks!


              • #8
                Note that

                Code:
                 keep if inlist(linktype,"LU","LC","LU", "LC", "LD", "LF", "LN", "LO", "LS", "LX")
                is longer than needed given the repetitions.

                Code:
                 keep if inlist(linktype,"LU","LC", "LD", "LF", "LN", "LO", "LS", "LX")
                should work fine. In this particular case, I wonder if the condition

                Code:
                if substr(linktype, 1, 1) == "L"
                would also do what is wanted.

                However, to answer a question in #1, inlist() is a function, not a command, and functions can't be user-written, so nothing on SSC (or the Stata Journal, or anywhere else) can supply a better or different version. The functionality has to be sought in other ways. For example


                Code:
                local OK  LU LC LD LF LN LO LS LX  
                
                gen byte wanted = 0
                quietly foreach ok of local OK {
                       replace wanted = 1 if linktype == "`ok'"
                } 
                is a technique that can be extended. Another technique that is often better is to hold the acceptable codes in a different dataset and merge (not to mention frames in Stata 16 and up).


                • #9
                  regexm allows a longer, although by no means unlimited, list.

                  Code:
                  keep if regexm(linktype, "(LU|LC|LU|LC|LD|LF|LN|LO|LS|LX)")


                  • #10
                    My personal preference is to use the merge-type pattern when I have a list of more than about a dozen items (arbitrarily chosen) to match. The advantage is that it can be extended to any number of observations, combinations of variables, and variable type. It could also then be written to and loaded from an external file if the matching is intended to be more dynamic.

                    Here is a comparison between Nick's loop method and the merge (with frames) method that he also suggested.

                    Code:
                    clear *
                    cls
                    
                    input str2 linktype
                    "NU"
                    "LU"
                    "NR"
                    "LC"
                    "NU"
                    "LX"
                    "NR"
                    end
                    
                    local OK  LU LC LD LF LN LO LS LX  
                    
                    gen byte wanted = 0
                    quietly foreach ok of local OK {
                      replace wanted = 1 if linktype == "`ok'"
                    }
                    
                    
                    frame create OK
                    frame OK {
                    input str2 linktype
                    "LU"
                    "LC"
                    "LD"
                    "LF"
                    "LN"
                    "LO"
                    "LS"
                    "LX"
                    end
                    }
                    
                    frlink m:1 linktype, frame(OK) gen(lok)
                    gen byte wanted2 = !mi(lok)
                    list linktype wanted*
                    Results in

                    Code:
                         +-----------------------------+
                         | linktype   wanted   wanted2 |
                         |-----------------------------|
                      1. |       NU        0         0 |
                      2. |       LU        1         1 |
                      3. |       NR        0         0 |
                      4. |       LC        1         1 |
                      5. |       NU        0         0 |
                         |-----------------------------|
                      6. |       LX        1         1 |
                      7. |       NR        0         0 |
                         +-----------------------------+


                    • #11
                      Thank you Joro, Leonardo, Nick and Andrew for providing diverse and efficient ways to deal with the problem, I really appreciate it!


                      • #12
                        Old thread, but for future readers, you could use inlist2, in two lines.
                        Code:
                        ssc install inlist2
                        inlist2 linktype, val(LU,LC,LU,LC,LD,LF,LN,LO,LS,LX)
                        keep if inlist2==1


                        • #13
                          I have tried inlist2 on another problem, but with the following syntax it seems to keep only the last string written in the parentheses. How can I fix this? Thanks.

                          Code:
                          inlist2 country, val( Australia  , Austria  , Belgium  , Canada  , Chile   , China,P.R.: Mainland  , Colombia ,  Czech Republic  , Denmark ,  Estonia  , Euro ,  Finland  , France  , Germany  , Greece ,  Hungary ,  Iceland  , Indonesia  , Ireland ,  Israel ,  Italy  , Japan  , Korea  , Latvia  , Luxemburg  , Mexico ,  Netherlands ,  New Zealand ,  Norway  , Poland  , Poland  , Russia  , Slovak Republic ,  Slovenia  , South Africa ,  Spain  , Sweden  , Switzerland ,  United Kingdom ,  United States )
                          keep if inlist2==1


                          • #14
                            You already have a number of good suggestions above.
                            First and foremost the advice of Joro Kolev to switch to categorical variables.
                            Also from Andrew Musau with a regular expression.
                            Here is a variant without regular expressions.
                            Mind the leading and trailing delimiters in the ok definition!

                            Code:
                            local ok " LU LC LD LF LN LO LS LX "
                            keep if strpos("`ok'"," "+linktype+" ")>0


                            • #15
                              I know how to solve the problem but I would like to understand the syntax of inlist2 for future reference. Thanks Sergiy!
