Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expression too long while using inlist

    Hello All,

    I am learning STATA and have not been able to find the answer to the following question online:

    What syntax do I use in Stata to generate a variable that requires multiple conditions? Here is what I'm trying to use and it's not working:


    Q) Need help with getting the variables that have the given entries I have 50 variables and more then 150 data entries to fetch from the dataset kindly help me with it


    browse * if inrange(I10_DX1,"F1110","F1199") |inrange(I10_DX2,"F1110","F1199") |inrange(I10_DX3,"F1110","F1199") |inrange(I10_DX4,"F1110","F1199") |inrange(I10_DX5,"F1110","F1199") |inrange(I10_DX6,"F1110","F1199")|inrange(I10_DX7, "T400X1A","T40695D") |inrange(I10_DX8,"F1110","F1199") |inrange(I10_DX9,"F1110","F1199") |inrange(I10_DX10,"F1110","F1199") |inrange(I10_DX11,"F1110","F1199") |inrange(I10_DX12,"F1110","F1199") |inrange(I10_DX13,"F1110","F1199") |inrange(I10_DX14,"F1110","F1199") |inrange(I10_DX15,"F1110","F1199") |inrange(I10_DX16,"F1110","F1199") |inrange(I10_DX17,"F1110","F1199") |inrange(I10_DX18,"F1110","F1199") |inrange(I10_DX19,"F1110","F1199") |inrange(I10_DX20,"F1110","F1199") |inrange(I10_DX21,"F1110","F1199") |inrange(I10_DX22,"F1110","F1199") |inrange(I10_DX23,"F1110","F1199") |inrange(I10_DX24,"F1110","F1199") |inrange(I10_DX25,"F1110","F1199") |inrange(I10_DX26,"F1110","F1199") |inrange(I10_DX27,"F1110","F1199") |inrange(I10_DX28,"F1110","F1199") |inrange(I10_DX29,"F1110","F1199") |inrange(I10_DX30,"F1110","F1199")| inrange(I10_DX1,"T400X1A","T40695D")| inrange(I10_DX2,"T400X1A","T40695D")| inrange(I10_DX3,"T400X1A","T40695D")| inrange(I10_DX4,"T400X1A","T40695D")| inrange(I10_DX5,"T400X1A","T40695D")| inrange(I10_DX6,"T400X1A","T40695D")| inrange(I10_DX7,"T400X1A","T40695D")| inrange(I10_DX8,"T400X1A","T40695D")| inrange(I10_DX9,"T400X1A","T40695D")| inrange(I10_DX10,"T400X1A","T40695D")| inrange(I10_DX11,"T400X1A","T40695D")| inrange(I10_DX12,"T400X1A","T40695D")| inrange(I10_DX13,"T400X1A","T40695D")| inrange(I10_DX14,"T400X1A","T40695D")| inrange(I10_DX15,"T400X1A","T40695D")| inrange(I10_DX16,"T400X1A","T40695D")| inrange(I10_DX17,"T400X1A","T40695D")| inrange(I10_DX18,"T400X1A","T40695D")| inrange(I10_DX19,"T400X1A","T40695D")| inrange(I10_DX20,"T400X1A","T40695D")| inrange(I10_DX21,"T400X1A","T40695D")| inrange(I10_DX22,"T400X1A","T40695D")| inrange(I10_DX23,"T400X1A","T40695D")| inrange(I10_DX24,"T400X1A","T40695D")| inrange(I10_DX25,"T400X1A","T40695D")| inrange(I10_DX26,"T400X1A","T40695D")| inrange(I10_DX27,"T400X1A","T40695D")| inrange(I10_DX28,"T400X1A","T40695D")| inrange(I10_DX129,"T400X1A","T40695D")| inrange(I10_DX30,"T400X1A","T40695D")



    Q) While using inlist function, I am able to give only 10 inputs. but I have more than 100. Can you please help me.

    forvalues j=1/30{
    browse * if inlist(I10_DX`j', "F1110", "F11120", "F11121", "F11122", "F11129", "F1114", "F11150", "F11151", "F11159", "F11181", "F11182", "F11188 ", "F1119", "F1120", "F11220", "F11221 ", "F11222", "F11229", "F1123 ", "F1124 ", "F11250 ", "F11251", "F11259", "F11281", "F11282", "F11288", "F1129", "F1190", "F11920 ", "F11921 ", "F11922 ", "F11929 ", "F1193", "F1194 ", "F11950", "F11951 ", "F11959 ", "F11981 ", "F11982 ", "F11988 ", "F1199", "T400X1A ", "T400X1D ", "T400X2A ", "T400X2D ", "T400X3A ", "T400X3D ", "T400X4A ", "T400X4D ", "T400X5A ", "T400X5D ", "T401X1A ", "T401X1D ", "T401X2A ", "T401X2D ", "T401X3A ", "T401X3D ", "T401X4A ", "T401X4D ", "T402X1A ", "T402X1D", "T402X2A", "T402X2D ", "T402X3A ", "T402X3D ", "T402X4A ", "T402X4D ", "T402X5A ", "T402X5D", "T403X1A ", "T403X1D", "T403X2A ", "T403X2D", "T403X3A ", "T403D3D", "T403X4A ", "T403X4D", "T403X5A ", "T403X5D", "T404X1A ", "T404X1D ", "T404X2A ", "T404X2D", "T404X3A ", "T404X3D", "T404X4A ", "T404X4D", "T404X5A ", "T404X5D ", "T40601A ", "T40601D", "T40602A ", "T40602D ", "T40603A ", "T40603D ", "T40604A", "T40604D", "T40605A ", "T40605D ", "T40691A ", "T40691D", "T40692A ", "T40692D", "T40693A ", "T40693D", "T40694A ", "T40694D", "T40695A ", "T40695D")
    }


    ERROR:
    expression too long
    r(130);

    Any help would be greatly appreciated,
    Thanks in advance

  • #2
    There doesn't seem to be an pattern in the values of the strings you wish to use as arguments to -inlist()- so there is no simplification available that way. I'm afraid the best you can do is break it up into a string of -inlist(....)- expressions conjoined by the | operator, with each -inlist(...)- term containing no more than 10 arguments. It will look more or less like the first long command you show using -inrange()-. That is probably the simplest way to go.

    Another possibility, that might be easier or you might find more complicated is this. Create a new Stata data set with just a single variable, that variable containing all and only those values that you want to use in your -inlist()- argument. Anyway, let's say you call that variable linkvar, and call that data set match_values.dta Then you can do this:

    Code:
    frame create matches
    frame matches: use match_values
    frlink m:1 I10_DX`j', frame(match_values linkvar)
    browse if !missing(match_values)
    This solution is only possible in Stata 16 or later as it relies on frames. Something analogous could be done using a tempfile in earlier versions of Stata.

    Comment


    • #3
      Hello sir,

      Thank you so much. I really appreciate it.

      But I am not able to get the result.(got only partial output).


      frame create matches
      frame matches: use match_values
      frlink m:1 I10_DX1 frame(matches linkvar)
      browse if !missing(matches)

      for the above code I got the result only for 10_DX1 variable.

      Required output:
      I have more then 30 variables such as I10_DX1,I10_DX2, I10_DX3 ,I10_DX4 ,I10_DX5, I10_DX6 , I10_DX7 ,I10_DX8 ,I10_DX9 etc.............I10_DX30. For all these variables have to compare with other variable (linkvar) which is in other data set(match_values.dta) as you suggested.

      so now in data set 1 (say data1.dta) has variables I10_DX1,I10_DX2, I10_DX3 ,I10_DX4 ................I10_DX30 and in other data set (say match_values.dta) has only one variable (linkvar) which i have to compare with all the other variable in data set 1(data1.dta)
      .
      I10_DX1 has this data :L03116 ,O99511 , Z3800, Z3801 , Z3800 , O480 , Z3801 , Z3801 , Z3800 , F1123 , J210 etc.............
      I10_DX2 has this data :Z6844 , J45909 , Z23 , Z23 , P551 , O9081 , P031 , P769 , P700 , , , P03811 , Z22330 , P284 etc...........
      ........

      I10_DX30
      linkvar has this data: F1110 , F11120 , F11121 , F11122 , F11129 , F1114 , F11150 , F11151 , F11159 , F11181 F11182 F11188 F1119 F1120 F11220 F11221 F11222 F11229 F1123 F1124 F11250 F11251 F11259



      I tried this:

      frame create matches
      frame matches: use match_values
      forvalues j=1/30{
      frlink m:1 I10_DX`j', frame(matches linkvar)
      }
      browse if !missing(matches)

      Error message:

      m:1 I10_DX1 invalid frlink subcommand
      r(197);
      I tried this too:

      frame create matches
      frame matches: use match_values
      frlink m:1 I10_DX1 I10_DX2 I10_DX3 I10_DX4 I10_DX5 I10_DX6 , frame(matches linkvar)
      browse if !missing(matches)


      Error message:

      variables misspecified
      You specified 6 variables after the frlink command, but 1 variable in the frame() option. There must be a
      one-to-one correspondence between the two variable lists.
      r(198);


      Any leads will be appreciated.
      Thank you in advance.

      Comment


      • #4
        I'm not sure I understand #3. It seems that your looping over a family of variables I10_DX`j', with j ranging from 1 through 30. Is that correct. In that case, you have to break the link and re-establish it each time, and you also have to keep track of what matches so that in the end you can browse all the observations that match any of the ID10_DX* variables.

        Code:
        frame create matches
        frame matches: use match_values
        sort linkvar // NOT ESSENTIAL, BUT MAY SPEED EXECUTION A BIT
        gen to_browse = 0
        forvalues j = 1/30 {
            // MAYBE SOME OTHER CODE GOES HERE FIRST
            frlink m:1 I10_DX`j', frame(matches linkvar)
            replace to_browse = to_browse | !missing(matches)
            drop matches
            // MAYBE SOME OTHER CODE AFTER
        }
        browse if to_browse
        Now, here's the part I'm not sure I understand. When you write
        I10_DX1 has this data :L03116 ,O99511 , Z3800, Z3801 , Z3800 , O480 , Z3801 , Z3801 , Z3800 , F1123 , J210 etc.............
        I10_DX2 has this data :Z6844 , J45909 , Z23 , Z23 , P551 , O9081 , P031 , P769 , P700 , , , P03811 , Z22330 , P284 etc...........
        ........

        I10_DX30
        linkvar has this data: F1110 , F11120 , F11121 , F11122 , F11129 , F1114 , F11150 , F11151 , F11159 , F11181 F11182 F11188 F1119 F1120 F11220 F11221 F11222 F11229 F1123 F1124 F11250 F11251 F11259
        it leads me to think each of the I10_DX variables has its own separate list of things it is suppose to match with. So maybe if I10_DX30 matches with L03116, you don't want to include it in the browse. Does each of the I10_DX* variables have its own separate list of things to match with? The code I have shown does not make that kind of distinction: it will cause the browser to show you those observations where any of the I10_DX* variables matches with any of the values that appear in that match_values data set. So if this is not what you want please post back. But in that case, please include example data, posted using the -dataex- command.

        If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
        Last edited by Clyde Schechter; 26 Jul 2021, 17:57.

        Comment


        • #5
          Hello sir,

          thank you very much for your detailed answer. I really appreciate your help....

          Comment


          • #6
            An alternative to inlist() could be ustrpos():
            Code:
            scalar vallist = `" "F1110", "F11120", "F11121", "...", "T40694A ", "T40694D", "T40695A ", "T40695D" "'
             
            clear
            set obs 2
            gen I10_DX1 = "T40694A " in 1 
            gen flag =  ustrpos(vallist, char(34) + I10_DX1 + char(34) ) > 0 
            list
            Code:
                 +-----------------+
                 |  I10_DX1   flag |
                 |-----------------|
              1. | T40694A       1 |
              2. |               0 |
                 +-----------------+

            Comment

            Working...
            X