
  • What is wrong with inlist2 in my code?

    Hi all,

    Yesterday, as documented in this post (https://www.statalist.org/forums/for...p-of-countries), I tried to test my code using inlist for firms in more than 10 countries, as below, and it does not seem to work correctly:

    reghdfe y x if inlist(GEOGN, "CHINA" "UNITEDS" "INDONESIA" "RUSSIAN" "MEXICO" "JAPAN" "PHILIPPINES" "VIETNAM" "SOUTHKOREA") | inlist(GEOGN,"COLOMBIA" "CANADA" "PERU" "MALAYSIA" "AUSTRALIA" "CHILE" "ECUADOR" "SINGAPORE" "NEWZEALAND"), a(TYPE2 INDC32#yr)

    I searched and found that there is also a user-written command named inlist2 that has no restriction on the number of countries included. I therefore installed inlist2 and ran the code, but Stata returns an error:

    reghdfe y x if inlist2(GEOGN, CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA CHILE ECUADOR SINGAPORE NEWZEALAND), a(TYPE2 INDC32#yr)

    unknown function inlist2()

    r(133);


    (I also tested with quotation marks around each country.)

    Could you please help me identify the problem?

    Thanks
    Last edited by Phuc Nguyen; 30 Aug 2021, 17:37.

  • #2
    I can't speak for inlist2, but if there are so many countries to screen, I'd suggest creating a filter at the beginning and using it. That way, you don't have to list those countries again and again if you have multiple regression models. Here is a sample that creates a binary indicator called "include":

    Code:
    gen include = 0
    foreach ctry in CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES ///
                    VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA ///
                    CHILE ECUADOR SINGAPORE NEWZEALAND {
        replace include = 1 if GEOGN == "`ctry'"
    }
    Then you can just use this to run the regression:

    Code:
    reghdfe y x if include == 1, a(TYPE2 INDC32#yr)



    • #3
      It is a nice approach, Ken Chui. Thanks a heap. Could you also have a look at this thread, which has a similar problem (https://www.statalist.org/forums/for...-sample-size)?



      • #4
        An alternative approach to #2, based on the same principle of creating a filter but without the loop, is to produce another dataset at the outset containing the list of countries you're interested in, and then merge it with your existing dataset:

        Code:
        clear
        input str15 GEOGN
        "CHINA"
        "UNITEDS"
        "INDONESIA"
        ...
        "MALAYSIA"
        "AUSTRALIA"
        end
        merge 1:m GEOGN using yourdataset.dta
        reghdfe y x if _merge==3, a(TYPE2 INDC32#yr)
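
        A variant of the same idea, offered only as a sketch: keep the main dataset as the master and merge the country list onto it with m:1, so that countries in the list that never appear in the data do not add empty rows. The file name countrylist.dta is hypothetical and stands for the list created by the input block above, saved to disk; the indicator name include is borrowed from #2.

        ```stata
        * Sketch, assuming countrylist.dta (hypothetical) holds one row per
        * wanted value of GEOGN, and yourdataset.dta is the firm-level data.
        use yourdataset.dta, clear

        * Keep every observation of the main data; matched ones get _merge == 3.
        merge m:1 GEOGN using countrylist.dta, keep(master match)

        * Turn the merge result into a reusable 0/1 filter.
        generate byte include = (_merge == 3)
        drop _merge

        reghdfe y x if include == 1, a(TYPE2 INDC32#yr)
        ```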



        • #5
          Hi Ken Chui, is there any limit on the number of elements per row in a foreach statement? I saw you put 7 countries per row before ///. I am not sure whether that is a coincidence or on purpose.



          • #6
            inlist2 is a community-contributed command from SSC. It is not an official function and can't be called like one.
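
            For contrast, function-style syntax does work with the built-in inlist(), provided the arguments are separated by commas; the inlist() call in #1 appears to lack them. For string arguments, inlist() accepts at most 10 arguments in total (the variable plus 9 values), so the 18 countries have to be split across two calls. A minimal sketch, reusing the indicator name include from #2 and meant to be run from a do-file because of the /// continuations:

            ```stata
            * Sketch only: built-in inlist() with comma-separated string values,
            * 9 values per call, combined with | (logical or).
            generate byte include =                                          ///
                inlist(GEOGN, "CHINA", "UNITEDS", "INDONESIA", "RUSSIAN",    ///
                       "MEXICO", "JAPAN", "PHILIPPINES", "VIETNAM",          ///
                       "SOUTHKOREA")                                         ///
                | inlist(GEOGN, "COLOMBIA", "CANADA", "PERU", "MALAYSIA",    ///
                         "AUSTRALIA", "CHILE", "ECUADOR", "SINGAPORE",       ///
                         "NEWZEALAND")

            reghdfe y x if include == 1, a(TYPE2 INDC32#yr)
            ```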



            • #7
              A quick look at the user-written -inlist2- shows that it does what Ken shows in #2. The Examples section shows how it can be used to the same end.



              • #8
                I was not able to ascertain how many elements a foreach loop can have, but I have used it to loop through states (about 50 elements) and global countries (>130 elements) and it worked fine.

                The "///" is simply there to break the line. In Stata, a new line ends the command above it. To make the command lines fit within 80 characters (for tidier, more stable-looking code that does not rely on line wrapping), my habit is to use /// to break the line while keeping the command going. In a way, this would also work:

                Code:
                gen include = 0
                foreach ctry in CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA CHILE ECUADOR SINGAPORE NEWZEALAND {
                    replace include = 1 if GEOGN == "`ctry'"
                }



                • #9
                  I do not think that there are any limitations as to how many elements you can have in a loop.

                  But if you hold your elements in a macro and loop through the elements of that macro, then you can hit the limit on the maximum size of a macro.

                  Here are some examples:

                  Code:
                  . clear
                  
                  . set obs 1000000
                  number of observations (_N) was 0, now 1,000,000
                  
                  . gen n = _n
                  
                  . qui levelsof n, local(nlevs)
                  macro substitution results in line that is too long
                  r(920);
                  
                  . clear
                  
                  . set obs 100000
                  number of observations (_N) was 0, now 100,000
                  
                  . gen n = _n
                  
                  . qui levelsof n, local(nlevs)
                  So Stata allowed me to put the numbers from 1 to 100 000 in a macro, but complained that my macro is too big when I tried to put the numbers from 1 to 1 million.
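
                  For the country list in this thread, looping over a macro might look like the sketch below. The macro name countries is hypothetical; the indicator name include follows #2. At 18 short names, the list is nowhere near the macro size limit illustrated above. Run it from a do-file, since /// continuations do not work interactively.

                  ```stata
                  * Sketch: hold the country list in a local macro, then loop over it.
                  local countries CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN      ///
                                  PHILIPPINES VIETNAM SOUTHKOREA COLOMBIA CANADA    ///
                                  PERU MALAYSIA AUSTRALIA CHILE ECUADOR SINGAPORE   ///
                                  NEWZEALAND

                  generate byte include = 0
                  foreach ctry of local countries {
                      * Flag observations whose GEOGN equals the current country.
                      replace include = 1 if GEOGN == "`ctry'"
                  }
                  ```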



                  • #10
                    In case anyone is wondering, limitations are documented in

                    Code:
                    help limits



                    • #11
                      Does anyone know if strings used in -inlist2- can incorporate wildcards like "*" or "?"?

                      I am trying to use them but I cannot get them to work.

                      Thanks



                      • #12
                        I have never used inlist2 but its scope seems clear from a glance at the help and the code. It tests for equality (only), so no matching, no regular expressions, no wildcards.

                        Personally, when there are too many values for inlist() to accept or for me to want to type, I back off, as I then consider that approach to be misguided. I then look for a merge solution. See https://www.stata.com/support/faqs/d...s-for-subsets/ for the spirit and some substance to this approach.



                        • #13
                          Hi Vishal, at the moment it does not. It's meant to be a quick solution for a basic case of inlist. I agree with Nick above that in most cases where inlist doesn't work (and, I would add, where inlist2 is not an obvious solution), it's probably best to rethink one's approach to the problem.



                          • #14
                            As mentioned above, inlist2 is a user-written program, not a built-in Stata function. It is always important to check a package's help file before using it in your code!
