  • Circumventing illegal wildcard * in if statements with prefix of variable values

    Hi all,

    My data:

    Code:
    input long ONETSOCCODE str6 NaicsCode
    57-0000 "238210"
    78-0004 "611519"
    54-0475 "238160"
    45-6598 "238350"
    32-1000 "237130"
    87-0495 "236118"
    46-7600 "922140"
    54-8905 "238220"
    I would like to run the following code:
    Code:
    g treated=0
    replace treated=1 if inlist(ONETSOCCODE,17-*,49-*,51-*,15-11*,13-*,27-1*,27-3*) | inlist(NaicsCode,"31*","32*","33*","54*","51*")
    However I get the error:

    3* invalid name.

    In the following post: https://stackoverflow.com/questions/...n-if-statement, the following was written by Nick Cox: "To answer your underlying question, wildcards are not allowed in if qualifiers (the case here) or if statements."

    Therefore, I wondered whether there is a way of circumventing the fact that wildcards are forbidden in if statements, by telling Stata to
    Code:
    replace treated=1
    for all combinations of ONETSOCCODE and NaicsCode that start with the specified prefixes.

    Any ideas?

    Many thanks in advance!
    Maxence Morlet

  • #2
    Your code won't run as a data example, as 57-0000 and all the other values for the same variable just aren't integers and can't be stored in a long. So, I have to guess that you didn't use dataex as requested and that really 57-0000 and the like are either value labels or string values.

    There will be a solution to this, but it depends on what the data are. So please tell us.



    • #3
      Apologies for this. I did indeed use dataex, but I reported the wrong variable after some confusion caused by the use of the encode command.

      ONETSOCCODE was generated as an encoding of a string variable, OnetSocCode.

      For simplicity and accuracy, I will now report the original string variable, OnetSocCode:

      Code:
      input str10 OnetSocCode str6 NaicsCode
      "17-3024.00" "334417"
      "17-3024.00" "334417"
      "47-2073.00" "237310"
      "17-3024.00" "334417"
      "47-2111.00" "236210"
      "51-4111.00" ""      
      "47-2111.00" "236210"
      "47-2073.00" "237310"
      "47-2111.00" "236210"
      "47-2211.00" "813930"
      "17-3024.00" "334417"
      "47-2073.00" "237310"
      "47-2073.00" "237310"
      "17-3024.00" "334417"
      "47-2111.00" "236210"
      "47-2211.00" "813930"
      "47-2073.00" "237310"
      "17-3024.00" "334417"
      "47-2073.00" "237310"
      "17-3024.00" "334417"
      "47-2152.00" "236210"
      "47-2073.00" "237310"
      "47-2073.00" "237310"
      "17-3024.00" "334417"
      "47-2073.00" "237310"
      "47-2211.00" "922140"
      "47-2111.00" "238210"
      "47-2111.00" "238210"
      "47-2211.00" "922140"
      "47-2152.00" "922160"
      "47-2111.00" "236210"
      "47-2211.00" "922140"
      "47-2111.00" "238210"
      "47-2152.00" "922160"
      The code I mentioned above is therefore now:

      Code:
       g treated=0
      
      replace treated=1 if inlist(OnetSocCode,"17-*","49-*","51-*","15-11*","13-*","27-1*","27-3*") | inlist(NaicsCode,"31*","32*","33*","54*","51*")
      Many thanks in advance for your help.
      Last edited by Maxence Morlet; 20 Mar 2022, 11:56.



      • #4
        I believe this hybrid approach does what you want.
        Code:
        generate treated=0 
        replace treated=1 if ustrregexm(OnetSocCode,"17-|49-|51-|15-11|13-|27-1|27-3")
        replace treated=1 if inlist(substr(NaicsCode,1,2),"31","32","33","54","51")
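
        A minimal variant of the regex line, assuming the aim is to match prefixes only: anchoring the alternation with ^ restricts each alternative to the start of the string. With codes in the form shown in #3, the hyphen only occurs in third position, so the unanchored pattern should give the same result; the anchor just makes the prefix intent explicit. The variable name treated_anchored is hypothetical, chosen so the result can be compared with treated.

        ```stata
        * Sketch only: ^ anchors the match at the start of the string,
        * and the parentheses make the anchor apply to every alternative.
        generate byte treated_anchored = ustrregexm(OnetSocCode, "^(17-|49-|51-|15-11|13-|27-1|27-3)")

        * NAICS prefixes are all two characters, so inlist(substr()) is enough.
        replace treated_anchored = 1 if inlist(substr(NaicsCode, 1, 2), "31", "32", "33", "54", "51")

        * Compare with the unanchored version above.
        tabulate treated treated_anchored
        ```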



        • #5
          The problem with this version of the code is that * within quoted substrings is just going to be treated as a literal character.

          Code:
          . di inlist("*", "frog", "toad")
          0
          
          . di inlist("*", "*", "frog", "toad")
          1
          You're mixing quite different syntaxes for different kinds of string functions.

          Functions like strpos(), substr() and subinstr() are highly (totally, I think) literal.

          Functions like strmatch() and the regex functions work with some kind of pattern language, in which some characters like the asterisk have special meaning.
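
          To illustrate the difference, here is a small sketch using strmatch(), in which * stands for zero or more characters and ? for exactly one. The variable name treated_sm and the loop are illustrative assumptions, not something from the original question.

          ```stata
          * strmatch() interprets the asterisk as a wildcard ...
          display strmatch("17-3024.00", "17-*")          // 1
          display strmatch("47-2111.00", "17-*")          // 0

          * ... whereas inlist() compares the strings literally.
          display inlist("17-3024.00", "17-*", "49-*")    // 0

          * Sketch: loop over the wildcard patterns from the question.
          generate byte treated_sm = 0
          foreach p in "17-*" "49-*" "51-*" "15-11*" "13-*" "27-1*" "27-3*" {
              replace treated_sm = 1 if strmatch(OnetSocCode, "`p'")
          }
          foreach p in "31*" "32*" "33*" "54*" "51*" {
              replace treated_sm = 1 if strmatch(NaicsCode, "`p'")
          }
          ```

          The loop keeps the patterns exactly as they were written in the question, at the cost of one replace per pattern.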

          The following doesn't cover all your cases, but it points in a good direction.

          Code:
          gen treated = inlist(substr(OnetSocCode, 1, 3),"17-","49-","51-","13-") | inlist(substr(NaicsCode, 1, 2),"31","32","33","54","51")
          I've omitted looking for

          Code:
           
           "15-11", "27-1","27-3"
          which could be found with different calls to inlist(substr())
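
          One way those remaining prefixes might be added is sketched below: each prefix length gets its own substr() call. The name treated_full is hypothetical, and the /// line continuations assume the code is run from a do-file rather than typed interactively.

          ```stata
          * Sketch: one substr() per prefix length (3, 5 and 4 characters for
          * OnetSocCode, 2 characters for NaicsCode), combined with |.
          generate byte treated_full =                                        ///
                inlist(substr(OnetSocCode, 1, 3), "17-", "49-", "51-", "13-") ///
              | substr(OnetSocCode, 1, 5) == "15-11"                          ///
              | inlist(substr(OnetSocCode, 1, 4), "27-1", "27-3")             ///
              | inlist(substr(NaicsCode, 1, 2), "31", "32", "33", "54", "51")
          ```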



          • #6
            Thank you both for your responses, and thank you Nick for your detailed explanation. I will also read up on the "strmatch()" function.
