  • Remove prefix from macro

    I have a macro listing variables including factor variables -- e.g.,
    local covariates x i.y ib1.z
    I would like to remove the prefixes i., ib1. etc. I can remove the prefixes one at a time using the subinstr function, like this.
    local covariates x i.y ib1.z
    local covariates_clean = subinstr("`covariates'","i.","",1000)
    local covariates_clean = subinstr("`covariates_clean'","ib1.","",1000)
    di "`covariates_clean'"
    But if there are a lot of different prefixes, it is clunky to remove them one at a time.
    The regexr() function can match all prefixes with a regular expression, like this --
    local covariates x ib.y ib1.z i.zz
    local covariates_clean = regexr("`covariates'","i[^\.]*\.","")
    di "`covariates_clean'"
    --but it only replaces the first match.

    What do I need? Thanks!

  • #2
    If the local macro in question contains a variable list with factor-variable notation, fvrevar with the list option might be the solution.

    Best
    Daniel

    • #3
      Thanks, but fvrevar creates new variables, right? I just want to remove the prefixes.

      • #4
        Look at the ustrregexra() string function.

        • #5
          Thanks to William for the nice advice. I think the key lies in regular expressions, which I am not familiar with at all. I quickly read the documentation and came up with the following code; I hope it helps.
          Code:
          local covariates x i.y ib1.z c123.ww pp.qq.dd
          local i=1
          foreach var of local covariates {
            local clean_`i'=ustrregexra("`var'","[^\.]*\.","")
            local ++i
          }
          di "`clean_1'"
          di "`clean_2'"
          di "`clean_3'"
          di "`clean_4'"
          di "`clean_5'"
          local n=wordcount("`covariates'")
          
          forvalues i=1/`n'{
            local clean `clean' `clean_`i''
          }
          dis "`clean'"
          BTW, I still have two other questions. First, since you are using a local macro, why not define the macro the way you need it directly? Second, I first tried to use -tokenize- but got stuck on how to get the number of tokens it produces, namely,
          Code:
          local cov  c123.ww
          tokenize "`cov'", parse(".")
          
          dis "`1'"
          dis "`2'"
          dis "`3'"
          How can I get the number of tokens (here 3) after -tokenize-?
          Thanks in advance to anyone willing to answer my question.
          Last edited by Liu Qiang; 06 Jun 2019, 22:19.
          2B or not 2B, that's a question!

          • #6
            Originally posted by paulvonhippel
            Thanks, but fvrevar creates new variables, right?
            No; this is explicit in the help:

            list specifies that all factor-variable operators and time-series operators be removed from varlist and the resulting list of base variables be returned in r(varlist). No new variables are created with this option.

            Originally posted by Liu Qiang
            why not create the macros that meet your purpose directly?
            Very good question. My guess is that the question really is about a variable list stored in a macro. This is why I suggested using tools that StataCorp also uses to deal with variable lists. While regular expressions are nice tools, they are not well documented and, in my experience, it is easy to mess things up. Using fvrevar to parse the variable list seems to be the safer option; it is also less typing.


            Originally posted by Liu Qiang
            I firstly tried to use -tokenize- but stuck with the code on how to get the maximum part.
            I do not think there is an easy way. macro list shows you this but I do not think that it leaves any results behind. You would probably just use a while loop but that is really a different topic.
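
            For what it is worth, a minimal sketch of such a while loop follows. It assumes that none of the tokens is empty and that none contains double quotes; the macro names j and n are only illustrative.

            ```stata
            local cov c123.ww
            tokenize "`cov'", parse(".")

            * walk through `1', `2', ... until the first empty positional macro
            local j = 1
            while "``j''" != "" {
                local ++j
            }

            * j stopped one past the last token
            local n = `j' - 1
            display `n'
            ```

            Here the tokens are c123, the parse character . itself, and ww, so n ends up as 3. The loop leaves the positional macros intact, unlike a loop built on macro shift.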

            Best
            Daniel

            • #7
              Thanks for the tips! Here's a shorter version:

              local covariates x i.y ib1.z ib(first).w
              local covariates_clean `covariates'
              local n = wordcount("`covariates'")
              forvalues i = 1/`n' {
                  local covariates_clean = ustrregexra("`covariates_clean'","i[^\.]*\.","")
              }
              di "`covariates_clean'"
              I wonder if it could be done without the `n' macro.

              • #8
                Originally posted by paulvonhippel
                I wonder if it could be done without the `n' macro.
                Since you are not using the counter inside the loop, you could write

                Code:
                local covariates x i.y ib1.z ib(first).w
                local covariates_clean : copy local covariates
                foreach x of local covariates {
                    local covariates_clean = ustrregexra("`covariates_clean'","i[^\.]*\.","")
                }
                di "`covariates_clean'"
                However, William was most likely thinking in the direction of

                Code:
                local covariates x i.y ib1.z ib(first).w
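                * ustrregexra() replaces every match by default; its optional fourth argument controls case sensitivity, not the number of replacements
                local covariates_clean = ustrregexra("`covariates'", "i[^\.]*\.", "")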
                di "`covariates_clean'"
                where no loop is needed.

                Here is why I do not like the regular expression approach:

                Code:
                sysuse auto , clear
                local covariates i.foreign ib(#1).rep78 c.mpg
                local covariates_clean : copy local covariates
                foreach x of local covariates {
                    local covariates_clean = ustrregexra("`covariates_clean'","i[^\.]*\.","")
                }
                di "`covariates_clean'"
                which produces

                Code:
                . local covariates i.foreign ib(#1).rep78 c.mpg
                
                output omitted
                
                . di "`covariates_clean'"
                forerep78 c.mpg
                Compare the above with

                Code:
                . local covariates i.foreign ib(#1).rep78 c.mpg
                
                . fvrevar `covariates' , list
                
                . return list
                
                macros:
                            r(varlist) : "mpg foreign rep78"
                The same problem arises with interaction specifications, such as

                Code:
                i.foreign##i.rep78
                You could fiddle around more and get the regular expressions right, but I would just use fvrevar if this is about manipulating variable lists.
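
                As a rough sketch of the fvrevar route for the original goal (a cleaned list stored in a local macro), something like the following should work. The data set and the example specification are only for illustration, and the list uniq step is an assumption in case a variable is listed more than once.

                ```stata
                sysuse auto, clear
                local covariates i.foreign##i.rep78 c.mpg

                * strip factor-variable and time-series operators; no new variables are created
                fvrevar `covariates', list

                * copy the returned base variables into a local and drop any duplicates
                local covariates_clean `r(varlist)'
                local covariates_clean : list uniq covariates_clean
                di "`covariates_clean'"
                ```

                Because fvrevar parses the list as a variable list, the variables must exist in the data set in memory.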

                Best
                Daniel
