  • Extracting parts of a string for -rename-

    Hi all,

    I'm running into a problem I don't understand. I have a bunch of variables with names of the form [lower case letter][one or two digit number][capital letter][# of lower case letters], and I'm trying to extract everything before the capital letter and rename each variable accordingly. I've tried using regexs() and regexm() to loop over the variable names and rename them, but the function always defaults to the last variable in the local varlist.

    The function works well on its own, though. E.g.

    Code:
    clear all
    set obs 15
    gen id = _n
    gen test = "b1Kasjdad"
    replace test = "b12Flkskdj" if _n>3
    replace test = "c14Hdasliv" if _n>6
    replace test = "d9Lpoawfe" if _n>9
    replace test = "h99Mnqwdi" if _n>12
    gen test2 = regexs(0) if regexm(test, "[a-z][0-9]*")
    generates this dataset

    Code:
    clear
    input float id str10 test str3 test2
     1 "b1Kasjdad"  "b1" 
     2 "b1Kasjdad"  "b1" 
     3 "b1Kasjdad"  "b1" 
     4 "b12Flkskdj" "b12"
     5 "b12Flkskdj" "b12"
     6 "b12Flkskdj" "b12"
     7 "c14Hdasliv" "c14"
     8 "c14Hdasliv" "c14"
     9 "c14Hdasliv" "c14"
    10 "d9Lpoawfe"  "d9" 
    11 "d9Lpoawfe"  "d9" 
    12 "d9Lpoawfe"  "d9" 
    13 "h99Mnqwdi"  "h99"
    14 "h99Mnqwdi"  "h99"
    15 "h99Mnqwdi"  "h99"
    end
    The values of test2 are exactly the names I want for my variables. My dataset currently looks more like this:

    Code:
    clear
    input float(id b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi)
     1 .36276805   .6640378  .2363827 .35137045    .431756
     2 .31382445 .036803816  .6764786  .6978495   .7970607
     3  .6380712   .3336498 .14591032  .7828125   .5824112
     4 .18699375   .7824295 .29664016  .8312564   .1161198
     5 .50534767  .01700492  .8219558  .8498105    .742251
     6  .5276305   .2278204  .3213928  .9997507  .26177752
     7  .7853414  .57824653  .4164997 .09139451   .8069862
     8  .4717338   .7533595 .02369639  .6306959    .667398
     9  .2299842   .8570072  .3125404 .07612886   .2427153
    10  .7976828   .9322746  .9322619  .3457762    .629151
    11 .16493952    .324447  .0502046 .09631826  .53290236
    12   .932945   .1637711  .6221892 .54500425   .7275901
    13  .3999315    .958201  .6189114  .8078102   .6358732
    14  .9881987   .6008608  .9028944  .1186969  .30897725
    15  .9287856   .9733476  .3830579  .2723139 .031594325
    end
    If I then run

    Code:
    foreach var of varlist b1Kasjdad - h99Mnqwdi{
        rename `var' `=regexs(0) if regexm("`var'", "[a-z][0-9]*")'
    }
    It names the first variable in varlist h99 and then tries to do the same for the second variable in varlist but, naturally, returns an error because that variable name is already in use.

    Any thoughts on what I'm doing wrong here?

    Thanks!

  • #2
    I can see at least two problems here.

    1. The if qualifier isn't allowed anywhere here, either within rename, or even within the syntax delimited by `= ' -- as the point of the if qualifier is to specify some or possibly all observations that are relevant. Stata names don't apply conditionally to some observations and not others.

    2. The `: ' syntax just isn't that capacious. You might get away with an application of cond(), but on the whole something simpler is greatly preferable.

    This works with your example:


    Code:
    foreach var of varlist b1Kasjdad-h99Mnqwdi {
        local isnum = regexm("`var'", "[a-z][0-9]*") 
        if `isnum' rename `var' `=regexs(0)'
    }



    • #3
      Code:
      foreach v of varlist b1Kasjdad - h99Mnqwdi {
      
          rename `v' `= regexr("`v'", "[A-Z].*$", "") '
      }



      • #4
        Nick, as always thank you for your thoughtful response. Upon reflection, the use of the if qualifier with -rename- is indeed a strange thing to attempt... I'm not sure I follow your second point, but either way the code works as intended and I think I follow what it's doing.

        Bjarte, thank you for the alternative option. It works perfectly for my case.

        Chris



        • #5
          My second point overlaps with my first. There isn't room inside `: ' for an if qualifier (or an if command for that matter). An extended macro function must evaluate to a single string.



          • #6
            Perhaps I am missing something but

            Code:
            rename ?#* ?#
            should do what is asked for here.

            Best
            Daniel
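
            A minimal sketch of trying this out safely, assuming toy data that only mimics the variable names from #1 (the values are made up). As far as I read the -rename- group documentation, ? stands for exactly one character, # for one or more digits, and * for zero or more characters, and the dryrun option displays the mapping without changing anything:

            ```stata
            * Hypothetical toy data: names copied from #1, values invented
            clear
            set obs 1
            generate id = _n
            foreach name in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {
                generate `name' = 1
            }

            * ? = exactly one character, # = one or more digits, * = zero or more characters
            rename ?#* ?#, dryrun   // only display old -> new names, rename nothing
            rename ?#* ?#           // perform the rename; id has no digit, so it is untouched

            describe, simple
            ```

            Running with dryrun first is a cheap way to spot name collisions before any variable is actually renamed.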



            • #7
              Ah I see. Thanks Nick, yes makes sense.

              Daniel, I don't think you are... that seems to have done exactly the same thing as the code shared by Nick and Bjarte. The only complication is that there are in fact a few variables that break the rule I shared before; they are of the form di8Uawjdsd, fe2Kawdlknsa, etc., with two letters preceding the first number.

              Chris



              • #8
                Originally posted by Chris Larkin
                The only complication is that there are in fact a few variables that break the rule I shared before; they are of the form di8Uawjdsd, fe2Kawdlknsa, etc., with two letters preceding the first number.
                Although it is probably obvious by now,

                Code:
                rename *#* *#
                will handle this situation, too.

                Best
                Daniel
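
                If a regular-expression route is preferred for the two-letter prefixes, a minimal sketch along the lines of #2 might look like this. The toy variables are made up for illustration, and the anchored pattern ^[a-z]+[0-9]+ is an assumption about how the names are built (one or more lowercase letters followed by one or more digits), not something checked against the real dataset:

                ```stata
                * Hypothetical toy data: names taken from the thread, values invented
                clear
                set obs 1
                foreach name in b1Kasjdad b12Flkskdj di8Uawjdsd fe2Kawdlknsa {
                    generate `name' = 1
                }

                * Keep the leading lowercase letters plus the digits that follow them
                foreach v of varlist _all {
                    * regexm() must run before regexs() so that regexs(0) refers to this match
                    if regexm("`v'", "^[a-z]+[0-9]+") {
                        local newname = regexs(0)
                        rename `v' `newname'
                    }
                }

                describe, simple
                ```

                Storing regexs(0) in a local right after the regexm() test keeps the match and the rename tied to the same variable name.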



                • #9

                  A comment on this remark in #2: "You might get away with an application of cond()"

                  cond() with regexm() and regexs() in the same expression will not work as expected because evaluation of regexs(n) returns the subexpression n from a previous (expression) regexm() match. This behavior is consistent for generate, display and inline use of Stata’s expression evaluator:

                  A) The following works as expected:
                  Code:
                  foreach var in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {
                  
                      if ( regexm("`var'", "[a-z][0-9]*") ) {
                      
                          di "`var'" _col(15) "`=regexs(0)'"
                      }
                  }
                  
                  clear all // will not clear regexs(n)
                  Code:
                  b1Kasjdad     b1
                  b12Flkskdj    b12
                  c14Hdasliv    c14
                  d9Lpoawfe     d9
                  h99Mnqwdi     h99
                  B) The following does not work as expected:
                  Code:
                  foreach var in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {
                  
                      display "`var'" _col(15) cond( regexm("`var'", "[a-z][0-9]*"), "`=regexs(0)'", "`var'" ) 
                  }
                  
                  clear all // will not clear regexs()
                  Code:
                  b1Kasjdad     h99   <- left from last regexm() in A) above
                  b12Flkskdj    b1          
                  c14Hdasliv    b12
                  d9Lpoawfe     c14
                  h99Mnqwdi     d9   "h99" will be left after last regexm()
                  C) Example using generate:
                  Code:
                  clear all // will not clear regexs(n) 
                  set obs 1
                  
                  foreach var in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {
                  
                      gen `var' =  cond( regexm("`var'", "[a-z][0-9]*"), "`=regexs(0)'", "`var'" )  
                  }
                  
                  format %-3s *
                  list ,abbrev(12)
                  Code:
                       +-------------------------------------------------------------+
                       | b1Kasjdad   b12Flkskdj   c14Hdasliv   d9Lpoawfe   h99Mnqwdi |
                       |-------------------------------------------------------------|
                    1. | h99         b1           b12          c14         d9        |
                       +-------------------------------------------------------------+
                  Code:
                  clear all // will not clear regexs(n) 
                  set obs 1
                  
                  foreach var in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {
                  
                      gen `var' =  regexs(0) if regexm("`var'", "[a-z][0-9]*")  
                  }
                  
                  format %-3s *
                  list ,abbrev(12)
                  Code:
                       +-------------------------------------------------------------+
                       | b1Kasjdad   b12Flkskdj   c14Hdasliv   d9Lpoawfe   h99Mnqwdi |
                       |-------------------------------------------------------------|
                    1. | b1          b12          c14          d9          h99       |
                       +-------------------------------------------------------------+

                  Is there any reason -clear all- should not clear regexs(n)?
                  Last edited by Bjarte Aagnes; 25 Apr 2019, 08:21.



                  • #10
                    I correct my previous post; you get away with an application of cond():
                    Code:
                    foreach var in b1Kasjdad b12Flkskdj c14Hdasliv d9Lpoawfe h99Mnqwdi {

                        display "`var'" _col(15) cond(`=regexm("`var'", "[a-z][0-9]*")', "`=regexs(0)'", "`var'" ) 
                    }

                    clear all // will not clear regexs()
                    Code:
                    b1Kasjdad     b1
                    b12Flkskdj    b12
                    c14Hdasliv    c14
                    d9Lpoawfe     d9
                    h99Mnqwdi     h99



                    • #11
                      Bjarte Aagnes Thanks for following this through and thoroughly.



                      • #12

                        Thanks Nick,

                        Finally, in my post #10 the argument "`=regexs(0)'" can be simplified to regexs(0).

                        Applying this to the example data (adding variable labels):
                        Code:
                        foreach v of varlist b1Kasjdad - h99Mnqwdi {
                            
                            rename `v' `= cond( `= regexm("`v'", "[a-z][0-9]*") ' , regexs(0) , "`v'" ) '
                            
                            lab var `= cond( `= regexm("`v'", "[a-z][0-9]*") ' , regexs(0) , "`v'" ) ' "`v'"
                        }
                        Code:
                        describe
                        
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                      storage   display    value
                        variable name   type    format     label      variable label
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        id              float   %9.0g                 
                        b1              float   %9.0g                 b1Kasjdad
                        b12             float   %9.0g                 b12Flkskdj
                        c14             float   %9.0g                 c14Hdasliv
                        d9              float   %9.0g                 d9Lpoawfe
                        h99             float   %9.0g                 h99Mnqwdi
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        Sorted by: 
                             Note: Dataset has changed since last saved.
