  • Remove space between a number and a letter from a string variable

    Hello all, I hope everyone is doing well. I have a dataset with two address variables, and my intention is to remove the space between 132 and F in the 1st record (in the example data with these 'problem' addresses below), between 1 and A in the 2nd record, and so on.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str102(add1 add2)
    "132 F LARCHMONT ROAD"                        "132 F LARCHMONT ROAD"                      
    "FLAT 1 A 1 BRAMLEY ROAD"                     "FLAT 1 A 1 BRAMLEY ROAD"                    
    "FLAT 1 A 7 WOODLAND AVENUE"                  "FLAT 1 A 7 WOODLAND AVENUE"                
    "FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT" "FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT"
    "FLAT 1 A HOOD COURT NORTH STREET"            "FLAT 1 A HOOD COURT NORTH STREET"          
    end
    I am generating a flag to know which are these addresses by:
    Code:
     gen flag=1 if regexm(add1, "[0-9][ ][A-Z][ ]+")
    Ideally, I would like to replace the portions of add1 and add2 matched by the regex in the line above with text following the pattern "[0-9][A-Z][ ]+". I cannot replace them with a fixed value, because the digit and the letter differ from record to record. The standard replace command (as expected) does not work here, and I suspect it has to be done with a loop. I have been trying to pinpoint the positions of the unwanted spaces to use as a starting point, but my attempts have been fruitless so far. I apologise in advance if my search of Statalist has not been exhaustive and someone has already asked this question elsewhere. If anyone could point me in a direction to solve this, it would be great! Thank you very much in advance.

    Best, Arpita
    Last edited by Arpita Ghosh; 08 Jul 2020, 15:26.

  • #2
    This seems to do what you want. The key is to use a regular expression substitution command to both match and replace.

    Code:
    . replace add1 = ustrregexra(add1, "([0-9]) ([A-Z]) +", "$1$2 ")
    (5 real changes made)
    
    . list, clean noobs
    
                                              add1                                          add2  
                               132F LARCHMONT ROAD                          132 F LARCHMONT ROAD  
                            FLAT 1A 1 BRAMLEY ROAD                       FLAT 1 A 1 BRAMLEY ROAD  
                         FLAT 1A 7 WOODLAND AVENUE                    FLAT 1 A 7 WOODLAND AVENUE  
        FLAT 1A COPTHALL HOUSE GLOUCESTER CRESCENT   FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT  
                   FLAT 1A HOOD COURT NORTH STREET              FLAT 1 A HOOD COURT NORTH STREET
    You will note that I use Stata's Unicode regular expression functions, introduced a few releases ago. The real benefit of the Unicode regular expression functions is their much more powerful definition of regular expressions. To the best of my knowledge, only in the Statalist post linked here is it documented that Stata's new regular expression parser is the ICU regular expression engine documented at http://userguide.icu-project.org/strings/regexp.
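
    The question asks about both add1 and add2, while the command above changes only add1. A minimal sketch of one way to cover both, assuming the same pattern is wanted for each variable: loop over the variables with foreach, then count any remaining matches with ustrregexm() as a check. The two-row dataset here is a shortened copy of the example data, used for illustration only.

    ```stata
    * Sketch only: shortened copy of the example data from #1
    clear
    input str102(add1 add2)
    "132 F LARCHMONT ROAD"    "132 F LARCHMONT ROAD"
    "FLAT 1 A 1 BRAMLEY ROAD" "FLAT 1 A 1 BRAMLEY ROAD"
    end

    * Apply the same substitution to each address variable:
    * $1 is the captured digit, $2 the captured letter
    foreach v of varlist add1 add2 {
        replace `v' = ustrregexra(`v', "([0-9]) ([A-Z]) +", "$1$2 ")
    }

    * Check: no digit-space-letter-space sequences should remain
    count if ustrregexm(add1, "[0-9] [A-Z] +") | ustrregexm(add2, "[0-9] [A-Z] +")

    list, clean noobs
    ```

    Note that this pattern requires at least one space after the letter, so an address that ends in something like "1 A" would be left unchanged. Whether that case occurs in the full data is not known from the example.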



    • #3
      Dear Dr. Lisowski, Thank you very much for your reply. This works wonderfully. I will definitely read the documentation you mention. Best, Arpita
