Remove space between a number and a letter from a string variable

Arpita Ghosh

Join Date: Feb 2019

Posts: 7
#1


08 Jul 2020, 15:23

Hello all, I hope everyone is doing well. I have a dataset with two address variables, and my intention is to remove the space between 132 and F in the 1st record (see the example data with these 'problem' addresses below), between 1 and A in the 2nd record, and so on.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str102(add1 add2)
"132 F LARCHMONT ROAD" "132 F LARCHMONT ROAD"
"FLAT 1 A 1 BRAMLEY ROAD" "FLAT 1 A 1 BRAMLEY ROAD"
"FLAT 1 A 7 WOODLAND AVENUE" "FLAT 1 A 7 WOODLAND AVENUE"
"FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT" "FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT"
"FLAT 1 A HOOD COURT NORTH STREET" "FLAT 1 A HOOD COURT NORTH STREET"
end

I am generating a flag to identify these addresses with:

Code:

gen flag=1 if regexm(add1, "[0-9][ ][A-Z][ ]+")

Ideally, I would like to replace the matched portions of add1 and add2 in the line above with text of the form "[0-9][A-Z][ ]+", since in my case there is no single fixed value to replace them with. The standard replace command (as expected) does not work here, and I suspect it has to be done with a loop. I have been trying to pinpoint the positions of the unwanted spaces to use as a starting point, but my attempts have been fruitless so far.

I apologise in advance if my search of Statalist has not been exhaustive and someone has already asked this question elsewhere. If anyone could point me in a direction to solve this, it would be great! Thank you very much in advance.

Best, Arpita

Last edited by Arpita Ghosh; 08 Jul 2020, 15:26.
Tags: string

William Lisowski

Join Date: Dec 2014
Posts: 10150

08 Jul 2020, 17:30

This seems to do what you want. The key is to use a regular expression substitution command to both match and replace.

Code:

. replace add1 = ustrregexra(add1, "([0-9]) ([A-Z]) +", "$1$2 ")
(5 real changes made)

. list, clean noobs

                                          add1                                          add2  
                           132F LARCHMONT ROAD                          132 F LARCHMONT ROAD  
                        FLAT 1A 1 BRAMLEY ROAD                       FLAT 1 A 1 BRAMLEY ROAD  
                     FLAT 1A 7 WOODLAND AVENUE                    FLAT 1 A 7 WOODLAND AVENUE  
    FLAT 1A COPTHALL HOUSE GLOUCESTER CRESCENT   FLAT 1 A COPTHALL HOUSE GLOUCESTER CRESCENT  
               FLAT 1A HOOD COURT NORTH STREET              FLAT 1 A HOOD COURT NORTH STREET

You will note that I use Stata's Unicode regular expression functions, introduced a few releases ago. The real benefit of the Unicode regular expression functions is their much more powerful definition of regular expressions. To the best of my knowledge, only in the Statalist post linked here is it documented that Stata's new regular expression parser is the ICU regular expression engine documented at http://userguide.icu-project.org/strings/regexp.
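As a minimal sketch of how the same substitution could be extended, the example below applies it to both address variables in a loop, and also shows one possible variant that uses the ICU word-boundary anchor \b. It assumes Stata 14 or later. The third input row ("FLAT 1 A") and the variable name add_alt are illustrative assumptions, not part of the original data.

```stata
* Minimal sketch; assumes Stata 14 or later for ustrregexra()
clear
input str102(add1 add2)
"132 F LARCHMONT ROAD" "132 F LARCHMONT ROAD"
"FLAT 1 A 1 BRAMLEY ROAD" "FLAT 1 A 1 BRAMLEY ROAD"
"FLAT 1 A" "FLAT 1 A"
end
* The third row above is a made-up case where the letter ends the string

* Variant (add_alt is a hypothetical name): \b matches a word boundary,
* so a single letter at the very end of the address is also joined
generate add_alt = ustrregexra(add1, "([0-9]) ([A-Z])\b", "$1$2")

* Apply the substitution from the post above to both address variables
foreach v of varlist add1 add2 {
    replace `v' = ustrregexra(`v', "([0-9]) ([A-Z]) +", "$1$2 ")
}

list, clean noobs
```

The pattern from the post requires at least one space after the letter, so it leaves an address that ends in a single letter unchanged, whereas the \b variant joins it. Which behaviour is preferable depends on the data.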


Arpita Ghosh

Join Date: Feb 2019

Posts: 7
#3

09 Jul 2020, 01:51

Dear Dr. Lisowski,

Thank you very much for your reply. This works wonderfully. I will definitely read the documentation you mention.

Best, Arpita