How to remove the invisible space in the tail of a string in Stata?

smith Jason

Join Date: Sep 2020

Posts: 378
#1

How to remove the invisible space in the tail of a string in Stata?

17 May 2022, 14:30

Hello, I have a small dataset below,
clear
input str80 No str90 Project
"F-000508" "Hermann Park "
"F-000509" "Environmental Projects"
end

I want to remove the invisible space at the end of "Park " with Stata.
Thank you for your help!
Tags: None
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

17 May 2022, 14:34

See

Code:

help strtrim()
help ustrtrim()

Others who find this, also see this thread for a similar problem. More generally, see

Code:

help string functions
1 like
Comment
smith Jason

Join Date: Sep 2020

Posts: 378
#3

17 May 2022, 15:37

Thank you! But your suggestion is too vague; I still don't know what to do or how to do it.
Can someone help me?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

18 May 2022, 02:05

There is nothing vague about #2. It's utterly specific. daniel klein pointed to a solution by way of pointing to resources. His answer was excellent, and it's the right answer:

Code:

clear
input str80 No str90 Project
"F-000508" "Hermann Park "
"F-000509" "Environmental Projects"
end

replace Project = strtrim(Project)

. list

     +-----------------------------------+
     |       No                  Project |
     |-----------------------------------|
  1. | F-000508             Hermann Park |
  2. | F-000509   Environmental Projects |
     +-----------------------------------+

Styles of answering vary here. Some answers are long, some short. Some give direct code; some add lengthy explanations or discussions; some hint at solutions or give references. I am not consistent myself, depending on how busy I am, the phase of the moon, and other unsurprising variables.

Good answers here answer the question. In many ways, better answers are those that help you answer your own question, but they are written expecting that you follow leads and so grow in your own understanding of Stata.
2 likes
Comment
smith Jason

Join Date: Sep 2020

Posts: 378
#5

18 May 2022, 10:45

I checked the results and found that the command above still didn't work: the trailing blank is still there. I really don't know what happened.
Thank you, anyway!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

18 May 2022, 10:51

There are other characters that look like spaces but aren't. One is uchar(160)

Code:

. di "|" uchar(160) "|"
| |

To remove it, use subinstr() as in

Code:

replace Project = subinstr(Project, uchar(160), "", .)

but watch out that your interior spaces are not thereby removed. See also chartab from SSC.
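
A minimal sketch of that caveat, using made-up contamination (one interior and one trailing uchar(160)); the variable names all_removed and tail_removed are hypothetical. It contrasts subinstr(), which deletes every occurrence, with a regular expression anchored at the end of the string, which deletes only the trailing run:

```stata
clear
input str90 Project
"Hermann Park"
"Environmental Projects"
end

* Hypothetical contamination: one interior and one trailing no-break space
replace Project = "Hermann" + uchar(160) + "Park" + uchar(160) in 1

* subinstr() removes every uchar(160), including the interior one
gen all_removed = subinstr(Project, uchar(160), "", .)

* The $ anchor restricts the match to a run of U+00A0 at the end of the string
gen tail_removed = ustrregexrf(Project, "\u00A0+$", "")

assert all_removed == "HermannPark" in 1
assert tail_removed == "Hermann" + uchar(160) + "Park" in 1
```

Which of the two is wanted depends on whether an interior uchar(160) is a real word separator in your data; if it is, it may be better to replace it with an ordinary space than to delete it.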
1 like
Comment
daniel klein

Join Date: Mar 2014

Posts: 3850
#7

18 May 2022, 10:57

Note that in #2, I have already pointed to

Code:

help ustrtrim()

which will handle much more than white space (i.e., ASCII char 32).
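
To illustrate the difference, here is a small sketch that assumes, purely for demonstration, that the tail consists of one ordinary blank followed by U+2000 (EN QUAD). The variable names ascii_trim and unicode_trim are hypothetical, and the offending character in a real dataset may well be a different one:

```stata
clear
input str90 Project
"Hermann Park"
end

* Assumed tail for illustration: an ASCII blank followed by U+2000 (EN QUAD)
replace Project = Project + " " + ustrunescape("\u2000")

* strtrim() strips only ASCII blanks, so it stops at the EN QUAD
gen ascii_trim = strtrim(Project)

* ustrtrim() strips Unicode whitespace as well as blanks
gen unicode_trim = ustrtrim(Project)

* 12 visible characters plus the two trailing ones are still there
assert ustrlen(ascii_trim) == 14
assert unicode_trim == "Hermann Park"
```

Whether ustrtrim() treats a particular character as whitespace is worth checking directly on a test string like this before relying on it.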
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

18 May 2022, 11:12

#7 daniel klein: Correct again. I stopped on seeing that strtrim() coped with the problem as it appears in #1.
Comment
Ali Atia

Join Date: May 2020

Posts: 737
#9

18 May 2022, 11:13

Regex may also help here:

Code:

gen wanted = ustrregexra(Project,"\s*?$","")
1 like
Comment

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

#10

18 May 2022, 15:56

Code:

scalar s = ///
char(32) + char(9) + char(10) + char(11) + char(12) + char(13) + /// 
ustrunescape("\u00A0") + ustrunescape("\u1680") + ustrunescape("\u180E") + ///
ustrunescape("\u2000") + ustrunescape("\u2001") + ustrunescape("\u2002") + ///
ustrunescape("\u2003") + ustrunescape("\u2004") + ustrunescape("\u2005") + ///
ustrunescape("\u2006") + ustrunescape("\u2007") + ustrunescape("\u2008") + ///
ustrunescape("\u2009") + ustrunescape("\u200A") + ustrunescape("\u200B") + ///
ustrunescape("\u202F") + ustrunescape("\u205F") + ustrunescape("\u3000") + ///
ustrunescape("\uFEFF")

assert ustrrtrim(s) == ustrregexrf(s,"\s+$","")

Code:

* some info from chartab:

   decimal  hexadecimal     unique name
-----------------------------------------------------
         9       \u0009     HORIZONTAL TABULATION
        10       \u000a     NEW LINE
        11       \u000b     VERTICAL TABULATION
        12       \u000c     FORM FEED
        13       \u000d     CARRIAGE RETURN
        32       \u0020     SPACE
       160       \u00a0     NO-BREAK SPACE
     5,760       \u1680     OGHAM SPACE MARK
     6,158       \u180e     MONGOLIAN VOWEL SEPARATOR
     8,192       \u2000     EN QUAD
     8,193       \u2001     EM QUAD
     8,194       \u2002     EN SPACE
     8,195       \u2003     EM SPACE
     8,196       \u2004     THREE-PER-EM SPACE
     8,197       \u2005     FOUR-PER-EM SPACE
     8,198       \u2006     SIX-PER-EM SPACE
     8,199       \u2007     FIGURE SPACE
     8,200       \u2008     PUNCTUATION SPACE
     8,201       \u2009     THIN SPACE
     8,202       \u200a     HAIR SPACE
     8,203       \u200b     ZERO WIDTH SPACE
     8,239       \u202f     NARROW NO-BREAK SPACE
     8,287       \u205f     MEDIUM MATHEMATICAL SPACE
    12,288       \u3000     IDEOGRAPHIC SPACE
    65,279       \ufeff     ZERO WIDTH NO-BREAK SPACE
------------------------------------+-----------------

Last edited by Bjarte Aagnes; 18 May 2022, 16:06.

Comment

smith Jason

Join Date: Sep 2020

Posts: 378
#11

23 May 2022, 10:31

Originally posted by Nick Cox:

There are other characters that look like spaces but aren't. One is uchar(160)

Code:

. di "|" uchar(160) "|"
| |

To remove it, use subinstr() as in

Code:

replace Project = subinstr(Project, uchar(160), "", .)

but watch out that your interior spaces are not thereby removed. See also chartab from SSC.

It still doesn't work for my real dataset.
Anyway, Thank you!
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 783
#12

23 May 2022, 10:59

try the following:

Code:

gen project2 = ustrregexrf(Project,"[\s\p{Cf}]+$","")

and install and use chartab from SSC
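
A short sketch of why the pattern adds \p{Cf} to \s. It assumes, for illustration only, that the string ends in a no-break space followed by a zero width space (U+200B, a format character); the variable names s_only and s_or_cf are hypothetical:

```stata
clear
input str90 Project
"Hermann Park"
end

* Hypothetical tail: no-break space followed by zero width space (category Cf)
replace Project = Project + uchar(160) + ustrunescape("\u200B")

* Whitespace class only; compare its length with the result below
gen s_only = ustrregexrf(Project, "\s+$", "")

* Whitespace or format characters at the end of the string
gen s_or_cf = ustrregexrf(Project, "[\s\p{Cf}]+$", "")

display ustrlen(s_only[1])
display ustrlen(s_or_cf[1])

assert s_or_cf == "Hermann Park"
```

If the first length comes out larger than the second, the trailing character is one that \s alone does not cover.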
1 like
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#13

23 May 2022, 11:37

Originally posted by smith Jason:

It still doesn't work for my real dataset.
Anyway, Thank you!

Unfortunately, even though multiple useful solutions and hints have been presented here, repeatedly saying that “it doesn’t work” isn’t really diagnostic of the true problem. Evidently, the dataex didn’t faithfully represent the problem. If you would like more specific help, then it is worthwhile to post back with some specific information about your actual dataset. Perhaps download and run -chartab- from the SSC, as suggested in #12, against your text variable and report those results here.
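
If chartab cannot be installed, something like the following may help identify the offending character using only built-in functions; it reports code points and lengths rather than the data themselves. This is a sketch on made-up data: the trailing uchar(160) is an assumption, and the names last_char, last_hex, n_bytes and n_chars are hypothetical.

```stata
* Hypothetical example data; substitute your own string variable
clear
input str90 Project
"Hermann Park"
"Environmental Projects"
end
replace Project = Project + uchar(160) in 1

* Last character of each value and its escaped hexadecimal code point
gen last_char = usubstr(Project, -1, 1)
gen last_hex = ustrtohex(last_char)

* Bytes versus Unicode characters: a gap signals non-ASCII content
gen n_bytes = strlen(Project)
gen n_chars = ustrlen(Project)

list last_hex n_bytes n_chars
tabulate last_hex

assert n_bytes == 14 & n_chars == 13 in 1
assert n_bytes == n_chars in 2
```

The tabulation of last_hex shows which code points end the strings across the whole dataset, which can be reported without posting any confidential values.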
Comment
smith Jason

Join Date: Sep 2020

Posts: 378
#14

23 May 2022, 12:04

Thanks for your responses and help. However, I am not allowed to share the data publicly.
I really want to get things done. That is the fact.
Comment
smith Jason

Join Date: Sep 2020

Posts: 378
#15

23 May 2022, 12:07

Finally, I used the following code to solve this problem:
replace Project = "Hermann Park" if strpos(Project,"Hermann")
There are many observations like this in the dataset, which is why I did not want to use this method to fix them one at a time.
Thank you all!
Comment
