  • How to remove the invisible space in the tail of a string in Stata?

    Hello, I have a small dataset below,
    clear
    input str80 No str90 Project
    "F-000508" "Hermann Park "
    "F-000509" "Environmental Projects"
    end

    I want to remove the invisible trailing space in "Hermann Park " with Stata.
    Thank you for your help!

  • #2
    See

    Code:
    help strtrim()
    help ustrtrim()
    Others who find this should also see this thread for a similar problem. More generally, see

    Code:
    help string functions



    • #3
      Thank you! But your suggestion is too vague. I still don't know what to do or how to do it.
      Can someone help me?



      • #4
        There is nothing vague about #2. It's utterly specific. daniel klein pointed to a solution by way of pointing to resources. His answer was excellent, and it's the right answer:

        Code:
        clear
        input str80 No str90 Project
        "F-000508" "Hermann Park "
        "F-000509" "Environmental Projects"
        end
        
        replace Project = strtrim(Project) 
        
        
        . list 
        
             +-----------------------------------+
             |       No                  Project |
             |-----------------------------------|
          1. | F-000508             Hermann Park |
          2. | F-000509   Environmental Projects |
             +-----------------------------------+
        Styles of answering vary here. Some answers are long, some short. Some give direct code; some add lengthy explanations or discussions; some hint at solutions or give references. I am not consistent myself, depending on how busy I am, the phase of the moon, and other unsurprising variables.

        Good answers here answer the question. In many ways, better answers are those that help you answer your own question, but they are written expecting that you follow leads and so grow in your own understanding of Stata.



        • #5
          I checked the results and found that the command above still didn't work: the trailing blank is still there. I really don't know what happened.
          Thank you, anyway!



          • #6
            There are other characters that look like spaces but aren't. One is uchar(160), the no-break space.

            Code:
            . di "|" uchar(160) "|"
            | |

            To remove it, use subinstr() as in

            Code:
            replace Project = subinstr(Project, uchar(160), "", .)
            but watch out that your interior spaces are not thereby removed. See also chartab from SSC.
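
A minimal sketch of the point above, assuming the trailing character really is a no-break space. The data are constructed for illustration, and Project2 and the len_* variables are hypothetical helper names. It shows that strtrim() leaves uchar(160) in place, and removes the character only when it is the last one, so any interior occurrence is untouched.

```stata
clear
set obs 1
* construct a value ending in a no-break space (an assumption about the real data)
gen str90 Project = "Hermann Park" + uchar(160)

* strtrim() strips only ordinary blanks, so the length does not change
gen len_before  = ustrlen(Project)
gen len_strtrim = ustrlen(strtrim(Project))

* drop uchar(160) only when it is the final character
gen Project2 = Project
replace Project2 = usubstr(Project2, 1, ustrlen(Project2) - 1) ///
    if usubstr(Project2, -1, 1) == uchar(160)
gen len_after = ustrlen(Project2)

list len_before len_strtrim len_after
```

This removes a single trailing uchar(160); a run of several such characters would need the replace repeated or a regular expression.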



            • #7
              Note that in #2, I have already pointed to

              Code:
              help ustrtrim()
              which will handle much more than white space (i.e., ASCII char 32).



              • #8
                #7 daniel klein: Correct again. I stopped on seeing that strtrim() coped with the problem as it appears in #1.



                • #9
                  Regex may also help here:

                  Code:
                  gen wanted = ustrregexra(Project,"\s*?$","")



                  • #10
                    Code:
                    scalar s = ///
                    char(32) + char(9) + char(10) + char(11) + char(12) + char(13) + /// 
                    ustrunescape("\u00A0") + ustrunescape("\u1680") + ustrunescape("\u180E") + ///
                    ustrunescape("\u2000") + ustrunescape("\u2001") + ustrunescape("\u2002") + ///
                    ustrunescape("\u2003") + ustrunescape("\u2004") + ustrunescape("\u2005") + ///
                    ustrunescape("\u2006") + ustrunescape("\u2007") + ustrunescape("\u2008") + ///
                    ustrunescape("\u2009") + ustrunescape("\u200A") + ustrunescape("\u200B") + ///
                    ustrunescape("\u202F") + ustrunescape("\u205F") + ustrunescape("\u3000") + ///
                    ustrunescape("\uFEFF")
                    
                    assert ustrrtrim(s) == ustrregexrf(s,"\s+$","")
                    Code:
                    * some info from chartab:
                    
                       decimal  hexadecimal     unique name
                    -----------------------------------------------------
                             9       \u0009     HORIZONTAL TABULATION
                            10       \u000a     NEW LINE
                            11       \u000b     VERTICAL TABULATION
                            12       \u000c     FORM FEED
                            13       \u000d     CARRIAGE RETURN
                            32       \u0020     SPACE
                           160       \u00a0     NO-BREAK SPACE
                         5,760       \u1680     OGHAM SPACE MARK
                         6,158       \u180e     MONGOLIAN VOWEL SEPARATOR
                         8,192       \u2000     EN QUAD
                         8,193       \u2001     EM QUAD
                         8,194       \u2002     EN SPACE
                         8,195       \u2003     EM SPACE
                         8,196       \u2004     THREE-PER-EM SPACE
                         8,197       \u2005     FOUR-PER-EM SPACE
                         8,198       \u2006     SIX-PER-EM SPACE
                         8,199       \u2007     FIGURE SPACE
                         8,200       \u2008     PUNCTUATION SPACE
                         8,201       \u2009     THIN SPACE
                         8,202       \u200a     HAIR SPACE
                         8,203       \u200b     ZERO WIDTH SPACE
                         8,239       \u202f     NARROW NO-BREAK SPACE
                         8,287       \u205f     MEDIUM MATHEMATICAL SPACE
                        12,288       \u3000     IDEOGRAPHIC SPACE
                        65,279       \ufeff     ZERO WIDTH NO-BREAK SPACE
                    ------------------------------------+-----------------
                    Last edited by Bjarte Aagnes; 18 May 2022, 16:06.



                    • #11
                      Originally posted by Nick Cox in #6 (the suggestion to remove uchar(160) with subinstr()):
                      It still doesn't work for my real dataset.
                      Anyway, thank you!



                      • #12
                        Try the following:
                        Code:
                        gen project2 =  ustrregexrf(Project,"[\s\p{Cf}]+$","")
                        Also install and use chartab from SSC.
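
For context on this pattern: in the ICU regular expressions behind Stata's ustrregex functions, \s is understood to cover Unicode space separators such as the no-break space, and \p{Cf} covers format characters such as the zero-width space, so the pattern strips a trailing run of either kind and leaves interior spaces alone. A small sketch on constructed strings (not the poster's real data; Project2 and removed are hypothetical names):

```stata
clear
set obs 3
* row 1 ends in a no-break space, row 2 in a zero-width space, row 3 is clean
gen str90 Project = "Hermann Park" + uchar(160) in 1
replace Project = "Hermann Park" + uchar(8203) in 2
replace Project = "Environmental Projects" in 3

* strip any trailing run of whitespace or format characters
gen Project2 = ustrregexrf(Project, "[\s\p{Cf}]+$", "")

* number of characters removed from each value
gen removed = ustrlen(Project) - ustrlen(Project2)
list Project2 removed
```

Because the pattern is anchored at the end of the string, one replace handles every observation at once, which suits a dataset with many affected values.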



                        • #13
                          Originally posted by smith Jason

                          It still doesn't work for my real dataset.
                          Anyway, Thank you!
                          Unfortunately, even though multiple useful solutions and hints have been presented here, repeatedly saying that “it doesn’t work” isn’t really diagnostic of the true problem. Evidently, the data example in #1 didn’t faithfully represent the problem. If you would like more specific help, then it is worthwhile to post back with some specific information about your actual dataset. Perhaps download and run -chartab- from the SSC, as suggested in #12, against your text variable and report those results here.
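
One way to report such information without posting the data is to tabulate only the code point of each value's final character. This is a rough sketch on constructed data, assuming the variable is called Project as in #1; lastchar is a hypothetical helper variable.

```stata
clear
set obs 2
* constructed stand-ins for the real values (an assumption)
gen str90 Project = "Hermann Park" + uchar(160) in 1
replace Project = "Environmental Projects" in 2

* hex escape of the last Unicode character of each value
gen lastchar = ustrtohex(usubstr(Project, -1, 1))

* frequency of each final character; reveals no other content of the variable
tabulate lastchar
```

Any final character other than an ordinary letter, digit, or punctuation mark in that table is a candidate for the invisible trailing character.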



                          • #14
                            Thanks for your responses and help. However, I am not allowed to share the data publicly.
                            I really just want to get this done.



                            • #15
                              In the end, I used the following code to solve this problem:
                              replace Project = "Hermann Park" if strpos(Project,"Hermann")
                              There are many observations like this in the dataset, which is why I did not want to use this method, fixing each value one at a time.
                              Thank you all!
