  • How to replace space(s) in a string with another character, such as an underscore (_)?

    In my do-file, I need to automatically replace any spaces in value labels during processing.

    I got some inspiration from this blog post entry #3 by Nick Cox:
    Code:
    . di stritrim("South           Africa")
    South Africa
    So, perhaps naively, I assumed that this would give the result I am looking for:
    Code:
    . di subinstr("South Africa", " ", "_")
    invalid syntax
    r(198);
    But this is clearly not correct.

    So, my question is: what code would replace space(s) in a string by another character like an underscore (_)?
    http://publicationslist.org/eric.melse

  • #2
    Four arguments are needed for that function, not three.

    Code:
      subinstr(s1,s2,s3,n)
           Description:  s1, where the first n occurrences in s1 of s2 have been replaced with s3
            
                         subinstr() is intended for use with only plain ASCII characters and for use by programmers who want to perform byte-based substitution.  Note that any Unicode
                         character beyond ASCII range (code point greater than 127) takes more than 1 byte in the UTF-8 encoding; for example, é takes 2 bytes.
    
                         To perform character-based replacement in Unicode strings, see usubinstr().
    
                         If n is missing, all occurrences are replaced.
    
                         Also see regexm(), regexr(), and regexs().
    
                         subinstr("this is the day","is","X",1) = "thX is the day"
                         subinstr("this is the hour","is","X",2) = "thX X the hour"
                         subinstr("this is this","is","X",.) = "thX X thX"
           Domain s1:    strings (to be substituted into)
           Domain s2:    strings (to be substituted from)
           Domain s3:    strings (to be substituted with)
           Domain n:     integers > 0 or missing
           Range:        strings


    • #3
      Dear Nick,

      Thanks for pointing this out to me. Indeed, these examples work just fine:
      Code:
      di subinstr("South Africa", " ", "_", 1)
      and
      Code:
      di subinstr("South Africa Now", " ", "_", 2)
      For these two examples, the number of spaces in the string has to be known beforehand.

      But I needed a more general solution that replaces every space in the string with an underscore, however many spaces there are.
      The code below is my solution for that purpose, which I include here to complete and close this post:
      Code:
      * An example string where spaces have to be replaced
      local label "South Africa The place to go"
      local test = ustrregexm("`label'", " ")
      while `test' == 1 {
          local label = subinstr("`label'", " ", "_", 1)
          local test = ustrregexm("`label'", " ")
      }
      dis "`label'"
      * Done!
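The loop above replaces one space per pass, and a single subinstr() call with n missing (as the help text in #2 describes) does the same job. One wrinkle hinted at by the "space(s)" in the question is consecutive spaces: each one becomes its own underscore. A minimal sketch, assuming you would rather turn each run of spaces into a single underscore, combines the stritrim() function from the question with strtrim() and subinstr(); the example label text is made up for illustration.

```stata
* Example label with a doubled internal space, a tripled one, and trailing blanks
local label "South  Africa   The place to go  "
* stritrim() collapses runs of internal blanks to one blank,
* strtrim() removes leading and trailing blanks,
* and n = . (missing) makes subinstr() replace every remaining space
local label = subinstr(strtrim(stritrim("`label'")), " ", "_", .)
display "`label'"
```

Drop the strtrim()/stritrim() wrappers if you want every individual space, including repeats, to map to its own underscore.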

      • #4
        Note this line in Nick's post #2:

        If n is missing, all occurrences are replaced.
        Therefore, you simply need:

        Code:
        local label = subinstr("`label'", " ", "_", .)
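Because the original question was about value labels, here is a small sketch of the same call applied to label text. It assumes a hypothetical value label named country, defined only for illustration; the extended macro function `: label` retrieves the text attached to each value.

```stata
* Hypothetical value label, defined only for this illustration
label define country 1 "South Africa" 2 "New Zealand", replace
forvalues i = 1/2 {
    * Fetch the label text attached to value `i'
    local lbl : label country `i'
    * n = . (missing) replaces all occurrences of " "
    local lbl = subinstr("`lbl'", " ", "_", .)
    display "`i' -> `lbl'"
}
```

In a real do-file you would loop over the values actually present in your label rather than a fixed 1/2 range.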


        • #5
          Dear Andrew,

          Thank you for your helpful explanation.

          Indeed, I was not aware of this use of the 'dot' in such function calls.
          It is interesting to learn that the 'dot' is used here to mean 'all occurrences', whereas in regular do-file syntax it codes for 'null occurrences', or none.

          • #6
            In computing it is hard not to overload syntax unless you keep inventing many new symbols or names that will be hard to remember.

            The leading meaning for isolated dots or periods in Stata is system missing for numeric values, but in graphics options such as xscale(r(. 100)) its sense is "whatever is needed here" and the same meaning applies to subinstr().

            Perhaps a way of tying the ideas together is "not otherwise specified".


            • #7
              I agree, and I am very happy with the conciseness and flexibility of Stata's syntax. Sometimes, when working with something new (for me), such puzzles are rather difficult to solve. But how good it is to share issues and ideas on Statalist so that we can all learn from them!

              • #8
                Following this thread, I generated a date variable and then converted it to a string. I want to replace the "/" in the date with "_", but when I do so, the date is no longer visible. The code is below.

                gen date_2 = daily("`c(current_date)'","DMY") - 1
                format date_2 %tdnn/dd/YY
                gen date_3 = string(date_2, "%tdnn/dd/YY")
                replace date_3 = subinstr("`date_3'", "/", "_", .)

                Thank you for your help.

                Adam


                • #9
                  The problem is in the final -replace- command. For some reason you chose to put date_3 inside local macro quotes, and then embed that in ordinary quotes. But date_3 is a variable, not a local macro. So local macro date_3 does not exist, `date_3' is interpreted as an empty string, and consequently variable date_3 is replaced with an empty string. If you get rid of the `' around date_3, it becomes "date_3", but that is still wrong, because what you want is the contents of variable date_3, not the literal string "date_3". So just unwrap date_3 altogether and you will be fine:
                  Code:
                  replace date_3 = subinstr(date_3, "/", "_", .)
                  Added: I'm curious why you're doing this in the first place. In Stata, a date represented as a string variable ranks high on the list of useless things. What do you plan to do with it?
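To make the difference explained above concrete, here is a minimal sketch that puts the mistaken call from #8 and the corrected call side by side. The sample value "6/24/22" is borrowed from the output shown in #13, and the variable names wrong and right are hypothetical.

```stata
clear
set obs 1
generate date_3 = "6/24/22"
* Mistake in #8: `date_3' refers to a local macro that does not exist,
* so it expands to nothing and subinstr() works on an empty string
generate wrong = subinstr("`date_3'", "/", "_", .)
* Fix from #9: pass the variable itself to subinstr()
generate right = subinstr(date_3, "/", "_", .)
list date_3 wrong right
```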


                  • #10
                    Thank you, Clyde, this worked as I wanted. I have no reason for putting date_3 in local macro quotes other than a misunderstanding of the syntax. There may be a better, more efficient approach, but what I am doing is using the above commands to obtain the previous day's date as a string, so that I can automatically name various files based on the date. This is achieved with the following:
                    Code:
                    export delimited using "myfilename`=date_3[1]'.csv"
                    Those files are then used in a Python script that updates a geodatabase each day with Covid-19 data. That geodatabase feeds an online dashboard that also accesses data based on the date, so I need a date field for that application as well. Basically, I use the date to name the files, access those files, and access the data.

                    I am open to suggestions for an easier way to name files based on the date.


                    • #11
                      Hi Adam,

                      Before I decided to pursue my PhD, I was a software engineer, and as an engineer I basically approve of any solution that works in a satisfactory manner. That being said, I am curious why you don't just use Python to name your CSV files. I am a firm believer in using the best tool for the job, and Stata is an excellent data analysis platform, but are you sure it is easiest to use Stata to generate these file names when this is already relatively easy to do in Python?
                      Last edited by Daniel Schaefer; 24 Jun 2022, 22:08.


                      • #12
                        Hello Daniel,

                        The main reason is that I am using an outdated version of Python (2.7, which comes with ArcMap), and within that code I use variable substitution to call my files. I mainly use Stata to call the API for the data and perform some preprocessing. I did consider using PyCharm or something else that would allow me to access the API with a newer version of Python, and I still might do that. In short, this workflow works for me for now, and I usually update it only when I have spare time, as it feeds a side project separate from my PhD research.


                        • #13
                          With an appropriate choice of output format as the second argument to the string() function, the need to replace slashes with underscores is avoided.
                          Code:
                          . local date_2 = daily("`c(current_date)'","DMY") - 1
                          
                          . display "date_2 " %tdnn/dd/YY `date_2'
                          date_2  6/24/22
                          
                          . 
                          . local date_3 = string(`date_2', "%tdnn/dd/YY")
                          
                          . local date_3 = subinstr("`date_3'", "/", "_", .)
                          
                          . display "date_3 `date_3'"
                          date_3 6_24_22
                          
                          . 
                          . local date_4 = string(daily("`c(current_date)'","DMY") - 1, "%tdnn!_dd!_YY")
                          
                          . display "date_4 `date_4'"
                          date_4 6_24_22
                          See
                          Code:
                          help datetime display formats
                          for a comprehensive explanation of date and time format possibilities.
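Following the point in #13 that the display format can do the work, one further option for the file naming asked about in #10 is a year-month-day stamp, which also makes the files sort chronologically by name. This is a minimal sketch: it keeps the stamp in a local macro rather than a variable, and the file name prefix is the hypothetical one from #10.

```stata
* Yesterday as a Stata daily date; c(current_date) is a string such as "24 Jun 2022"
local yesterday = daily("`c(current_date)'", "DMY") - 1
* %tdCCYYNNDD produces an 8-digit stamp (4-digit year, 2-digit month, 2-digit day)
local stamp = string(`yesterday', "%tdCCYYNNDD")
display "`stamp'"
* Hypothetical export, mirroring #10 (commented out so nothing is written):
* export delimited using "myfilename_`stamp'.csv", replace
```

No underscore escaping (!_) is needed here because the format has no separators at all.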
