  • How to remove an extra digit/character from a string variable

    Dear Statalist users,

    I have a dataset with a string variable that, by convention, has three parts separated by hyphens, e.g. 12345-2020-0001.
    The last part is designed to run serially from 0001 to 9999. However, some observations have an extra zero, such as -00001 or -00045 instead of -0001 and -0045 respectively.
    I would like to remove one of the zeros in -00001 and -00045.
    I have tried the command below without success:

    local var1 "stringvar"
    foreach r of local var1 {
    replace `r' =subinstr(`r',"`-000??'","`-00??'",.)
    }

    I am using Stata SE version 16.1 for Mac.

    Can someone help me with a solution?
    Thank you
    Mutugi Muriithi


  • #2
    After running the said command, the response from Stata was: (0 real changes made)
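
One possible reading of the zero changes, offered as an interpretation rather than a confirmed diagnosis: subinstr() replaces literal text only, so ? is not treated as a wildcard, and wrapping the pattern in `' asks Stata to expand a local macro, which presumably yields an empty string here. A minimal sketch of a pattern-based alternative with regexm() and regexs(), assuming the unwanted form is always exactly one surplus zero in front of a four-digit serial at the end of the string. The two example values are the ones used elsewhere in this thread.

```stata
clear
input str42 stringvar
12345-2020-0001
65432-2019-00045
end

* Group 1: everything up to and including the last hyphen.
* Then one surplus zero, then group 2: exactly four digits at the end.
* Values that do not match the pattern are left untouched.
replace stringvar = regexs(1) + regexs(2) if regexm(stringvar, "^(.*-)0([0-9][0-9][0-9][0-9])$")

list
```

Only values with five digits after the last hyphen and a leading zero match, so a correct serial such as 0001 is not shortened.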



    • #3
      Does this help?

      Code:
      clear 
      input str42 stringvar 
      12345-2020-0001
      65432-2019-00045 
      end 
      
      split stringvar, p(-) gen(wanted) 
      
      replace wanted3 = substr(wanted3, 2, .) if length(wanted3) == 5 
      
      list 
      
           +------------------------------------------------+
           |        stringvar   wanted1   wanted2   wanted3 |
           |------------------------------------------------|
        1. |  12345-2020-0001     12345      2020      0001 |
        2. | 65432-2019-00045     65432      2019      0045 |
           +------------------------------------------------+



      • #4
        Thank you Nick,
        Your suggestion worked perfectly and solved my problem.
        I used the command below to concatenate the three variables into my new variable 'sampleid' in the desired format:
        egen sampleid = concat(wanted1 wanted2 wanted3), punct(-)

        Thank you
        Mutugi Muriithi



        • #5
          Code:
          replace stringvar = substr(stringvar, 1, strrpos(stringvar, "-")) + substr(substr(stringvar, strrpos(stringvar, "-") + 1, .), -4, 4)
          might win some minor prize as shorter but more obfuscated code.
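
The one-line expression can be unpacked into steps to make it easier to follow. The sketch below is only an illustration: the variable names pos, prefix, serial, last4 and fixed are hypothetical helpers that do not appear in the original code, and the example data are the two values from earlier in the thread.

```stata
clear
input str42 stringvar
12345-2020-0001
65432-2019-00045
end

* Position of the last hyphen in each value
generate pos = strrpos(stringvar, "-")

* Everything up to and including the last hyphen
generate prefix = substr(stringvar, 1, pos)

* Everything after the last hyphen (the serial part)
generate serial = substr(stringvar, pos + 1, .)

* Last four characters of the serial part
generate last4 = substr(serial, -4, 4)

* Reassemble the identifier
generate fixed = prefix + last4

list stringvar prefix serial last4 fixed
```

Writing the result to a new variable instead of using replace keeps the original values available for comparison.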



          • #6
            Oh yes! The second one worked also. I agree, it looks more complicated but it gives me the solution I was looking for. Do you mind explaining a little bit what the ", -4, 4)" part means?



            • #7
              See the help. Negative counts are counts from the end of the string.

              substr(s,n1,n2)
              Description: the substring of s, starting at n1, for a length of n2

              substr() is intended for use with only plain ASCII characters and for use by
              programmers who want to extract a subset of bytes from a string. For those with
              plain ASCII text, n1 is the starting character, and n2 is the length of the string in
              characters. For programmers, substr() is technically a byte-based function. For
              plain ASCII characters, the two are equivalent but you can operate on byte values
              beyond that range. Note that any Unicode character beyond ASCII range (code point
              greater than 127) takes more than 1 byte in the UTF-8 encoding; for example, é takes
              2 bytes.

              To obtain substrings of Unicode strings, see usubstr().

              If n1 < 0, n1 is interpreted as the distance from the end of the string; if n2 = .
              (missing), the remaining portion of the string is returned.

              substr("abcdef",2,3) = "bcd"
              substr("abcdef",-3,2) = "de"
              substr("abcdef",2,.) = "bcdef"
              substr("abcdef",-3,.) = "def"
              substr("abcdef",2,0) = ""
              substr("abcdef",15,2) = ""
              Domain s: strings
              Domain n1: integers >= 1 and <= -1
              Domain n2: integers >= 1
              Range: strings
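
Applied to the serial part discussed in this thread, the negative start position can be checked directly with display. A small sketch, assuming the serial contains only plain ASCII digits:

```stata
* Start 4 characters from the end and take 4 characters
display substr("00045", -4, 4)
display substr("0001", -4, 4)

* Start 3 characters from the end and take 2 characters (from the help examples)
display substr("abcdef", -3, 2)
```

The first call drops the surplus leading zero, while the second returns a four-character serial unchanged.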

