  • Replace characters in string variable when the position and characters vary

    Hello, I'm hoping someone can help me with this. I'm working with two 6-digit string variables and, from these, need to produce a third/final string variable. For example, my data set looks like this:

    Code:
    input id str6 string1 str6 string2
    1 "100000" "400000"
    2 "020000" "400000"
    3 "020100" "040003"
    4 "000022" "003300"
    end
    where the non-zero characters in STRING1 represent the correct values but not the correct character placement, and the non-zero characters in STRING2 represent the correct placement but not the correct values. What I need is a third variable that replaces the non-zero values in STRING2 with the non-zero values from STRING1 in the positions found in STRING2. In other words:
    • When ID==1, the third (new) string would read "1000000"
    • When ID==2, the third (new) string would read "2000000"
    • When ID==3, the third (new) string would read "0200001"
    • When ID==4, the third (new) string would read "0022000"
    I'm at a loss how to make this new variable. Because a value of "4" or "3" in STRING2 can be replaced by a value of "1" or "2", I haven't found a way to use -subinstr-. Also, because the positions in STRING1 and STRING2 aren't consistent across IDs, I haven't found a way to incorporate -substr-. At any rate, I would appreciate it if anyone has any thoughts on this matter.

    Thanks in advance.

  • #2
    Hi, the expected results in your description have 7 characters, while the original variables have 6. Assuming you want the result to have the same width as the original variables, you might try:
    Code:
    clear
    input id str6 string1 str6 string2
    1 "100000" "400000"
    2 "020000" "400000"
    3 "020100" "040003"
    4 "000022" "003300"
    end
    
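    gen string3 = string2
    * Mark the first and second non-zero digits of string2 with the placeholders F and L
    * (regexr() replaces only the first match, so each call marks one more digit)
    replace string3 = regexr( string3,"([1-9])" ,"F")
    replace string3 = regexr( string3,"([1-9])" ,"L")
    * Replace each placeholder with the first and second non-zero digit of string1
    replace string3 = subinstr(string3,"F", substr(subinstr(string1,"0","",.),1,1),1)
    replace string3 = subinstr(string3,"L", substr(subinstr(string1,"0","",.),2,1),1)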
    Code:
    list
    
         +----------------------------------+
         | id   string1   string2   string3 |
         |----------------------------------|
      1. |  1    100000    400000    100000 |
      2. |  2    020000    400000    200000 |
      3. |  3    020100    040003    020001 |
      4. |  4    000022    003300    002200 |
         +----------------------------------+
    Last edited by Bjarte Aagnes; 24 May 2018, 08:10.
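The placeholder approach lines up only when each row has the same number of non-zero digits in string1 and string2. A minimal sanity check, assuming both variables contain nothing but digits, might look like this; n1 and n2 are hypothetical helper variables that do not appear in the posts.

```stata
clear
input id str6 string1 str6 string2
1 "100000" "400000"
2 "020000" "400000"
3 "020100" "040003"
4 "000022" "003300"
end

* Count the non-zero digits in each string by deleting the zeros
* (n1 and n2 are hypothetical helper variables)
gen n1 = strlen(subinstr(string1, "0", "", .))
gen n2 = strlen(subinstr(string2, "0", "", .))

* Stops with an error if any row has mismatched counts
assert n1 == n2
```

If the assert fails, `list id string1 string2 if n1 != n2` shows the rows that need attention before any replacement is attempted.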


    • #3
      Hi Bjarte,

      Thank you for the response. You're correct that I wanted the result to contain the same width as the original variables, and your code worked perfectly. I did have to expand the steps to account for more than 2 replacements, but I didn't indicate that was a possibility in my example. At any rate, here's my final code:

      Code:
      gen string3 = string2
      * Mark up to six non-zero digits of string2, in order, with the placeholders A through F
      replace string3 = regexr( string3,"([1-9])" ,"A")
      replace string3 = regexr( string3,"([1-9])" ,"B")
      replace string3 = regexr( string3,"([1-9])" ,"C")
      replace string3 = regexr( string3,"([1-9])" ,"D")
      replace string3 = regexr( string3,"([1-9])" ,"E")
      replace string3 = regexr( string3,"([1-9])" ,"F")
      * Replace each placeholder with the corresponding non-zero digit of string1
      replace string3 = subinstr(string3,"A", substr(subinstr(string1,"0","",.),1,1),1)
      replace string3 = subinstr(string3,"B", substr(subinstr(string1,"0","",.),2,1),1)
      replace string3 = subinstr(string3,"C", substr(subinstr(string1,"0","",.),3,1),1)
      replace string3 = subinstr(string3,"D", substr(subinstr(string1,"0","",.),4,1),1)
      replace string3 = subinstr(string3,"E", substr(subinstr(string1,"0","",.),5,1),1)
      replace string3 = subinstr(string3,"F", substr(subinstr(string1,"0","",.),6,1),1)
      Thank you again for the help - much appreciated.
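The repeated regexr()/subinstr() pairs could also be written as one loop over the six character positions. The following is only a sketch of that idea, not code from either poster: vals and k are hypothetical helper variables, and it assumes both variables are exactly 6 digits wide and that each row has the same number of non-zero digits in string1 and string2.

```stata
clear
input id str6 string1 str6 string2
1 "100000" "400000"
2 "020000" "400000"
3 "020100" "040003"
4 "000022" "003300"
end

* Non-zero digits of string1, in their original order (hypothetical helper)
gen vals = subinstr(string1, "0", "", .)
* Running count of non-zero positions seen so far in string2 (hypothetical helper)
gen k = 0
gen string3 = ""

forvalues i = 1/6 {
    * Advance the counter whenever position i of string2 is non-zero
    replace k = k + (substr(string2, `i', 1) != "0")
    * Append the k-th non-zero digit of string1 there, otherwise append "0"
    replace string3 = string3 + cond(substr(string2, `i', 1) != "0", substr(vals, k, 1), "0")
}

drop vals k
list
```

Building the result position by position avoids the letter placeholders entirely, so the loop bound is the only thing to change for wider strings. If a row has fewer non-zero digits in string1 than in string2, substr() returns an empty string and string3 comes out shorter than 6 characters, which makes such rows easy to spot.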
