Replace characters in string variable when the position and characters vary

matt starcke

24 May 2018, 06:46

Hello, I'm hoping someone can help me with this. I'm working with two 6-digit string variables and, from these, need to produce a third/final string variable. For example, my data set looks like this:

Code:

input id str6 string1 str6 string2
1 "100000" "400000"
2 "020000" "400000"
3 "020100" "040003"
4 "000022" "003300"
end

where the non-zero characters in STRING1 represent the correct values but not the correct character placement, and the non-zero characters in STRING2 represent the correct placement but not the correct values. What I need is a third variable that replaces the non-zero values in STRING2 with the non-zero values from STRING1 in the positions found in STRING2. In other words:
When ID==1, the third (new) string would read "1000000"

When ID==2, the third (new) string would read "2000000"

When ID==3, the third (new) string would read "0200001"

When ID==4, the third (new) string would read "0022000"

I'm at a loss how to make this new variable. Because a value of "4" or "3" in STRING2 can be replaced by a value of "1" or "2", I haven't found a way to use -subinstr-. Also, because the positions in STRING1 and STRING2 aren't consistent across IDs, I haven't found a way to incorporate -substr-. At any rate, I would appreciate it if anyone has any thoughts on this matter.

Thanks in advance.

Bjarte Aagnes

24 May 2018, 07:49

Hi, the results in your description have 7 characters, although the original variables have 6. Assuming you want the result to have the same width as the original variables, you might try:

Code:

clear
input id str6 string1 str6 string2
1 "100000" "400000"
2 "020000" "400000"
3 "020100" "040003"
4 "000022" "003300"
end

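* start from string2, whose non-zero characters mark the target positions
gen string3 = string2
* mark the first and second non-zero positions with the placeholders F and L
replace string3 = regexr( string3,"([1-9])" ,"F")
replace string3 = regexr( string3,"([1-9])" ,"L")
* swap each placeholder for the first/second non-zero character of string1
replace string3 = subinstr(string3,"F", substr(subinstr(string1,"0","",.),1,1),1)
replace string3 = subinstr(string3,"L", substr(subinstr(string1,"0","",.),2,1),1)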

Code:

list

     +----------------------------------+
     | id   string1   string2   string3 |
     |----------------------------------|
  1. |  1    100000    400000    100000 |
  2. |  2    020000    400000    200000 |
  3. |  3    020100    040003    020001 |
  4. |  4    000022    003300    002200 |
     +----------------------------------+

Last edited by Bjarte Aagnes; 24 May 2018, 08:10.
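
The original question mentions not finding a way to use -substr-. As a rough sketch only, and not part of the replies in this thread, the same result can be built position by position. The helper variables nz and k are hypothetical names, and the sketch assumes that string2 never has more non-zero characters than string1 and that both variables are 6 characters wide:

Code:

* non-zero characters of string1, in their original order
gen nz = subinstr(string1, "0", "", .)
* k counts the non-zero positions of string2 seen so far in each observation
gen byte k = 0
gen string3 = string2
forvalues i = 1/6 {
    replace k = k + 1 if substr(string2, `i', 1) != "0"
    * overwrite position i of string3 with the k-th non-zero character of string1
    replace string3 = substr(string3, 1, `i' - 1) + substr(nz, k, 1) ///
        + substr(string3, `i' + 1, .) if substr(string2, `i', 1) != "0"
}
drop nz k

For the four example observations this should reproduce the string3 column in the listing above.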

matt starcke

24 May 2018, 08:38

Hi Bjarte,

Thank you for the response. You're correct that I wanted the result to have the same width as the original variables, and your code worked perfectly. I did have to expand the steps to allow for more than 2 replacements, which my example did not show was a possibility. At any rate, here's my final code:

Code:

gen string3 = string2
* mark up to six non-zero positions, left to right, with the placeholders A-F
replace string3 = regexr( string3,"([1-9])" ,"A")
replace string3 = regexr( string3,"([1-9])" ,"B")
replace string3 = regexr( string3,"([1-9])" ,"C")
replace string3 = regexr( string3,"([1-9])" ,"D")
replace string3 = regexr( string3,"([1-9])" ,"E")
replace string3 = regexr( string3,"([1-9])" ,"F")
* swap each placeholder for the corresponding non-zero character of string1
replace string3 = subinstr(string3,"A", substr(subinstr(string1,"0","",.),1,1),1)
replace string3 = subinstr(string3,"B", substr(subinstr(string1,"0","",.),2,1),1)
replace string3 = subinstr(string3,"C", substr(subinstr(string1,"0","",.),3,1),1)
replace string3 = subinstr(string3,"D", substr(subinstr(string1,"0","",.),4,1),1)
replace string3 = subinstr(string3,"E", substr(subinstr(string1,"0","",.),5,1),1)
replace string3 = subinstr(string3,"F", substr(subinstr(string1,"0","",.),6,1),1)

Thank you again for the help - much appreciated.
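
The twelve repeated -replace- lines above follow one pattern, so they could probably be collapsed into two loops. This is only a sketch under assumptions: nz is a hypothetical helper variable, the placeholder letters are taken from the system value c(ALPHA), and, as in the code above, string2 is assumed to contain no letters and no more non-zero characters than string1.

Code:

* non-zero characters of string1, in their original order
gen nz = subinstr(string1, "0", "", .)
gen string3 = string2
* pass 1: mark the j-th non-zero position of string3 with the j-th capital letter
forvalues j = 1/6 {
    local L : word `j' of `c(ALPHA)'
    replace string3 = regexr(string3, "[1-9]", "`L'")
}
* pass 2: swap each placeholder for the j-th non-zero character of string1
forvalues j = 1/6 {
    local L : word `j' of `c(ALPHA)'
    replace string3 = subinstr(string3, "`L'", substr(nz, `j', 1), 1)
}
drop nz

Keeping two separate passes matters: if the substitution were done inside the first loop, the digit just written would be matched again by the next -regexr- call.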
