I have encountered a strange glitch(?) when combining cond() and ustrregexs(). Consider the following code (where the regex operator \d matches a single numeric digit):
Code:
clear
input str3 var1
"1a"
"2b"
"3c"
"abc"
end
gen var2 = cond(ustrregexm(var1,"\d"),ustrregexs(0),var1)
The output I expect is as follows: for each observation that matches in ustrregexm(), var2 contains the matching digit; otherwise, var2 contains a copy of var1.
Code:
     +-------------+
     | var1   var2 |
     |-------------|
  1. |   1a      1 |
  2. |   2b      2 |
  3. |   3c      3 |
  4. |  abc    abc |
     +-------------+
The actual output looks different, however:
Code:
     +-------------+
     | var1   var2 |
     |-------------|
  1. |   1a        |
  2. |   2b      1 |
  3. |   3c      2 |
  4. |  abc    abc |
     +-------------+
While the output for non-matching observations is as expected, it seems that when ustrregexm() results in a match, that match is not used until the next observation's ustrregexs() is evaluated.
This is odd because, ostensibly, the current observation's ustrregexm() must be evaluated before the current observation's ustrregexs() in order to determine whether the condition is true or false, which means that ustrregexs() should then return the match from the current observation.
I can't put my finger on why exactly cond() and ustrregexs() behave this way. Any ideas would be appreciated.
Note: I am aware I could use ustrregexra() to achieve the same effect, but I am specifically hoping to understand why cond() behaves this way.
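For comparison, here is a minimal two-step sketch that avoids cond() altogether. It assumes that the if qualifier is evaluated before the right-hand side of replace, which is the pattern commonly shown for regexm()/regexs(); I have not confirmed that the ustr versions behave identically, so treat it as a diagnostic rather than a fix.

```stata
* Rebuild the same toy data used in the example
clear
input str3 var1
"1a"
"2b"
"3c"
"abc"
end

* Step 1: start from a plain copy of var1 (the "no match" case)
gen var2 = var1

* Step 2: overwrite only where the regex matches.
* Assumption: ustrregexm() in the if qualifier runs first for each
* observation, so ustrregexs(0) refers to that observation's match.
replace var2 = ustrregexs(0) if ustrregexm(var1, "\d")

list
```

If this version gives the expected result while the cond() version does not, that would suggest the difference lies in the order in which cond() evaluates its arguments rather than in the regex functions themselves.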