Hello all,
Using Stata 15.1/IC
I need to submit a bulk file with a string variable ("NAME" variable in this example) that is required to have no special characters besides ampersand and dash. I am able to accomplish this using the following series of commands:
charlist NAME //shows which characters are in my string var NAME
"&',-./01234689ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
egen NEWNAME= sieve(NAME), omit(,./`"""'`"'"') // generates new variable with the special characters omitted but retains & and -
Results (the dataex listing further down shows NAME next to the resulting NEWNAME):
While this approach works as intended, I would prefer a command that does not depend on listing the specific characters to omit, since those can change between datasets. For example, a "+" or "@" in the string variable would not be removed by my code unless I updated the command by hand. In addition, the compound quoting needed to specify the double and single quote marks makes the command hard to read in the log file.
I thought I could generalize the command with the char() function, looping over the integer codes of the ASCII characters with forvalues (on the assumption that I will not run into any non-ASCII special characters), but I get the following error:
. forvalues i = 33/37 39/44 46/47 58/64 91/96 123/126 {
2. replace NAME = subinstr(NAME, char(`i'), "", .)
3. }
invalid syntax
r(198);
I am, however, able to use the foreach command without error:
. foreach i in 33 34 35 36 37 39 40 41 42 43 44 46 47 58 59 60 61 62 63 64 91 92 93 94 95 96 123 124 125 126 {
2. replace NAME =subinstr(NAME, char(`i'), "", .)
3. }
My question is why the forvalues command doesn't work. My guess is that I simply got the syntax wrong, but I also wondered whether Stata treats the values passed to char() differently than I expected when they come from forvalues.
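One possible lead, offered as an assumption rather than a confirmed answer: the documented syntax of forvalues takes a single range (such as 33/37), whereas foreach ... of numlist accepts a numlist made up of several ranges. A minimal sketch of the same loop in that form, using the example names from this post as hypothetical input data:

```stata
* Hypothetical input: the four example names from this post
clear
input str86 NAME
"Single-Benefits, Inc."
"Superstar, LLC"
"RML Agency, Inc."
"A & M Company, Inc."
end

* foreach ... of numlist expands every range in the list,
* so codes 38 (&) and 45 (-) are skipped simply by leaving them out
foreach i of numlist 33/37 39/44 46/47 58/64 91/96 123/126 {
    replace NAME = subinstr(NAME, char(`i'), "", .)
}

list NAME
```

This keeps the char() approach unchanged and only swaps the loop construct, so the ranges no longer have to be typed out one code at a time.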
Of course, if there is an even better way to accomplish the elimination of all special characters besides ampersands and dashes, I am all ears. Thanks for any advice.
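One candidate that avoids listing the unwanted characters at all is a whitelist with a regular expression. The sketch below assumes that only ASCII letters, digits, spaces, ampersands, and dashes should survive, and it uses ustrregexra(), which is available in Stata 14 and later. The input data are the example names from this post, and NEWNAME is just an illustrative variable name.

```stata
* Hypothetical input: the four example names from this post
clear
input str86 NAME
"Single-Benefits, Inc."
"Superstar, LLC"
"RML Agency, Inc."
"A & M Company, Inc."
end

* Replace every character that is NOT a letter, digit, space, & or -
* with an empty string (the dash is escaped so it is read literally)
generate NEWNAME = ustrregexra(NAME, "[^A-Za-z0-9 &\-]", "")

list NAME NEWNAME
```

Because this is a whitelist, it would also strip accented or other non-ASCII letters, so the character class would need to be widened if such names can occur.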
Example data (NAME is the original variable, NEWNAME the result of the egen command above):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str86(NAME NEWNAME)
"Single-Benefits, Inc." "Single-Benefits Inc"
"Superstar, LLC" "Superstar LLC"
"RML Agency, Inc." "RML Agency Inc"
"A & M Company, Inc." "A & M Company Inc"
end