Text Function CHAR() Missing Extended ASCII?

Christopher Swearingen

Join Date: Apr 2014

Posts: 2
#1

26 May 2015, 16:42

I am running:
Stata MP v14.0

update level 05May2015

Windows 7 Professional

My programming standard is to create semicolon-delimited tables for ease of import into MS Excel. I tend to annotate tables using asterisks or daggers. Before Stata 14, the dagger was accessible via the char() function (i.e., char(134)). However, it appears that char() no longer provides extended ASCII.

Code:

forval i = 0(1)255 {
    di "ASCII Code `i': " char(`i')
}

I receive output up to ASCII Code 127, but at 128 and above the output is a null box character.

Examining the code point definition, the "original ASCII encoding system contains only 128 code points and thus can represent only 128 characters. Historically, the 128 additional bytes of extended ASCII have been encoded...".

While the char() function accepts a byte value from 0 to 255, it appears no longer to produce the upper 128 characters.

Can this be confirmed?

FWIW - uchar(8224) will provide the dagger...

Thanks!
Tags: ASCII, function, string, syntax, unicode
Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

26 May 2015, 17:34

For documentation see http://www.stata.com/help.cgi?char()

NOTE You need both ().

Last edited by Nick Cox; 26 May 2015, 18:02.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

26 May 2015, 17:56

Actually, that link takes you to the help for characteristics. The char() function is at http://www.stata.com/manuals14/fnstr...functionschar()
Alan Riley (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 168
#4

27 May 2015, 11:37

Stata 14 is fully Unicode-based. Thus, all text output is assumed to be Unicode. If you output an extended ASCII character without converting it to Unicode, it will display as an invalid Unicode character -- on your operating system, that's the "null box character".

If you were to

Code:

log using look.log
display "ASCII code 134: " char(134)
log close

and then examined a hexdump of the log file, you would see that the file indeed contained extended ASCII character 134.
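For anyone who wants to do the hexdump check without leaving Stata, here is a minimal sketch using Stata's own hexdump command. The log name "look" and the text and replace options are my additions so the example can be rerun; they are not part of the code above.

```stata
* Close any leftover log with this (hypothetical) name from an earlier run.
capture log close look

* Write the raw byte 134 to a plain-text log file.
log using look.log, text replace name(look)
display "ASCII code 134: " char(134)
log close look

* Summarize the bytes in the file; drop the analyze option to list every byte.
hexdump look.log, analyze
```

The analyze report counts extended (128 and above) characters separately, so the single byte written by char(134) should be visible there.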

To display that same character in Stata 14, you can do one of the following:

1) Use the character itself rather than a function with a code point for it -- if you find a "dagger" character on a website, you should be able to just copy/paste it into Stata 14 and it will work.

2) Use the Unicode code point for that character, as you found with uchar(8224).

3) Use the char() function with the extended ASCII code of choice, and pass the result to ustrfrom() along with the extended ASCII encoding that character is from to get the appropriate Unicode character. For example,

Code:

display ustrfrom(char(134), "cp1252", 1)
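As a quick check that the three routes above arrive at the same character, a sketch like the following could be run from a do-file in Stata 14 or later. The local macro names are hypothetical, and the do-file must be saved as UTF-8 for the pasted dagger to survive.

```stata
* 1) The literal character, pasted into the do-file.
local d1 "†"

* 2) The Unicode code point (8224 decimal, 2020 hex).
local d2 = uchar(8224)

* 3) Byte 134 converted from the Windows cp1252 encoding to Unicode.
local d3 = ustrfrom(char(134), "cp1252", 1)

display "`d1' `d2' `d3'"

* All three should be the same string.
assert "`d1'" == "`d2'"
assert "`d2'" == "`d3'"
```

The third argument of ustrfrom() controls how invalid input is handled; 1 substitutes the Unicode replacement character for bytes that cannot be converted.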

You might be wondering why I used "cp1252" as the encoding rather than "latin1" given that "latin1" is the most commonly used encoding on the Internet. The reason is that cp1252 is a Windows-specific extended ASCII code page which is almost exactly, but not quite, the same as latin1. One of the differences between them is that cp1252 happens to map 134 to the dagger character you want, whereas latin1 assigns no printable character to that code.
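To see the difference between the two encodings for yourself, a small sketch along these lines may help. It converts the same byte under both encodings and prints the resulting code points with ustrtohex(); the local macro names are hypothetical.

```stata
* Convert byte 134 to Unicode under each encoding.
local w = ustrfrom(char(134), "cp1252", 1)
local l = ustrfrom(char(134), "latin1", 1)

* Show the escaped hex form of each result rather than the glyph itself.
display "cp1252: " ustrtohex("`w'")
display "latin1: " ustrtohex("`l'")

* Only the cp1252 conversion yields the dagger, U+2020.
assert "`w'" == uchar(8224)
assert "`l'" != uchar(8224)
```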