
  • Text Function CHAR() Missing Extended ASCII?

    I am running:
    • Stata MP v14.0
    • update level 05May2015
    • Windows 7 Professional
    My standard practice is to create semicolon-delimited tables for easy import into MS Excel, and I annotate tables using asterisks or daggers. Before Stata 14, the dagger was available through the char() function (i.e., char(134)). However, it appears that char() no longer provides extended ASCII.

    Code:
    forval i = 0(1)255 {
          di "ASCII Code `i': " char(`i')
    }
    I receive output up to ASCII code 127, but from 128 upward the output is a null box character.

    The definition of "code point" says that the "original ASCII encoding system contains only 128 code points and thus can represent only 128 characters. Historically, the 128 additional bytes of extended ASCII have been encoded...".

    Although the char() function accepts a byte value from 0 to 255, it appears not to produce the upper 128 characters.

    Can this be confirmed?

    FWIW, uchar(8224) does provide the dagger.


    Thanks!

  • #2
    For documentation, see http://www.stata.com/help.cgi?char()

    NOTE: You need both parentheses in the URL.
    Last edited by Nick Cox; 26 May 2015, 18:02.



    • #3
      Actually, that link takes you to the help for characteristics. The char() function is at http://www.stata.com/manuals14/fnstr...functionschar()



      • #4
        Originally posted by Christopher Swearingen
        However, it appears that the CHAR() function no longer provides extended ASCII.

        Stata 14 is fully Unicode-based: all text output is assumed to be Unicode, encoded as UTF-8. If you output an extended ASCII character without converting it to Unicode, that byte is not valid UTF-8 on its own, so it displays as an invalid Unicode character; on your operating system, that's the "null box character".
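        One way to see this, as a minimal sketch assuming Stata 14 or later, is to compare byte length with character length: strlen() counts bytes and ustrlen() counts Unicode characters. char(134) is a single byte that is not valid UTF-8 by itself, whereas the dagger, U+2020 (decimal 8224), is stored as the three UTF-8 bytes E2 80 A0.

```stata
* Sketch (assumes Stata 14 or later): bytes vs. Unicode characters
display strlen(char(134))      // 1 byte; not a valid UTF-8 sequence by itself
display strlen(uchar(8224))    // 3 bytes: UTF-8 encoding of the dagger (E2 80 A0)
display ustrlen(uchar(8224))   // 1 Unicode character
```

        The 1-versus-3 byte count is why a raw extended ASCII byte cannot simply stand in for the Unicode dagger.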

        If you were to

        Code:
        log using look.log
        display "ASCII code 134: " char(134)
        log close
        and then examined a hexdump of the log file (for example, with Stata's hexdump command), you would see that the file does contain extended ASCII character 134 (hex 86) as a single byte.

        To display that same character in Stata 14, you can do one of the following:

        1) Use the character itself rather than a function with a code point for it. If you find a dagger character on a website, you should be able to copy and paste it into Stata 14, and it will work.

        2) Use the Unicode code point for that character, as you found with uchar(8224).

        3) Use the char() function with the extended ASCII code of choice, and pass the result to ustrfrom() along with the name of the extended ASCII encoding that the character comes from; ustrfrom() then returns the corresponding Unicode character. For example,

        Code:
        display ustrfrom(char(134), "cp1252", 1)
        You might wonder why I used "cp1252" as the encoding rather than "latin1", given that latin1 is a very widely used encoding on the Internet. The reason is that cp1252 is a Windows-specific extended ASCII code page that is almost, but not exactly, the same as latin1. One of the differences is that cp1252 maps byte 134 to the dagger character you want, whereas latin1 maps it to an invisible control character (U+0086).
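        The difference between the two code pages can be checked directly. This is a minimal sketch, assuming Stata 14 or later and that Stata recognizes the encoding names "cp1252" and "latin1" as used above. The loop is a Stata 14 counterpart of the loop in the original question: it reads bytes 128 to 255 as cp1252 and converts each one to Unicode before displaying it. A few byte values (such as 129 and 141) are unassigned in cp1252, so their output may not be a printable character.

```stata
* Sketch (assumes Stata 14 or later): byte 134 read under two code pages
display ustrfrom(char(134), "cp1252", 1)   // dagger, U+2020
display ustrfrom(char(134), "latin1", 1)   // C1 control character U+0086 (invisible)

* Stata 14 counterpart of the question's loop: interpret bytes 128-255 as cp1252.
* Some bytes (e.g., 129, 141) are unassigned in cp1252 and may not print.
forvalues i = 128/255 {
    display "cp1252 code `i': " ustrfrom(char(`i'), "cp1252", 1)
}
```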

