
  • Stata results window and Hebrew

    Hello,

    I'm trying to work with Hebrew (variable names, value labels and strings) in Stata and have an issue with displaying it properly. The Data Browser window displays everything correctly and there's also no issue with Hebrew in the syntax editor. But whenever Hebrew text appears in the main Results window, it appears reversed. So I see "olleH" instead of "Hello" (when using Hebrew text, of course). This kind of problem was common with Hebrew (and I guess with other right-to-left languages as well) in all kinds of older software, but is there a solution to it in Stata?

    (The attached screenshot shows the issue clearly. The history pane shows the display command exactly as I entered it, but the Hebrew text is reversed in the results.)

    Thanks.

    [Attached image: Capture.PNG]



  • #2
    For those who may be curious about this, here is example code that reproduces the problem in Stata 16.1 for Mac running on macOS Catalina 10.15.6.
    Code:
    clear all
    input str16 a
    שלום
    Hello
    end
    list
    browse
    All the occurrences of the right-to-left text shown in Stata's Results window are reversed, yet the string value created in the dataset is displayed properly by the Data Browser.

    But when I copy the contents of the Results window and paste it into this post, the characters occur in the correct order (!!!) which suggests that the full Unicode nature of the text is preserved in what is (incorrectly) displayed.
    Code:
    . clear all
    
    . input str16 a
    
                        a
      1. שלום
      2. Hello
      3. end
    
    . list
    
         +-------+
         |     a |
         |-------|
      1. |  שלום |
      2. | Hello |
         +-------+
    
    . browse
    
    .
    This suggests to me that the Stata Results window does not fully handle right-to-left text. This could be a problem with Stata, or a problem with Stata for Mac, or a problem with the macOS APIs used to populate the Results window. I don't see anything in Stata's documentation that discusses this issue, and if you don't get a useful suggestion from someone who has encountered and overcome this problem, you might submit this to Stata Technical Services for their advice.
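
    One way to check that the stored string really is intact, without relying on how the Results window draws it, is to inspect it with Stata's Unicode string functions. The sketch below reuses the example dataset from this post; the variable names nchars, nbytes and first are hypothetical, and uchar(1513) is the code point U+05E9 (shin), the first letter of the Hebrew word in reading order.

    ```stata
    clear all
    input str16 a
    שלום
    Hello
    end

    * Unicode characters versus UTF-8 bytes (each Hebrew letter takes two bytes)
    generate nchars = ustrlen(a)
    generate nbytes = strlen(a)

    * First character in logical (reading) order
    generate str4 first = usubstr(a, 1, 1)

    * These pass only if the Hebrew value is stored in logical order
    assert first == uchar(1513) in 1
    assert nchars == 4 & nbytes == 8 in 1

    * List only the numeric checks, so nothing depends on how Hebrew is drawn
    list nchars nbytes
    ```

    If the assertions pass, the data are stored in the correct order and only the rendering in the Results window is affected.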



    • #3
      I get the same as William Lisowski when I use Stata 16.1 on Windows 10.

      Code:
      . clear all
      
      . input str16 a
      
                          a
        1. שלום
        2. Hello
        3. end
      
      . list
      
           +-------+
           |     a |
           |-------|
        1. |  שלום |
        2. | Hello |
           +-------+



      • #4
        Stata's backend Unicode functions are fully capable of handling any language. This is purely a display issue in the Results window.

        Try the following undocumented setting:

        set usecharalignment off


        interactively before running the do-file.


        Code:
         set usecharalignment off
        
        . do "C:\Users\HOP~1.DEV\AppData\Local\Temp\STD527c_000000.tmp"
        
        .
        . clear all
        
        . input str16 a
        
                            a
          1. שלום
          2. Hello
          3. end
        
        . list
        
             +-------+
             |     a |
             |-------|
          1. |  שלום |
          2. | Hello |
             +-------+
        Longer explanation: Stata's Results window is deeply rooted in the Unix terminal, which emphasizes column-based, character-by-character alignment. Hence, by default, the Results window displays one character at a time, each aligned to a column boundary. That behavior is naturally not up to handling complex scripts, in which the shape or position of a glyph depends on its relation to other glyphs. Examples of complex-script languages are Hebrew, Arabic, and most of the South Asian languages (Bengali, Hindi, Nepali, etc.).

        Note: set usecharalignment off is a crude attempt to address some issues of displaying complex scripts. It does not really solve them, especially when the text is displayed in a table. To handle complex scripts properly, the Results window needs to be rewritten to support complex script layout, and very likely its current display behavior will have to change.

        For more information about the complex script layout, see https://en.wikipedia.org/wiki/Comple...ai%20alphabet.
        Last edited by Hua Peng (StataCorp); 24 Aug 2020, 08:16.
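
        As a rough sketch of how the setting might be used in a session, one could switch it off only around the output that contains right-to-left text. Note the assumption here: set usecharalignment off is the only form given in this post, and set usecharalignment on as the way to return to the default column-aligned display is a guess, since the setting is undocumented.

        ```stata
        clear all
        input str16 a
        שלום
        Hello
        end

        * Undocumented setting from this post; type it interactively
        set usecharalignment off

        * Right-to-left strings should now read in the correct order,
        * although table borders may no longer line up
        list

        * ASSUMPTION: "on" restores the default column-aligned display
        set usecharalignment on
        ```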



        • #5
          Thanks all for looking into it. The "set usecharalignment off" command does help in my case and reverses the default incorrect order of the letters! I don't think that Hebrew is considered complex in the same sense as the other languages mentioned: each of its letters is separate and unrelated to the others in a given word; the only difference is that it's written right-to-left.
          Last edited by Evgeny Sironov; 24 Aug 2020, 09:46.



          • #6
            The link from post #4 suggests that bidirectional text in and of itself is sufficient to qualify the text as "complex text layout" (CTL), which is a technical term. Since the Results window will contain Stata commands and output written in "Latin" and other left-to-right characters, adding a right-to-left character set such as Hebrew renders the resulting text (the entirety of the Results window) complex in the technical sense of the phrase "complex text layout".

            What is implicit in what Hua Peng (StataCorp) wrote is that there are a number of issues around getting CTL to format properly; right-to-left ordering is just one of them. I expect that Stata will want to handle the issues systematically, rather than applying ad-hoc workarounds one at a time. A little thought about what's involved in supporting right-to-left text led me to conclude that it's harder than I thought at first. Perhaps Stata hopes to be able to adopt an off-the-shelf framework to fully support CTL in the Results window (and in log files), as they did, for example, in adding Unicode regular expression support.

            In the meantime, I note that the output displayed in the Results window in post #2 can be copied and pasted into a CTL-capable editor such as Microsoft Word, or on macOS the plain text editor BBEdit, or on Windows perhaps a similar text editor, where it is then displayed properly and can be printed properly. Indeed, post #2 demonstrated that the software behind Statalist displays the material copied from the Results window "properly" (although in this case that means a display that looks different from the Results window display).



            • #7
              Thanks William.

              Before the proposed workaround with the usecharalignment setting, I planned on using the Data Browser window as my ad-hoc results window when using Hebrew, by employing a combination of collapse/statsby and preserve and restore commands. And although I'm no programmer and can't foresee all the difficulties, that got me thinking: why can't the ancient Results window behave the way the Data Browser already does? Pretty much all non-graphical output is best served as a relational table to begin with. But that's probably best left for the Stata wishlist thread (and in fact a similar idea is mentioned there already: https://www.statalist.org/forums/for...51#post1566951 ).
              Last edited by Evgeny Sironov; 24 Aug 2020, 14:02.
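
              The Data Browser approach described in this post could look roughly like the sketch below. It extends the thread's example dataset with a hypothetical numeric variable x, and the names mean_x and n are also made up for illustration. It is meant to be typed interactively: in a do-file, restore would run immediately after browse opens, so the summary table would not stay on screen.

              ```stata
              clear all
              input str16 a x
              "שלום" 1
              "שלום" 3
              "Hello" 2
              end

              * Keep a copy of the data, then reduce it to a summary table
              preserve
              collapse (mean) mean_x = x (count) n = x, by(a)

              * View the summary in the Data Browser, which displays Hebrew correctly
              browse

              * When done inspecting, bring the original data back
              restore
              ```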



              • #8
                set usecharalignment off

                works for Arabic as well. Since Arabic is written from right to left, the above command works perfectly with Arabic.
                Hua Peng, thank you so much for your post.
