
  • Query on storing strings using st_sstore()

    Colleagues: My objective is to store a string-valued variable in Stata using some Mata code. One of the characters I want to include in the string is "•" (I'm not sure of its ASCII/Unicode value; it's alt-8 on a Mac keyboard).

    I'm having difficulty doing this using st_sstore(), although getmata seems to work fine. Here's a simplified example:
    Code:
    clear
    
    set obs 5
    
    gen str1 s1=""
    
    mata: z=J(5,1,"•")
    mata: st_sstore(.,"s1",z)
    
    getmata z
    
    list s1 z
    This returns:
    Code:
         +---------+
         | s1   z  |
         |---------|
      1. |  �   • |
      2. |  �   • |
      3. |  �   • |
      4. |  �   • |
      5. |  �   • |
         +---------+
    I can certainly proceed using getmata but I'm wondering if anyone has an idea why st_sstore doesn't seem to work here while getmata does. Thanks in advance.
    Last edited by John Mullahy; 15 Dec 2018, 16:44. Reason: typo in original post

  • #2
    I think that it's called an elevated dot, or something like that. Anyway, it's three bytes long, and so your string variable should be lengthened in order to accommodate it.

    . version 15.1

    .
    . clear *

    . set linesize 72

    .
    . mata:
    ------------------------------------------------- mata (type end to exit
    > ) --------------------------------------------------------------------
    :
    : st_addobs(5)

    :
    : varindex = st_addvar("str3", "st")

    :
    : Z = J(5, 1, "•")

    : Z
           1
        +-----+
      1 |  •  |
      2 |  •  |
      3 |  •  |
      4 |  •  |
      5 |  •  |
        +-----+

    : ascii(Z[1])
             1     2     3
        +-------------------+
      1 |  226   128   162  |
        +-------------------+

    : st_sstore(., varindex, Z)

    :
    : z = st_sdata(1, varindex)

    : ascii(z)
             1     2     3
        +-------------------+
      1 |  226   128   162  |
        +-------------------+

    :
    : end
    ------------------------------------------------------------------------

    .
    . list, noobs

      +----+
      | st |
      |----|
      |  • |
      |  • |
      |  • |
      |  • |
      |  • |
      +----+

    .
    . exit

    end of do-file


    .
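
    When the strings are not known in advance, the width can be taken from the data rather than hard-coded. What follows is only a minimal sketch, assuming Stata 14 or later (where Mata's strlen() counts bytes and ustrlen() counts Unicode characters), at least one nonempty string, and a do-file context; the names s1, Z, w and idx are illustrative:
    Code:
    clear
    set obs 5
    mata:
    Z = J(5, 1, "•")
    // strlen() returns the length in bytes, which is what str# measures
    w = max(strlen(Z))
    // build the storage type from the byte width, here "str3"
    idx = st_addvar("str" + strofreal(w), "s1")
    st_sstore(., idx, Z)
    end
    list s1
    Sizing by ustrlen() instead would give str1 again and reproduce the truncation.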



    • #3
      Thanks very much. The idea of it being larger than one byte never occurred to me. Much appreciate your taking time to enlighten.



      • #4
        James beat me to the punch, but in case it helps, here's my logic. TL;DR: You've drifted into Unicode hell.

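        Mata is creating Unicode "bullet" characters (U+2022, UTF-8 bytes e2 80 a2), not the extended ASCII "bullet" character (hex a5 in the Mac extended ASCII encoding) that you're historically used to. Stata's string storage types count bytes, not characters, and st_sstore() does not widen the variable when it receives a one-character, three-byte string: the str1 variable keeps only the first byte (\xe2), which is not valid UTF-8 on its own and so displays as �. getmata, by contrast, creates a new variable and sizes it to fit (str3).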
        Code:
        . clear
        
        . set obs 5
        number of observations (_N) was 0, now 5
        
        . gen str1 s1 = ""
        (5 missing values generated)
        
        . gen str9 s9 = ""
        (5 missing values generated)
        
        . mata: z1=J(5,1,"•")
        
        . mata: st_sstore(.,"s1",z1)
        
        . mata: st_sstore(.,"s9",z1)
        
        . getmata z1
        
        . describe s1 s9 z1
        
                      storage   display    value
        variable name   type    format     label      variable label
        --------------------------------------------------------------------------------------------------
        s1              str1    %9s                   
        s9              str9    %9s                   
        z1              str3    %9s                   
        
        . generate x1 = tobytes(s1,1)
        
        . generate x9 = tobytes(s9,1)
        
        . generate zz = tobytes(z1,1)
        
        . list, noobs
        
          +---------------------------------------------------+
          | s1   s9   z1     x1             x9             zz |
          |---------------------------------------------------|
          |  �    •    •   \xe2   \xe2\x80\xa2   \xe2\x80\xa2 |
          |  �    •    •   \xe2   \xe2\x80\xa2   \xe2\x80\xa2 |
          |  �    •    •   \xe2   \xe2\x80\xa2   \xe2\x80\xa2 |
          |  �    •    •   \xe2   \xe2\x80\xa2   \xe2\x80\xa2 |
          |  �    •    •   \xe2   \xe2\x80\xa2   \xe2\x80\xa2 |
          +---------------------------------------------------+
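
        The byte/character distinction can also be checked directly, and values that were cut mid-character can be flagged. This is a rough sketch rather than a definitive recipe: it assumes Stata 14 or later and the variables s1 and s9 from the log above, and the variable name bad is hypothetical:
        Code:
        * length in bytes, which is what the str# storage types measure
        display strlen("•")
        * length in Unicode characters
        display ustrlen("•")
        * flag observations whose s1 holds an invalid UTF-8 sequence
        generate byte bad = ustrinvalidcnt(s1) > 0
        list s1 s9 bad, noobs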



        • #5
          Thanks very much for the helpful insights.
