Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • -export delimited- creates CSV with illegible characters

    This is an updated description of a problem that I described here for Stata 14 and that I have now reproduced in Stata 15. I am working with a dataset that contains strings with not only plain ASCII characters. For data exchange I have to export the data to a CSV file but when this file is opened in Excel some of the characters are illegible. The problem can be demonstrated with the example data below.
    Code:
    clear
    input str62 NOTE
    "Encuesta de Caracterización Socioeconómica Nacional"         
    "Encuesta de Hogares de Propósitos Múltiples"                 
    "Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza"
    end
    
    export delimited using "test.csv", delim(",") replace
    When the CSV file is opened in a text editor it looks fine. When it is opened in Excel by double-clicking on "test.csv" or via File - Open, the strings look like this:
    Code:
    Encuesta de Caracterización Socioeconómica Nacional
    Encuesta de Hogares de Propósitos Múltiples
    Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza
    The data can be imported into Excel with the method described by Hua Peng (StataCorp) in this post, which involves manually selecting Unicode (UTF-8) encoding, but this is a bit too cumbersome for general use.

    Alan Riley (StataCorp) proposed converting the CSV file with unicode convertfile in this post, but when I apply this command to the CSV file exported from Stata, the conversion stops with an error message.
    Code:
    . unicode convertfile "test.csv" "test2.csv", dstencoding(latin1) replace
    Unicode character invalid for the target encoding found
        Invalid character starts at byte position 137.  Invalid character as bytes are 2013
    file "test.csv" partially converted to file "test2.csv"
    r(198);
    The "invalid character" appears to be the dash in the third observation because the conversion stops at this point. The file "test2.csv" contains only this text:
    Code:
    NOTE
    Encuesta de Caracterización Socioeconómica Nacional
    Encuesta de Hogares de Propósitos Múltiples
    Encuesta Nacional de Hogares
    Is it possible to create a CSV file with Stata that can be opened in Excel simply by double-clicking on the file or by using File - Open from the Excel menu?

  • #2
    Originally posted by Friedrich Huebler View Post
    This is an updated description of a problem that I described here for Stata 14 and that I have now reproduced in Stata 15. I am working with a dataset that contains strings with not only plain ASCII characters. For data exchange I have to export the data to a CSV file but when this file is opened in Excel some of the characters are illegible. The problem can be demonstrated with the example data below.
    Code:
    clear
    input str62 NOTE
    "Encuesta de Caracterización Socioeconómica Nacional"
    "Encuesta de Hogares de Propósitos Múltiples"
    "Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza"
    end
    
    export delimited using "test.csv", delim(",") replace
    When the CSV file is opened in a text editor it looks fine. When it is opened in Excel by double-clicking on "test.csv" or via File - Open, the strings look like this:
    Code:
    Encuesta de Caracterización Socioeconómica Nacional
    Encuesta de Hogares de Propósitos Múltiples
    Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza
    The data can be imported into Excel with the method described by Hua Peng (StataCorp) in this post, which involves manually selecting Unicode (UTF-8) encoding, but this is a bit too cumbersome for general use.

    Alan Riley (StataCorp) proposed converting the CSV file with unicode convertfile in this post, but when I apply this command to the CSV file exported from Stata, the conversion stops with an error message.
    Code:
    . unicode convertfile "test.csv" "test2.csv", dstencoding(latin1) replace
    Unicode character invalid for the target encoding found
    Invalid character starts at byte position 137. Invalid character as bytes are 2013
    file "test.csv" partially converted to file "test2.csv"
    r(198);
    The "invalid character" appears to be the dash in the third observation because the conversion stops at this point. The file "test2.csv" contains only this text:
    Code:
    NOTE
    Encuesta de Caracterización Socioeconómica Nacional
    Encuesta de Hogares de Propósitos Múltiples
    Encuesta Nacional de Hogares
    Is it possible to create a CSV file with Stata that can be opened in Excel simply by double-clicking on the file or by using File - Open from the Excel menu?

    Try a different encoding. The question is what extended-ASCII encoding(s) Excel supports, and if one of those encodings supports the characters in your dataset. Many data that originated on a Windows computer may use the Windows-1252 encoding, which is almost, but not exactly, the same as Latin1.

    Code:
    unicode convertfile "test.csv" "test2.csv", dstencoding(Windows-1252) replace
    That works for me in terms of the unicode convertfile command, so all the Unicode characters in your short example data are represented by extended ASCII characters in the Windows-1252 encoding. Hopefully Excel will accept that too.

    Comment


    • #3
      Thank you. With Windows-1252 encoding the CSV file can be opened in Excel without problems.

      Comment


      • #4
        In addition to what Alan Riley (StataCorp) suggested, you may use -export excel- instead of -export delimited-. -export excel- handles the UTF-8 encoding for you so the resulted xlsx file works in Excel without needing any extra step.
        Code:
         export excel using "test.xlsx", replace
        Last edited by sladmin; 28 Jun 2017, 16:10. Reason: update to link to Alan Riley

        Comment


        • #5
          Thank you for the reference to export excel. In my case a file in CSV format was needed and Excel was only used as a convenient tool to review the data.

          Comment

          Working...
          X