This is an updated description of a problem that I described here for Stata 14 and have now reproduced in Stata 15. I am working with a dataset whose strings contain characters outside plain ASCII. For data exchange I have to export the data to a CSV file, but when this file is opened in Excel some of the characters are illegible. The problem can be demonstrated with the example data below.

Code:
clear
input str62 NOTE
"Encuesta de Caracterización Socioeconómica Nacional"
"Encuesta de Hogares de Propósitos Múltiples"
"Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza"
end
export delimited using "test.csv", delim(",") replace

When the CSV file is opened in a text editor it looks fine. When it is opened in Excel by double-clicking on "test.csv" or via File - Open, the strings look like this:

Code:
Encuesta de Caracterización Socioeconómica Nacional
Encuesta de Hogares de Propósitos Múltiples
Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza

The data can be imported into Excel correctly with the method described by Hua Peng (StataCorp) in this post, which involves manually selecting Unicode (UTF-8) as the encoding, but this is too cumbersome for general use.

Alan Riley (StataCorp) proposed converting the CSV file with unicode convertfile in this post, but when I apply this command to the CSV file exported from Stata, the conversion stops with an error message:

Code:
. unicode convertfile "test.csv" "test2.csv", dstencoding(latin1) replace
Unicode character invalid for the target encoding found
Invalid character starts at byte position 137.
Invalid character as bytes are 2013
file "test.csv" partially converted to file "test2.csv"
r(198);

The "invalid character" appears to be the en dash in the third observation: its Unicode code point is U+2013, matching the "2013" in the error message, and the conversion stops at exactly this point. The file "test2.csv" contains only this text:

Code:
NOTE
Encuesta de Caracterización Socioeconómica Nacional
Encuesta de Hogares de Propósitos Múltiples
Encuesta Nacional de Hogares

Is it possible to create a CSV file with Stata that can be opened in Excel simply by double-clicking on the file or via File - Open in Excel?
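A quick byte-level check outside Stata can illustrate the diagnosis above. The Python sketch below is only an inspection aid, not part of the Stata workflow. It confirms that the en dash (U+2013) has no representation in Latin-1 (ISO-8859-1), which is why a conversion with dstencoding(latin1) has to fail, while Windows-1252 does contain it. The helper names encodable and as_excel_might_show are hypothetical. The second helper rests on an assumption: that Excel, when it does not recognize the file as UTF-8, reads the bytes with the Windows-1252 code page. The code page actually used depends on the system locale.

```python
# The en dash (U+2013) from the third observation and an accented sample word.
EN_DASH = "\u2013"
SAMPLE = "Caracterización"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def as_excel_might_show(text: str, assumed_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte code page.

    This mimics a reader that ignores the UTF-8 encoding. Windows-1252 is an
    assumption; undefined bytes are replaced with U+FFFD instead of raising.
    """
    return text.encode("utf-8").decode(assumed_codepage, errors="replace")


if __name__ == "__main__":
    print(hex(ord(EN_DASH)))              # the code point Stata reports as "2013"
    print(encodable(EN_DASH, "latin-1"))  # ISO-8859-1 has no en dash
    print(encodable(EN_DASH, "cp1252"))   # Windows-1252 maps it to byte 0x96
    print(EN_DASH.encode("utf-8"))        # three bytes in UTF-8
    print(as_excel_might_show(SAMPLE))    # UTF-8 bytes read as Windows-1252
    print(as_excel_might_show(EN_DASH))
```

Under that assumption, "ó" stored as UTF-8 is displayed as "Ã³" and the en dash as "â€“", which is the kind of illegible output described above. The exact characters depend on the code page Excel actually applies.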