Can one determine the proper encoding for a text file to specify when using -import delimited-?
I've increasingly encountered CSV files from various sources that import with garbled upper-ASCII characters in variable names and labels. I learned that this happens because the files are UTF-8 encoded and I mistakenly imported them with the default latin1 encoding (Stata version 15.1). While this problem is easy enough to fix after the fact, is there a way to determine the proper encoding without external knowledge of how the file was encoded, so that I can specify it with the encoding() option? I see that newer versions of Microsoft Excel offer UTF-8 encoding of CSV files as an option, which I guess accounts for this issue becoming more frequent.
(While there have been other threads on Statalist on related topics, I didn't find one that narrowed the issue down to detecting the encoding difference before it bites.)
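To make concrete the kind of check I have in mind, here is a minimal sketch done outside Stata, in Python. It is only a heuristic and rests on assumptions of mine: that the only two candidates are UTF-8 and latin1, that a file which decodes cleanly as strict UTF-8 really is UTF-8, and that the file is small enough to read in one go. The function name `guess_encoding` and its `fallback` parameter are hypothetical, not part of any Stata or Python API.

```python
import codecs


def guess_encoding(path, fallback="latin1"):
    """Return "utf-8" if the file decodes as strict UTF-8, else the fallback.

    Heuristic only: latin1 text containing accented characters almost never
    forms valid UTF-8 byte sequences, but that is not guaranteed.
    """
    with open(path, "rb") as fh:
        raw = fh.read()

    # A UTF-8 byte-order mark at the start of the file is a strong hint.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"

    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Some byte sequence is not valid UTF-8, so assume the fallback.
        return fallback

    # Pure ASCII files also end up here; ASCII reads the same in both
    # encodings, so answering "utf-8" is harmless in that case.
    return "utf-8"
```

The returned name could then be passed along when importing, for example `import delimited using "myfile.csv", encoding("utf-8")`, where `myfile.csv` is a placeholder file name.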