Strip BOM in import delimited

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#1


28 Apr 2014, 10:34

Hello Statalisters, is there any simple option to strip the BOM from a CSV before importing the data? I can certainly do this with:

1. the direct file read/write operations before calling import delimited, or
2. trimming the label of the first variable after calling import delimited,

but I am looking for a solution involving only supplying options to import delimited.
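For readers who end up with the first workaround (direct file read/write before calling import delimited), here is a minimal sketch of that preprocessing step, written in Python rather than Stata. It is not an option of import delimited, and the function name strip_utf8_bom and the file names are hypothetical. It assumes the file is UTF-8 and removes the three BOM bytes only when they are actually present, so a file without a BOM is copied unchanged.

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper for illustration; not part of Stata.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Read exactly as many bytes as the UTF-8 BOM occupies (EF BB BF).
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            # Not a BOM: these bytes are real data, so keep them.
            dst.write(head)
        # Copy the remainder of the file byte for byte.
        shutil.copyfileobj(src, dst)
    return had_bom
```

The cleaned copy (for example, a hypothetical data_nobom.csv) can then be passed to import delimited as usual.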

Example file to reproduce the problem.

Illustration of the problem:

Thank you, Sergiy Radyakin
Dave Airey

Join Date: Apr 2014

Posts: 398
#2

28 Apr 2014, 11:10

So this only affects your first variable label? Why not just fix your first variable label when you don't have control over file creation?
Dave Airey

Join Date: Apr 2014

Posts: 398
#3

28 Apr 2014, 12:30

I read about this just now at <http://cran.r-project.org/doc/manuals/r-patched/R-data.pdf>, by searching for "BOM" in the PDF. I suppose import delimited should have a file encoding option added, if it is not there already.
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#4

28 Apr 2014, 15:36

Originally posted by Dave Airey

So this only affects your first variable label? Why not just fix your first variable label when you don't have control over file creation?

Yes, Dave, I can do this, but I would then need to explain to other people (who have no idea what a BOM is and no interest in learning about it) why the program is suddenly doing some strange manipulations. As I said, both #1 and #2 are feasible, but I am looking for exactly what you mentioned in your second message: an option of the standard command, be it 'unicode', 'utf8', or similar. And since I don't control the file creation, that option has to be smart enough to detect whether what it cuts out is actually a BOM.

Thank you, Sergiy
Dave Airey

Join Date: Apr 2014

Posts: 398
#5

28 Apr 2014, 16:26

From section 1.1.1 of the R Data Import/Export manual, it seems that file inspection may be necessary if you cannot contact the creator. I don't see any solution in import delimited. Solution #3 for your list could be to use R from inside Stata, as read.table() allows a fileEncoding= option (sec. 2.1), but again this assumes you know the encoding.
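As a rough illustration of the file inspection mentioned above, the following Python sketch looks at the first four bytes of a file and reports which Unicode BOM, if any, it starts with. The name sniff_bom is hypothetical. A missing BOM says nothing about the encoding, so a result of None only means that no mark was found.

```python
import codecs

# Longer marks are listed first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(path):
    """Return the encoding implied by a leading BOM, or None if there is none.

    (Hypothetical helper for illustration; it only inspects the BOM and
    does not otherwise guess the encoding.)
    """
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is four bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Checking the UTF-32 marks before the UTF-16 ones is a design choice with one known blind spot: a UTF-16 LE file whose first character is U+0000 would be reported as UTF-32 LE.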