  • Strip BOM in import delimited

    Hello Statalisters, is there any simple option to strip the BOM from a CSV before importing the data? I can certainly do this with:
    1. the direct file read/write operations before calling import delimited, or
    2. trimming the label of the first variable after calling import delimited,
    but I am looking for a solution involving only supplying options to import delimited.
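A minimal sketch of approach 1 (rewriting the file before calling import delimited), written in Python purely for illustration. The function name `strip_utf8_bom` and the file paths are hypothetical and not part of Stata or of the original post. It assumes the BOM in question is the three-byte UTF-8 one, and it only removes those bytes when they really are at the very start of the file.

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper for illustration only.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Read exactly as many bytes as the UTF-8 BOM is long (3 bytes).
        head = src.read(len(codecs.BOM_UTF8))
        has_bom = head == codecs.BOM_UTF8
        if not has_bom:
            # Not a BOM: these bytes are real data, so keep them.
            dst.write(head)
        # Copy the remainder of the file unchanged.
        shutil.copyfileobj(src, dst)
    return has_bom
```

The cleaned copy could then be passed to import delimited as usual. Because the return value says whether anything was removed, a wrapper could report that to the user instead of silently altering the file.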

    Example file to reproduce the problem.

    Illustration of the problem:

    Thank you, Sergiy Radyakin

  • #2
    So this only affects your first variable label? Why not just fix your first variable label when you don't have control over file creation?



    • #3
      I read about this just now at <http://cran.r-project.org/doc/manuals/r-patched/R-data.pdf>, by searching for "BOM" in the PDF. I suppose import delimited should have a file encoding option added, if it does not have one already.



      • #4
        Originally posted by Dave Airey:
        So this only affects your first variable label? Why not just fix your first variable label when you don't have control over file creation?
        Yes, Dave, I can do this, but I need to explain to other people (who have no idea what a BOM is and no interest in learning about it) why the program is suddenly doing some strange manipulations. As I said, both #1 and #2 are feasible, but I am looking for exactly what you mentioned in your second message: an option of the standard command, be it 'unicode', 'utf8', or similar. And since I don't control the file creation, that option has to be smart enough to detect whether it is actually a BOM that is being cut out.

        Thank you, Sergiy



        • #5
          From section 1.1.1 of the R Data Import/Export manual, it seems that file inspection may be necessary if you cannot contact the creator. I don't see any solution in import delimited. A third solution for your list could be to use R from inside Stata, as read.table() allows a fileEncoding= option (sec. 2.1), but again this assumes you know the encoding.
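As a rough illustration of the kind of file inspection mentioned above, here is a small Python sketch that looks at the first bytes of a file and reports which byte order mark, if any, it starts with. The function name `detect_bom` and the returned labels are assumptions made for this example. A file without a BOM tells you nothing about its encoding, so this only covers the case where a BOM is actually present.

```python
import codecs

# Longer signatures are checked first, because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path):
    """Return (encoding_label, bom_length) for the BOM at the start of path.

    Returns (None, 0) when the file does not start with a known BOM.
    (Hypothetical helper for illustration only.)
    """
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes
    for bom, label in _BOMS:
        if head.startswith(bom):
            return label, len(bom)
    return None, 0
```

The returned length is the number of leading bytes to skip if the BOM is to be removed, and the label is a hint for whatever encoding option the importing tool offers.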
