  • Garbled characters appear when importing with the infix command

    [Attached image: Garbled characters.png]

    When I use the infix command to import the attached aa.txt file, garbled characters appear, as shown in the figure. They appear only in the first line; the second and subsequent lines are normal. What causes the garbled characters, and how can I solve the problem?

    infix str2 keys 1-2 str244 contents 4-244 using aa.txt, clear // the command used
    [Attached image: Enlarged garbled character.png]

  • #2
    I do not download files from strangers, so I have not looked at your attachment. Nevertheless, I have a theory. The "garbled character" you show is how Stata often displays non-ASCII characters, so there may be a non-ASCII character in the file. In fact, the help file says that -infix- is only guaranteed to import text correctly if it is ASCII or UTF-8, which tends to support my theory. You might want to do a hex dump (-help hexdump-) of aa.txt to see what else is in there.
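
A minimal sketch of the same kind of check outside Stata, assuming Python is available. It prints the first few bytes of a file in hex and reports whether they are the UTF-8 byte-order mark (EF BB BF). The function names are hypothetical, chosen for illustration, and this is not a substitute for -hexdump-.

```python
import codecs


def first_bytes_hex(data: bytes, n: int = 8) -> str:
    """Return the first n bytes of data as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:n])


def has_utf8_bom(data: bytes) -> bool:
    """True if data begins with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def inspect_file(path: str) -> None:
    """Print the leading bytes of the file at path and whether a BOM was found."""
    with open(path, "rb") as f:  # binary mode, so no decoding takes place
        head = f.read(8)
    print(first_bytes_hex(head), "| UTF-8 BOM:", has_utf8_bom(head))
```

Calling inspect_file("aa.txt") on a file saved as "UTF-8 with BOM" would be expected to show ef bb bf as the first three bytes.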



    • #3
      As Clyde suggests, your text file has a so-called byte-order mark (BOM) as its first three bytes, and that is rendered as the strange character when importing via -infix-.

      You can open your file in, say, Windows Notepad (recent vintage) and choose Save As, selecting UTF-8 in the Encoding drop-down menu instead of UTF-8 with BOM. (I've attached your file after having done that, renaming it aa1.txt.)

      . infix str2 keys 1-2 str244 contents 4-244 using aa1.txt, clear
      (3 observations read)

      . list, noobs

        +------------------------+
        | keys          contents |
        |------------------------|
        |   %0   Journal Article |
        |                        |
        |   %0   Journal Article |
        +------------------------+

      .


      You can Google for other ways to remove the BOM at the beginning of your text file.
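
One such alternative, sketched in Python rather than Stata and offered only as an illustration: copy the file while dropping a leading UTF-8 BOM if one is present. The function names are hypothetical; the file names aa.txt and aa1.txt are the ones used in this thread.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other bytes are left untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom(src: str, dst: str) -> None:
    """Copy src to dst, removing a leading UTF-8 BOM if there is one."""
    with open(src, "rb") as f:  # read raw bytes so nothing is re-encoded
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_utf8_bom(data))
```

Running remove_bom("aa.txt", "aa1.txt") should produce a file that can be read with the same -infix- command shown above. Only a BOM at the very start of the file is removed, so the column positions in every other line are unaffected.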


      • #4
        I should have posted the contents of the attachment to avoid the security risk of downloading it. The contents of aa.txt are as follows:

        %0 Journal Article

        %0 Journal Article



        • #5
          Thank you very much, I now understand the cause and the fix.

          keys contents
          %0 Journal Article

          %0 Journal Article
