  • Delimiters not recognized by Stata when importing txt

    Dear statalist,

    I have a .txt file that I would like to import into Stata. The delimiter I use is "§". When I load the file, the data look fine in the preview screen (see image). But when I actually import the data, all values of all variables end up in one single variable, with § still between the values.


    One possible reason I could think of is that Stata doesn't recognize "§" as a valid delimiter, but then I would expect it not to be recognized in the preview screen either.

    The output I get in the results window is:
    Code:
    import delimited D:\users\gianni.spolverato\Desktop\Results_GS_delimiter.txt, delimiter("§") clear 
    (1 var, 1206758 obs)
    Could anyone help me further?

    Thank you in advance for your time.

    [Image: stata_example.png, the import preview screen]


  • #2
    I don't understand the question as your screenshot seems to show data being parsed very nicely. There's work to do on your date variable and exchange_something needs a destring, but I can't sense a problem with delimiters.



    • #3
      Thank you for your answer, and my apologies for not being clear enough. Indeed, the data look nicely parsed in the screenshot, but that is only the preview screen. When I actually confirm the import, my data look nothing like the preview: everything is pasted together in one variable and, as you can see in the example below, § is not treated as a delimiter. The delimiter is recognized in the preview screen, but not in the data actually imported. Below is a screenshot of the Browse window:
      [Image: stataex.png, the Browse window after import]



      • #4
        Here are some examples that reproduce Gianni's problem and demonstrate its origin. (Note that he said that it looked OK in the preview, which I didn't check.)

        Per the examples below, -import delimited- apparently will not accept delimiters above char(127). (I'm using v. 15.1, plain ASCII; v. 16 might be different.) I can't find anything in the documentation that describes this limitation, so perhaps it is some kind of bug. My solution would be to use -filefilter- to change the delimiter.
        Code:
        // Make and import example files with various delimiters
        local md = char(167) // "§" per Gianni's file.
        tempfile temp
        sysuse auto, clear
        export delimited using "`temp'", delimiter("`md'")
        clear
        import delimited using "`temp'", delimiter("`md'") varnames(1)
        browse // not OK
        //
        // Lower Ascii
        clear
        local md = char(127)  
        tempfile temp
        sysuse auto, clear
        export delimited using "`temp'", delimiter("`md'")
        clear
        import delimited using "`temp'", delimiter("`md'") varnames(1)
        browse  // OK
        //
        // Upper Ascii
        clear
        local md = char(128)  
        tempfile temp
        sysuse auto, clear
        export delimited using "`temp'", delimiter("`md'")
        clear
        import delimited using "`temp'", delimiter("`md'") varnames(1)
        browse // not OK
        //
        // Solution: filter the problem delimiter to a new delimiter
        local md = char(128)
        tempfile temp
        sysuse auto, clear
        export delimited using "`temp'", delimiter("`md'")
        local newmd = ","
        tempfile temp2
        filefilter "`temp'" "`temp2'", from("`md'") to("`newmd'")
        clear
        import delimited using "`temp2'", delimiter("`newmd'") varnames(1)
        browse // OK
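Mike's results (char(127) works, char(128) and char(167) do not) fit the point Hua Peng makes in the next reply: the delimiter is read as UTF-8 text. Bytes 0 to 127 are identical in ASCII, Latin-1 and UTF-8, and each one is a complete UTF-8 character by itself, whereas no single byte from 128 to 255 is. The following Python sketch illustrates only that byte-level fact; it is an analogy, not Stata's parser, and the helper name is hypothetical.

```python
# Sketch: which single bytes form a complete UTF-8 character on their own.
# Assumption (per Hua Peng's reply): the delimiter is interpreted as UTF-8 text.
# This is plain Python, not a reproduction of -import delimited-.

def is_complete_utf8_char(byte_value: int) -> bool:
    """Hypothetical helper: True if this one byte decodes as UTF-8 by itself."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 44 is ",", 127 is DEL, 128 and 167 are the bytes from Mike's examples.
for code in (44, 127, 128, 167):
    print(code, is_complete_utf8_char(code))
```

Every byte below 128 passes this check and every byte from 128 up fails it, which matches the boundary Mike observed.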



        • #5
          The delimiter must be in UTF-8 encoding, even if the source file is not. The following should work for Mike Lacy's example:

          Code:
          local md = char(167) // "§" per Gianni's file.
          tempfile temp
          sysuse auto, clear
          export delimited using "`temp'", delimiter("`md'")
          clear
          
          // note "§" is the same character in UTF-8 encoding as char(167) in Latin-1 encoding
            
          import delimited using "`temp'", delimiter("§") varnames(1) encoding("latin1")
          browse
          To obtain the character in UTF-8 encoding from Latin-1 encoding:

          Code:
           di ustrfrom(char(167), "latin1", 1)
          Then you can copy and paste the displayed character. If you want the byte sequence of the UTF-8 encoding:

          Code:
          di tobytes(ustrfrom(char(167), "latin1", 1))
          di char(194)+char(167)
          Last edited by Hua Peng (StataCorp); 20 Nov 2019, 13:35.
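To see why the delimiter and the file's encoding have to agree, here is a minimal Python sketch (not Stata). It prints the byte sequences of "§" in Latin-1 and in UTF-8, matching the char(167) versus char(194)+char(167) values above, and then splits a hypothetical Latin-1 record two ways. Decoding before splitting is meant to mirror, loosely, what encoding("latin1") does for -import delimited-; that correspondence is an assumption, not a description of Stata internals.

```python
# Sketch: "§" (U+00A7) has different byte sequences in Latin-1 and UTF-8.
# The sample record below is hypothetical.

SECTION = "\u00a7"  # "§", SECTION SIGN

latin1_bytes = SECTION.encode("latin-1")
utf8_bytes = SECTION.encode("utf-8")
print(list(latin1_bytes))  # [167]
print(list(utf8_bytes))    # [194, 167]

# A hypothetical record as stored in a Latin-1 encoded text file.
raw_line = "make\u00a7price\u00a7mpg".encode("latin-1")

# Splitting the raw Latin-1 bytes on the UTF-8 delimiter finds no match,
# so the whole record stays in one field (like the single-variable import).
naive_fields = raw_line.split(utf8_bytes)

# Decoding with the file's actual encoding first, then splitting on "§", works.
decoded_fields = raw_line.decode("latin-1").split(SECTION)

print(len(naive_fields), naive_fields)
print(len(decoded_fields), decoded_fields)
```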



          • #6
            I took a csv file, replaced the comma delimiter with §, and imported it into Stata (version 15.1) without any problems.
            Code:
            import delimited aaa.txt, delimiter("§")
            I checked with browse, describe and sum to be sure that I had exactly the same data after importing the text file.
            Edit after reading Hua Peng's post: I used Notepad to replace the comma with the section sign (§).
            Last edited by Eric de Souza; 20 Nov 2019, 13:46.



            • #7
              Thank you all. I changed the delimiter to ";" using the suggested code, and now it works fine (in Stata 13).
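The fix that worked here (rewriting "§" to ";" before importing, as -filefilter- does in Mike Lacy's example) can be sketched in Python as a byte-level replace followed by ordinary delimited parsing. This is a hypothetical illustration that assumes a Latin-1 file in which ";" never occurs inside a value. For a UTF-8 file you would replace the two-byte sequence b"\xc2\xa7" instead, because the lone byte 0xA7 also occurs inside other UTF-8 characters (for example, "ç" is 0xC3 0xA7).

```python
# Sketch: swap a non-ASCII delimiter for ";" at the byte level, then parse.
# Assumptions: Latin-1 input, and ";" does not appear inside any value.
import csv
import io

def swap_delimiter(raw: bytes, old: bytes, new: bytes) -> bytes:
    """Hypothetical helper: byte-level find-and-replace, similar in spirit to -filefilter-."""
    return raw.replace(old, new)

# Hypothetical Latin-1 file contents with "§" (byte 167) as the delimiter.
raw = "make\u00a7price\nAMC Concord\u00a74099\n".encode("latin-1")

cleaned = swap_delimiter(raw, b"\xa7", b";")
rows = list(csv.reader(io.StringIO(cleaned.decode("latin-1")), delimiter=";"))
print(rows)
```

Choosing an ASCII replacement such as ";" or "," sidesteps the encoding question entirely, since those bytes mean the same thing in every encoding involved.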
