delimiters not recognized by stata when importing txt

Gianni Spolver

Join Date: May 2019

Posts: 25
#1

20 Nov 2019, 11:18

Dear statalist,

I have a txt file which I would like to import into Stata. The delimiter I use is "§". When I select the file, the data look fine in the preview screen (see image). But when I actually import the data, all the values for all the variables in the txt file are placed under one single variable, with § remaining between each value.

One potential reason I could think of is that Stata doesn't recognize "§" as a valid delimiter, but intuitively it should then not be recognized in the preview screen either.

The output I get in the results window is:

Code:

import delimited D:\users\gianni.spolverato\Desktop\Results_GS_delimiter.txt, delimiter("§") clear
(1 var, 1206758 obs)

Could anyone help me further?

Thank you in advance for your time.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#2

20 Nov 2019, 11:31

I don't understand the question as your screenshot seems to show data being parsed very nicely. There's work to do on your date variable and exchange_something needs a destring, but I can't sense a problem with delimiters.
Gianni Spolver

Join Date: May 2019

Posts: 25
#3

20 Nov 2019, 12:57

Thank you for your answer and my apologies for not being clear enough. Indeed, the data looks nicely parsed on the screenshot. However, this is only the preview screen. When I actually confirm the import, my data looks nothing like the preview screen. Everything is pasted together under one variable and as you can see in the example below, the program does not recognize the § as a delimiter. It does recognize the delimiter in the preview screen, but not in the actual data after being imported. Below I provide a screenshot of the browse window:

Mike Lacy

Join Date: Apr 2014
Posts: 2396

20 Nov 2019, 13:04

Here are some examples that reproduce Gianni's problem and demonstrate its origin. (Note that he said that it looked OK in the preview, which I didn't check.)

Per the examples below, apparently -import delimited- will not accept delimiters above char(127). (I'm using v. 15.1, plain ASCII; v. 16 might be different.) I can't find anything in the documentation that describes this limitation, so perhaps it's some kind of bug. My solution would be to use -filefilter- to change the delimiter.

Code:

// Make and import example files with various delimiters
local md = char(167) // "§" per Gianni's file.
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`md'")
clear
import delimited using "`temp'", delimiter("`md'") varnames(1)
browse // not OK
//
// Lower Ascii
clear
local md = char(127)  
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`md'")
clear
import delimited using "`temp'", delimiter("`md'") varnames(1)
browse  // OK
//
// Upper Ascii
clear
local md = char(128)  
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`md'")
clear
import delimited using "`temp'", delimiter("`md'") varnames(1)
browse // not OK
//
// Solution: Filter problem delimiter to new delimiter
local md = char(128)
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`md'")
local newmd = ","
tempfile temp2
filefilter "`temp'" "`temp2'", from("`md'") to("`newmd'")
clear
import delimited using "`temp2'", delimiter("`newmd'") varnames(1)
browse // OK


Hua Peng (StataCorp)

StataCorp Employee

Join Date: Jun 2014

Posts: 333
#5

20 Nov 2019, 13:30

The delimiter must be in UTF-8 encoding even if the source file is not. The following should work for Mike Lacy's example:

Code:

local md = char(167) // "§" per Gianni's file.
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`md'")
clear
// note "§" is the same character in UTF-8 encoding as char(167) in Latin-1 encoding
import delimited using "`temp'", delimiter("§") varnames(1) encoding("latin1")
browse

To obtain the character in UTF-8 encoding from Latin-1 encoding:

Code:

di ustrfrom(char(167), "latin1", 1)

then you may copy/paste the displayed character. If you want the byte sequence of the UTF-8 encoding:

Code:

di tobytes(ustrfrom(char(167), "latin1", 1))
di char(194)+char(167)
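
If copying and pasting the character is inconvenient, the conversion can presumably be scripted by storing the UTF-8 form in a local macro. This is a minimal sketch, assuming Stata 14 or later and a Latin-1 source file. It reuses the auto-data example from this thread, and the macro names `latin` and `utf8` are made up for illustration.

```stata
// Sketch only: assumes Stata 14+ (Unicode-aware) and a Latin-1 encoded file.
// The macro names latin and utf8 are illustrative, not from the thread.
local latin = char(167)                        // single byte 167, "§" in Latin-1
local utf8  = ustrfrom(char(167), "latin1", 1) // the same character as a UTF-8 string

// Write a test file that uses the Latin-1 byte as the delimiter
tempfile temp
sysuse auto, clear
export delimited using "`temp'", delimiter("`latin'")

// Read it back: the delimiter is given in UTF-8, the file is declared as Latin-1
import delimited using "`temp'", delimiter("`utf8'") varnames(1) encoding("latin1") clear
describe
```

If the import worked, -describe- should list the separate auto variables rather than a single string variable.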

Last edited by Hua Peng (StataCorp); 20 Nov 2019, 13:35.
Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

20 Nov 2019, 13:36

I took a csv file, replaced the comma delimiter with §, and imported it into Stata (version 15.1) without any problems.
import delimited aaa.txt, delimiter("§")
I checked with browse, describe and sum to be sure that I had exactly the same data after importing the text file.
Edit, after reading Hua Peng's post: I used Notepad to replace the comma with the paragraph symbol (§).

Last edited by Eric de Souza; 20 Nov 2019, 13:46.
Gianni Spolver

Join Date: May 2019

Posts: 25
#7

27 Nov 2019, 10:51

Thank you all. I've changed the delimiter to ";" using the suggested code and now it works fine (in Stata 13).
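
For readers who land here with the same problem, the -filefilter- route on a file on disk might look roughly like the sketch below. The file names are hypothetical, and it assumes that "§" is stored as the single byte 167 (Latin-1 or Windows-1252), that the first row holds variable names, and that ";" does not already occur in the data.

```stata
// Sketch only: file names are hypothetical placeholders.
// Assumes "§" is stored as the single byte 167 (Latin-1 / Windows-1252).
// \167d is filefilter's three-digit decimal notation for that byte.
filefilter "Results_raw.txt" "Results_semicolon.txt", from("\167d") to(";") replace

// If the file were UTF-8 instead, "§" would be the two bytes 194 167,
// so the pattern would presumably need to be from("\194d\167d").

// Import the filtered copy with an ordinary ASCII delimiter
import delimited using "Results_semicolon.txt", delimiter(";") varnames(1) clear
describe
```

Filtering to a plain ASCII delimiter avoids the encoding question at the import step, which is probably why it also works in versions of Stata that predate Unicode support.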
