importing of data using a text line as marker of column names row

Marie Baekvad-Hansen

Join Date: Jan 2020

Posts: 2
#1

importing of data using a text line as marker of column names row

26 Feb 2020, 14:36

Dear Statlist
I have output data from two different instruments, that I need to import to stata. They are in the same format, but one instrument output has 40 rows of text prior to the actual table data and the other has 42 rows of text prior to the table data. Both outputs have the line " [Results] " before the column names.

* Signal Smoothing On = true
* Stage/ Cycle where Analysis is performed = Stage 2, Step 2
* User Name = CBV/KATL
[Results]
Well Well Position
1 A1 false
2 A2 false

Can I import the files specifying that the data read should begin after the line [Results] ? and that the line after [Results] is the column names

Thanks
Marie
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

26 Feb 2020, 14:46

Welcome to Statalist.

You will want to use the import delimited command and use the rowrange option to specify that your data start on row 42 and the varnames option to specify that variable names being on row 41. Here is an example using your data from post #1.

Code:

. type ~/Downloads/sample.txt
* Signal Smoothing On = true
* Stage/ Cycle where Analysis is performed = Stage 2, Step 2
* User Name = CBV/KATL
[Results]
Well_a Well_b Position
1 A1 false
2 A2 false

. import delimited ~/Downloads/sample.txt, rowrange(6) varnames(5) delimiters(" ")
(3 vars, 2 obs)

. list

     +----------------------------+
     | well_a   well_b   position |
     |----------------------------|
  1. |      1       A1      false |
  2. |      2       A2      false |
     +----------------------------+

The output of help import delimited will give you more information about the import delimited command.

Comment

Marie Baekvad-Hansen

Join Date: Jan 2020

Posts: 2
#3

27 Feb 2020, 00:07

Thank you for the answer.

My issue is, that I run a script, that imports and append all files of the given type.

!dir "*Sample*.txt" /a-d/b >filelist.txt

drop _all
file open lister using filelist.txt, read
file read lister line
import delimited "`line'", varnames(1)
gen plate="`line'"
save sample_data.dta, replace

drop _all

file read lister line
while r(eof)==0 {
import delimited "`line'", varnames(1)
gen plate="`line'"
append using sample_data.dta, force
save sample_data.dta, replace
drop _all
file read lister line
}

file close lister

As the rowrange and varname line differ between the outputfiles and will be 42 for half the files and 40 for the other half I am not able to use the rowrange(line number)/ varnames(linenumber) command. Is there a way to define varnames as row below the [Results] line ?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#4

27 Feb 2020, 04:33

As the rowrange and varname line differ between the outputfiles and will be 42 for half the files and 40 for the other half I am not able to use the rowrange(line number)/ varnames(linenumber) command. Is there a way to define varnames as row below the [Results] line ?

You need to be able to somehow read the data before the import. The following uses infix and is not tested.

Code:

infix str data 1-200000 using filelist.txt gen line= regexm(data,"[Results]") local line= cond(line==1 & _n==40, 41, 43) clear import delimited filelist.txt, rowrange(`=`line'+1') varnames(`=`line'') delimiters(" ")
1 like
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 784
#5

27 Feb 2020, 09:15

Is there a way to define varnames as row below the [Results] line?

the following will strip off the first lines, save the remaining lines to a temporary file to be imported

Code:

local fn = "sample.txt" tempfile s qui di filewrite("`s'", ustrregexrf(fileread("`fn'"), "(?s).*\[Results\]", "" )) import delimited "`s'" , delim(" ") varnames(1)
1 like
Comment

Announcement

importing of data using a text line as marker of column names row

Comment

Comment

Comment

Comment