The following code (Stata 13.1) counts occurrences of each unique word, line by line, in the file 'bible.txt'. However, it runs too slowly because it searches again for words it has already counted; these should be skipped to speed up the search. Ideally, every word searched for would be listed in the file 'bibleout.txt', with each unique word appearing only once. The search should also skip common words such as 'is', 'was', 'the', 'there', etc., which are listed in the file 'word.txt'. Any ideas on how to make the code more efficient and compact, and how to skip words previously encountered? Example data are given below the program.
capture file close myfile
cd "c:\Users\username\location1"
file open myfile using "bible.txt", read
file read myfile line
local r = r(eof)
while `r' == 0 {
    local x : word count `line'
    disp _n (2) "`line'"
    local line : list uniq line
    qui import delimited bibleout.txt, clear
    foreach w of local line {
        cap assert strpos(v1, trim(itrim(`w'))) == 0 & ///
            strpos(v1, trim(itrim("`w'"))) == 0
        if _rc {
            local line : list line - w
            continue
        }
    }
    local x : word count `line'
    tokenize `"`line'"'
    local s1 (:|,|\.|\;|[0-9])$
    qui import delimited word.txt, clear
    forval word = 3/`x' {
        cap assert strpos(v1, trim(itrim("``word''"))) == 0
        if !_rc & !regexm("``word''","`s1'") {
            capture file close myfile2
            capture file close myline
            file open myfile2 using "bible.txt", read
            file open myline using "bibleout.txt", write append
            file read myfile2 line
            scalar k = 0
            while r(eof)==0 {
                local i length("`line'")
                local p length(subinstr("`line'", "``word''", "", .))
                local n length("``word''")
                scalar j = (`i' - `p')/(`n')
                scalar k = k + j
                local k = k
                file read myfile2 line
            }
            disp "The word ``word'' appears `k' times!"
            file write myline "``word''" _n
            continue
        }
    }
    local --r
    file read myfile line
    local r = r(eof)
    continue
}
Data Files are given below:
1. File bible.txt is the source file, shortened here for space
Genesis 1:1 In the beginning God created the heaven and the earth.
Genesis 1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
Genesis 1:3 And God said, Let there be light: and there was light.
Genesis 1:4 And God saw the light, that it was good: and God divided the light from the darkness.
Genesis 1:5 And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.
2. File bibleout.txt is a list of the words already searched for, so this file grows as more searches are run (each word should appear only once)
face
Spirit
divided
darkness
evening
first
firmament
divide
evening
face
3. File word.txt is a list of common words to be skipped in the search
is
when
in
the
there
were
he
from
which
under
above
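For comparison, one possible direction is to load the text as a dataset and let Stata count all words in a single pass, instead of reopening 'bible.txt' for every word. The following is only a minimal sketch, not a tested solution. It assumes one verse per line, no tab characters or double quotes in the files, lines short enough to be stored as ordinary str variables, and that the first two tokens of each line are the book name and verse reference. The variable names (text, lineno, w, pos, n), the punctuation list, and the choice to lower-case all words are assumptions, not part of the original program. It also counts whole words, whereas the program above counts substring matches.

```stata
* Sketch only: file names come from the question; all variable names
* and the punctuation list below are illustrative assumptions.

* 1. Load the stop words once and keep them in a temporary dataset
import delimited using "word.txt", delimiter("\t") varnames(nonames) ///
    stringcols(_all) clear
rename v1 w
replace w = lower(trim(w))
duplicates drop
tempfile stop
save `stop'

* 2. Read the text with one line per observation
import delimited using "bible.txt", delimiter("\t") varnames(nonames) ///
    stringcols(_all) clear
rename v1 text
gen long lineno = _n

* 3. One variable per word, then one observation per word
split text, generate(w)
drop text w1 w2                 // w1 and w2: book name and verse reference
reshape long w, i(lineno) j(pos)

* 4. Strip punctuation so that "light:" and "light" are the same word
local punct ".,;:?!()"
local np = length("`punct'")
forvalues i = 1/`np' {
    replace w = subinstr(w, substr("`punct'", `i', 1), "", .)
}
replace w = lower(w)            // assumption: matching ignores case
drop if w == "" | regexm(w, "[0-9]")

* 5. Remove the stop words: keep only words with no match in word.txt
merge m:1 w using `stop', keep(master) nogenerate

* 6. One row per unique word, with its frequency in n
contract w, freq(n)
gsort -n w

* 7. Write each unique word once (this overwrites bibleout.txt)
export delimited w using "bibleout.txt", novarnames replace
```

Because contract returns one row per distinct word, no word is searched for twice and 'bibleout.txt' cannot contain repeats. If counts per line are needed rather than totals for the whole file, `contract lineno w, freq(n)` would keep the line number as a second key.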