Hi all,
I have some extremely large text files (up to 4+ GB) that I want to work with, and I have been using the user-written chunky command. It works perfectly for all files except those larger than roughly 2.1 GB. I have "analyzed" the files with both chunky (. chunky using *.txt, analyze) and hexdump (. hexdump *.txt, analyze results) and found nothing out of the ordinary.
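For reference, those checks looked roughly like this on each of the large files (the file name below is just a placeholder for one of the real files):
. chunky using "B:/Folder A/bigfile.txt", analyze
. hexdump "B:/Folder A/bigfile.txt", analyze results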
So, here is an example of an attempt to chunk a 3.7 GB file.
***start report***
Chunk fl0001.txt saved. Now at position 100,004,695
Chunk fl0002.txt saved. Now at position 200,006,031
Chunk fl0003.txt saved. Now at position 300,006,909
Chunk fl0004.txt saved. Now at position 400,007,952
Chunk fl0005.txt saved. Now at position 500,008,476
Chunk fl0006.txt saved. Now at position 600,008,731
Chunk fl0007.txt saved. Now at position 700,010,016
Chunk fl0008.txt saved. Now at position 800,010,463
Chunk fl0009.txt saved. Now at position 900,010,973
Chunk fl0010.txt saved. Now at position 1,000,012,845
Chunk fl0011.txt saved. Now at position 1,100,014,111
Chunk fl0012.txt saved. Now at position 1,200,015,470
Chunk fl0013.txt saved. Now at position 1,300,016,746
Chunk fl0014.txt saved. Now at position 1,400,018,081
Chunk fl0015.txt saved. Now at position 1,500,019,501
Chunk fl0016.txt saved. Now at position 1,600,020,438
Chunk fl0017.txt saved. Now at position 1,700,022,347
Chunk fl0018.txt saved. Now at position 1,800,023,511
Chunk fl0019.txt saved. Now at position 1,900,024,496
Chunk fl0020.txt saved. Now at position 2,000,026,303
Chunk fl0021.txt saved. Now at position 2,100,027,901
ftell(): 2094938649 Stata returned error
chunkfile(): - function returned error
<istmt>: - function returned error [1]
r(2094938649);
end of do-file
r(2094938649);
***end report***
I am running Stata MP (Dual Core) 12.1 (ahem, soon upgrading to 13) on Windows 7, 64-bit.
I suspect this has something to do with memory (which is exactly why I was using chunky in the first place), but I cannot discern the problem or a workaround. I could not find any documentation of a Stata error ftell(): 2094938649. FYI, my memory settings are as follows (a small Mata test of ftell(), which might help isolate where things break, is sketched just after them):
. q memory
-------------------------------------------------------------------------------
Memory settings
    set maxvar          32000       2048-32767; max. vars allowed
    set matsize         10000       10-11000; max. # vars in models
    set niceness            5       0-10
    set min_memory          0       0-3200g
    set max_memory          .       64m-3200g or .
    set segmentsize       64m       1m-32g
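In case it helps narrow things down, here is a minimal Mata sketch that simply seeks through one of the big files in 100 MB jumps and prints ftell() after each jump, to see whether plain ftell() also misbehaves once the position passes the 2 GB mark. The file name is only a placeholder for one of the actual files.
****start test sketch****
mata:
// open one of the large files read-only (placeholder file name)
fh = fopen("B:/Folder A/bigfile.txt", "r")
// jump forward 100,000,000 bytes at a time; 25 jumps crosses 2,147,483,648 bytes (2^31)
for (i = 1; i <= 25; i++) {
    fseek(fh, 100000000, 0)   // seek relative to the current position
    printf("after jump %3.0f, ftell() reports %16.0f\n", i, ftell(fh))
}
fclose(fh)
end
****end test sketch****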
Finally, as an FYI, here is an approximate version of the code:
****start ado****
cd "B:/Folder A"
local files : dir . files "*.txt"
foreach f of local files {
    cd "B:/FolderB"                                  // so that the chunk files are saved here
    local newf = subinstr("`f'", ".txt", "", .)      // extract the prefix for the chunk stub
    chunky using "B:/Folder A/`f'", chunksize(100 mb) header(include) stub(`newf') replace
    cd "B:/Folder A"                                 // back to the folder with the files to be chunked
}
****end ado****
Any ideas on how to solve this?
(FYI, I have also emailed chunky's author, David Elliott, directly.)
Thanks, as always, in advance,
Ben