Hello everyone!
I am trying to import close to 3 million small (20-30 KB) .csv files into Stata.
I am using the "dir" extended macro function as below:
Code:
local files : dir "C:/Users/Emil/Documents/master/rawdata/output" files "*.csv", nofail

foreach file in `files' {
    import delimited `file', delimiter(";") bindquote(strict) varnames(1) maxquotedrows(10000) stringcols(_all) clear
    *** other commands ***
}
This allows me to import around 5,500 files before it "breaks"/ends because of the restriction of 65,536 unique string values.
- If I do not use the "nofail" option, it yields the following error: "too many filenames r(134);"
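For reference, this is the kind of workaround I am wondering about (an untested sketch; it assumes the community-contributed -filelist- command from SSC, which stores the directory listing as a dataset with dirname and filename variables, so the 3 million file names never have to sit inside one local macro - "csv_filelist.dta" is just a placeholder name):
Code:
* sketch only: run -ssc install filelist- first if it is not already installed
filelist, dir("C:/Users/Emil/Documents/master/rawdata/output") pattern("*.csv")
save "csv_filelist.dta", replace                // one observation per .csv file

local nfiles = _N
forvalues i = 1/`nfiles' {
    use "csv_filelist.dta" in `i', clear        // load only the i-th file name
    local f = dirname + "/" + filename          // -filelist- stores dirname and filename
    import delimited "`f'", delimiter(";") bindquote(strict) ///
        varnames(1) maxquotedrows(10000) stringcols(_all) clear
    *** other commands ***
}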
The files I am trying to import have two-part names like "29392072_c18e309843354931953bb5639545b487", where
- "29392072" is a semi-unique identification number (there may be up to 8 files per identification number)
- "c18e309843354931953bb5639545b487" is a random number
Any suggestions on how to solve this problem, or alternative approaches, are very welcome - thanks!
EDIT: Each file is a financial statement for a Danish company containing both string and numeric values. I first need to import the statements, extract the relevant information, transform them to panel data, and finally append everything.
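For the panel step, the rough idea inside the loop is to keep the identification number from the first part of the file name (sketch only; company_id is a placeholder variable name, and `file' is the bare file name as in the loop above):
Code:
* inside the import loop: keep the part of the file name before "_" as the id
local id = substr("`file'", 1, strpos("`file'", "_") - 1)
gen long company_id = `id'
*** extract the relevant statement items, then save/append ***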
Best regards
Emil