  • Import all files from a folder

    A folder contains a lot of csv files with similar rows and columns. Without having to list all their names, how can I import them all into a single dta file? Many thanks.

  • #2
    You can use the macro function dir to get a list of all of the .csv files in a directory that you specify. Then loop over that list: import delimited each file, append it to a growing temporary dataset (tempfile), and, once the loop finishes, save the result as a single permanent Stata dataset.
    Code:
    help macro##macro_fcn
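    But "similar rows and columns" might not cut it if you're planning to append them: the files need matching variable names, and a variable that is string in one file and numeric in another will stop append unless you specify the force option.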

    There is also a user-written command on SSC that appends Excel worksheets from workbook files, and it might be able to do the same with delimited text files. I can't remember its name off the top of my head.
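
    A minimal sketch of the dir-plus-loop approach described above, not a tested solution. The folder name mydata, the variable source_file and the output name combined.dta are placeholders invented for illustration, and the sketch assumes the folder holds at least one .csv file. Reading every column as a string with stringcols(_all) is one possible way around type mismatches between files; destring then converts the columns that turn out to be numeric.

    ```stata
    * Sketch only: "mydata" is a hypothetical folder name; adjust to your setup
    local folder "mydata"
    local files : dir "`folder'" files "*.csv"

    * Start from an empty temporary dataset that will collect the imports
    clear
    tempfile building
    save `building', emptyok

    foreach f of local files {
        * Read every column as a string so that type conflicts between
        * files cannot stop the append
        import delimited using "`folder'/`f'", clear stringcols(_all)
        * Record which file each row came from (hypothetical variable name)
        generate source_file = "`f'"
        append using `building'
        save `building', replace
    }

    * Convert columns that contain only numbers back to numeric
    destring, replace
    save "combined.dta", replace
    ```

    Keeping the source file name makes it easier to trace a problem row back to the file it came from.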



    • #3
      I think you mean -xls2dta-, which imports Excel worksheets but not csv files.
      Your suggestion to loop over filenames contained in a macro is a good one. I struggled to get the filenames into a macro until I found this:
      https://stackoverflow.com/questions/...-at-once-stata



      • #4
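        You can try something like this, where $location identifies the path of the csv data folder.
        cd "$location"
        local f: dir "$location" files "*.csv"
        * start from an empty temporary dataset that collects the imported files
        clear
        tempfile combined
        save `combined', emptyok
        foreach file of local f {
            * import delimited reads the csv; append using only accepts .dta files
            import delimited "`file'", clear
            append using `combined'
            save `combined', replace
        }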



        • #5
          Thanks, Manish Srivastava. An elaborated version of your solution appears here: https://stackoverflow.com/questions/...-at-once-stata



          • #6
            New problem: the loop slows down as the file being appended-to gets big. What to do about that? Appending everything to one big file doesn't scale well. Maybe append the smaller files to a number of medium sized files, and then append the medium sized files at the end. Right?
            Last edited by paulvonhippel; 24 Jan 2021, 10:05.
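
            The medium-sized-files idea could be sketched roughly as follows. This is an untested illustration: it assumes the local macro files already holds the csv file names (for example from the dir macro function) and that the working directory is the csv folder. The batch size of 50, the macro names and the output name combined.dta are arbitrary choices made for the example.

            ```stata
            * Sketch only: assumes the local macro files holds the csv file names
            local batchsize 50
            local i = 0
            local b = 0
            local batchlist

            foreach f of local files {
                if mod(`i', `batchsize') == 0 {
                    * Open a new, empty medium-sized file every batchsize files
                    local ++b
                    tempfile batch`b'
                    clear
                    save `batch`b'', emptyok
                    * Compound quotes keep each path quoted inside the list
                    local batchlist `"`batchlist' "`batch`b''""'
                }
                import delimited using "`f'", clear
                append using `batch`b''
                save `batch`b'', replace
                local ++i
            }

            * Combine the medium-sized files in a single append
            clear
            append using `batchlist'
            save "combined.dta", replace
            ```

            Each small file is then appended to a dataset of at most batchsize files, so the cost of re-saving the growing file stays bounded within a batch.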



            • #7
              Your idea with medium-sized files might solve the speed problem, yes. However, my first thought is that the slowness you describe may reflect Stata running into memory limitations and going into virtual memory. On that hypothesis, I would instead try importing each file and saving it to a temporary file, and then appending all those temporary files. Something like this (untested) is what I'm thinking of:
              Code:
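              local appendlist = ""
              local i = 0
              foreach f of local YourCSVList {
                 import delimited "`f'", clear   // add further import options as needed
                 local ++i
                 tempfile temp`i'
                 save `temp`i''
                 local appendlist `"`appendlist' "`temp`i''""'
              }
              * empty memory first; otherwise the last file imported would be appended to itself
              clear
              append using `appendlist'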
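
              To check whether the slowdown really grows with the size of the combined file, one option is to time each pass of the original append-as-you-go loop and then look at memory use. The following is a sketch under assumptions: the local macro files is taken to hold the csv file names, and timer number 1 is an arbitrary choice.

              ```stata
              * Sketch only: assumes the local macro files holds the csv file names
              clear
              tempfile building
              save `building', emptyok

              foreach f of local files {
                  timer clear 1
                  timer on 1
                  import delimited using "`f'", clear
                  append using `building'
                  save `building', replace
                  timer off 1
                  * timer list leaves the elapsed seconds in r(t1)
                  quietly timer list 1
                  display "`f': " %9.2f r(t1) " seconds, " _N " observations so far"
              }

              * Report how much memory Stata is using for the data now in memory
              memory
              ```

              If the seconds per file climb steadily as the observation count grows, that points to the repeated append and save of the large file rather than to any single import.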
