Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Looping through folders first, then looping through files

    Hi

    I am using Stata 15.0. I am creating code that will automatically clean and create variables from four different types of Excel sheets that will be saved in four different folders (Folder1- Folder4). I do not know how many Excel sheets will be saved into the different folders at this stage. I know how to loop through the files in each folder, import the files and save them as Stata files, but when I try and create the outer loop (foreach folder in Folder1 Folder2 Folder3 Folder {) I run into problems as Stata shows the error below, and it doesn't seem to be reading `folder' correctly.

    Error:
    directory C:\Users\Input\Folders\`folder'\ not found

    My code:
    Code:
    global input "C:\Users\Input\Folders"
    *Loop across the four folders: Folder1 Folder2 Folder3 Folder 4
    foreach folder in Folder1 Folder2 Folder3 Folder 4{
        * Set so can loop over all files in that folder
        local files : dir "$input\`folder'\" files "*.xlsx"
        cd "$input\`folder'\"
        * Import from Excel and save as a datafile
        foreach file in `files' {
            * Import
            import excel using "`file'", clear
            save "$temp\`file'.dta", replace
        }
    }
    If I change the 5th line of code to be what I've written below, then I receive the following error suggesting that my `folder' is being read correctly but the apostrophes remain.
    Error:
    directory C:\Users\Input\Folders\`Folder1'\ not found

    Code:
        local files : dir "$input\``folder''\" files "*.xlsx"
    Where am I going wrong?

    Thanks!
    Megan

  • #2
    When the backslash (\) character appears immediately before the left-quote character (`), the latter is interpreted as a literal left-quote character, not as the beginning of a local macro reference. The solution is to use the forward slash (/) as the path separator. Even if you are running Windows, this will still work properly because Stata will recognize that you are specifying a path name and will, before passing the commands to Windows, replace the / with the \ that Windows expects.

    That said, you can simplify this code by using a couple of user-written commands. Check out -filelist-, by Robert Picard, and available from SSC. It will handle the recursion through the folders for you and build a complete list of the files in all of the folders. Then you can use that with -runby-, by Robert Picard and me, to loop over the files.

    So your code would look something like this:

    Code:
    clear
    filelist, pattern(*.xlsx) directory("$input") replace
    
    capture program drop one_file
    program define one_file
        local dirname = dirname[1]
        local filename = filename[1]
        import excel using `"`dirname'/`filename'"', clear
        local filename: subinstr local filename ".xlsx" ".dta"
        save `"$temp/`filename'"', replace
        exit
    end
    
    runby one_file, by(dirname filename) status
    Note: The above code is not tested, so beware of unbalanced quotes, brackets, etc. To debug, add the -verbose- option to the -runby- command so that you will see any error messages generated by program one_file.



    Comment


    • #3
      Speaking of community-contributed commands, using xls2dta and filelist (both SSC), it seems the complete thing could be as short as

      Code:
      xls2dta , save("$temp") recursive : import excel "C:/Users/Input/Folders"
      Best
      Daniel

      Comment


      • #4
        Yes, apologies to Daniel Klein for not mentioning his approach. I suppose I'm partial to -runby- because I am one of its authors. Also, in my practice, it is rare that I have the luxury of directly importing data from Excel to Stata without some substantial transformations or cleaning along the way. So I have not been an -xls2dta- user myself. But when the task really is to just mass import some Excel files, without further modifications, into Stata, -xls2dta- is the right tool for the job.

        Comment


        • #5
          Conversely, if more is to be done to the excel files, the runby approach in #2 is more flexible.

          Best
          Daniel

          Comment


          • #6
            Hi

            Thank you SO much! This was super informative and went beyond what I had even asked. Thanks to both of you.

            All the best
            Megan

            Comment

            Working...
            X