Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Forcing loop through error when importing folder of excel files

    I have a folder with thousands of excel files containing municipal budget data. I am attempting to loop through each file, import specific cells of data, and merge them all into one single file. The code I have does this, unless there is an error in any of the excel files (e.g. missing the sheet of data I am importing). I would like to skip over the excel file if there is any error and continue with the import. I've tried using capture noisily, but it does not work. Any help is appreciated!

    cd "C:\Data\Budgets"
    clear
    local allfiles: dir . files "*.xls"

    foreach file of local allfiles {
    preserve
    import excel using `file', sheet("Budget Data") cellrange(A1:R9) clear

    save temp, replace
    restore
    append using temp, force
    }

    -Mike

  • #2
    Please help us help you. Show the code you ran using capture noisily that "does not work". Show us what Stata told you when you ran it. Tell us what precisely is wrong. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

    Comment


    • #3
      Additionally, it's hard to tell from the question what the problem is. Why are you capturing the errors? Doesn't that tell you something's off? You don't wanna to import 1000s of spreadsheets and append them based on errors- you'll only cause yourself more suffering down the line, even if it does "work".

      Additionally, you didn't declare a tempfile. So, you'll just keep overwriting the new file.

      Comment


      • #4
        Here is the full code, including the capture noisily command, with the errors:

        cd "C:\Data\Budget"
        clear
        local allfiles: dir . files "*.xls"

        foreach file of local allfiles {
        preserve
        capture noisily import excel using `file', sheet("UFB-7 Personnel Costs") cellrange(C10:S10) clear
        if _rc == 0 {
        save temp, replace
        restore
        append using temp, force
        }
        }

        It successfully imports dozens of files before I get the following error:

        Error:
        file temp.dta saved
        worksheet UFB-7 Personnel Costs not found
        already preserved
        r(621);

        This is a budget that is missing the worksheet I want imported. I'd like to skip over this file and continue with the next file in the directory.

        Comment


        • #5
          Code:
          cd "C:\Data\Budget"
          clear
          local allfiles: dir . files "*.xls"
          
          foreach file of local allfiles {
              preserve
              capture noisily import excel using `file', sheet("UFB-7 Personnel Costs") cellrange(C10:S10) clear
              if _rc == 0 {
                  save temp, replace
                  restore
                  append using temp, force
              }
              else {
                  display "problem in `file'"
                  restore
              }
          }
          So we see that capture noisily worked correctly, but having once preserved a dataset, it cannot again be preserved without first having been restored, even though nothing had changed. This was indicated by the "already preserved" error message.
          Last edited by William Lisowski; 18 Mar 2022, 13:52.

          Comment


          • #6
            William. Thank you! That makes sense.

            Comment


            • #7
              It is hard for me to resist jumping on one of my hobby horses. Why are you using the -force- option with -append-? This is almost certainly a bad idea, another way to create an incorrect data set that will eventually come back to bite you, or bite somebody else who relies on your incorrect data and subsequent analyses.

              -append- throws error messages when there are incompatibilities between the data sets being appended. Typically it is a situation where one data set has a certain variable as numeric, and the other has it as string. The -force- option deals with this situation by discarding the incompatible data. So you have solved no problem. You have simply ignored the error condition that Stata was kind enough to call to your attention. Now, if somehow you know that this kind of incompatibility only arises in variables that you don't need anyway, I suppose that would be no harm, no foul. But in that case, the better way to do it is to -drop- those unneeded variables before you -append- anyway. That way, if you're wrong about which variables have incompatibilities, you will still get your error message and aborted process. Another thing you can do is, rather than a single program importing the spreadsheets and appending them at once, is to read in each spreadsheet and save it in a separate file. Then use the -precombine- program, by Mark Chatfield, available from Stata Journal, to identify all the potential problems in putting these data sets together. Then you can fix those problems so you are working with correct data. (Or if -precombine- tells you the only problematic variables are ones you don't need anyway, you can -drop- them in the subsequent program that -append-s everything together.

              Remember: error messages and program terminations are your friends. They are there to rescue you from the pitfalls of incorrect data. You ignore them at your peril, and the peril of those who rely on your work. There are very few situations where the use of -force- options is appropriate: if you think yours is one of them, think twice, and more times before proceeding. Similarly, while there is a role for -capture-, it should be used cautiously, and you should always in that code verify that the error-code thrown is one that was anticipated and does not represent a real problem.

              Comment

              Working...
              X