Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to loop over all tabs of as excel file?

    Hi,

    Does anyone knows how to loop over all tabs of as excel file? Can Stata read/detect the number of Excel tabs?

    In the loop, before moving to the next tab, I would like to save the dataset as "NameOfTab.dta", replace. Thanks a lot for your help!


  • #2
    See xls2dta (SSC).

    Best
    Daniel

    Comment


    • #3
      xls2dta will allow me to save all sheets in my directory. Now how do I loop over them to execute the same code on each of them? Thanks!

      Comment


      • #4
        See the help file. This is explained in a simple example. For a more specific answer please ask a more specific question.

        Best
        Daniel

        Comment


        • #5
          Hi Daniel,

          I have read the help file (and see that you're the author of "xls2dta", thanks for that). I need to do sequentially (N being the total number of excel tabs):
          for i = 1 to N {
          1) Import tab `i' from Excel file
          2) Run some Stata code I already have (same for all tabs)
          3) Save the dataset `i'

          Afterward, I'd like to merge all of them by date. Can you help me with that? Thanks!

          Comment


          • #6
            Sticking with xls2dta you would

            1) Import and save the worksheets

            Code:
            xls2dta , allsheets : import excel mydrive:/mydirectory/myfile.xlsx
            2) Execute the commands. If you have only a few you could

            Code:
            xls2dta : xeq command1 ; command2 ; ...
            If you have more than a few commands, and/or more complex code, e.g. loops, then you would save the commands in a do-file, say, mydo.do. Then,

            Code:
            xls2dta : do mydo.do
            3) merge the datasets (assuming a 1:1 merge with date as the key-variable)

            Code:
            xls2dta merge 1:1 date
            Depending on the commands you wish to run, you may be better off merging first, then apply the commands once on the combined dataset.

            Best
            Daniel

            Comment


            • #7
              Hi Francois,

              It should also be noted that the import excel command can detect the number of Excel sheets by using the describe option. The following code should approximate what you're trying to do:

              Code:
              * describe sheets
              import excel using eg.xlsx, describe
              return list
              
              * loop through all sheets, list data, then save
              forvalues i=1/`r(N_worksheet)' {
                  import excel using eg.xlsx, sheet(`"`r(worksheet_`i')'"') firstrow clear
                  list, noobs
                  save `"`r(worksheet_`i')'"', replace
              }

              Comment


              • #8
                Daniel: Thanks for your help. in step 3), does "Xls2dta" support "1:1" in "merge"? I'm getting the following error:

                Code:
                . xls2dta merge 1:1 date
                command 1 is not supported by xls2dta
                I have removed 1:1. Should I be worried about the output?



                Aaron: Thank you so much for your help, your code is very clear and concise. I will allow me to double check Daniel's method output!

                Comment


                • #9
                  Also, Daniel: I was looking at the "convert options" in the help file:

                  convert options Description
                  -----------------------------------------------------------------------------------------------------------------------------------------------------
                  save([directory] [, save_options]) save converted Excel files in directory
                  sheets(sheetsspec [, not]) convert worksheets sheetsspec
                  allsheets convert all worksheets
                  recursive[(maxdeep)] search filename recursively
                  generate(newvar1 [newvar2]) create variables with filename, sheetname
                  respectcase respect case in filename (Windows only)
                  nostop continue if import excel fails


                  How can I use them to name each file I save "ExcelName_SheetName.dta". That would help me when I merge back. Thanks!

                  Comment


                  • #10
                    Originally posted by Francois Durant View Post
                    in step 3), does "Xls2dta" support "1:1" in "merge"?
                    I have made a syntax error there, sorry. Should be

                    Code:
                    xls2dta : merge 1:1 ...
                    Note the colon after xls2dta.

                    Originally posted by Francois Durant View Post
                    I was looking at the "convert options" in the help file: [...] How can I use them to name each file I save "ExcelName_SheetName.dta".
                    You cannot do this (easily). You can specify the generate()option and save the names of the Excel file and worksheets into variables in the individual dta files to be able to track the sources of the observations. I do not know what you mean by "merge back" and how different file names would make this easier. The main point of xls2dta is that you do not need to worry about filenames and the like because the program keeps track of things for you.

                    Concerning Aaron's approach: this is basically what xls2dta does internally. Writing the loop yourself can be the better way. Note, however, that Aaron's code will fail if one of the commands inside the loop wipes out the r() results left behind by import excel , describe. There are, obviously, ways to solve this. The code will also mess things up if it fails for one of the files. Then you will have half the files converted and the other half still in Excel format. Again, you can rerun and sort things out. The advantage of using xls2dta is that the program handles all these details for you with a (hopefully) convenient syntax.

                    Best
                    Daniel
                    Last edited by daniel klein; 30 Nov 2017, 03:18.

                    Comment


                    • #11
                      Thanks a lot for clarification daniel!

                      Comment


                      • #12
                        Originally posted by Aaron McMillan View Post
                        Hi Francois,

                        It should also be noted that the import excel command can detect the number of Excel sheets by using the describe option. The following code should approximate what you're trying to do:

                        Code:
                        * describe sheets
                        import excel using eg.xlsx, describe
                        return list
                        
                        * loop through all sheets, list data, then save
                        forvalues i=1/`r(N_worksheet)' {
                        import excel using eg.xlsx, sheet(`"`r(worksheet_`i')'"') firstrow clear
                        list, noobs
                        save `"`r(worksheet_`i')'"', replace
                        }

                        I have tried this syntax as well. In the loop, I destring some variables and I get the following error, can someone explain why?
                        Code:
                        rename BD Venezuela
                        
                        invalid file specification
                        I believe this happens when Stata tries to save the file but can't be sure because of the loop. Thanks for your help!

                        Comment


                        • #13
                          Originally posted by Francois Durant View Post
                          I have tried this syntax as well. In the loop, I destring some variables and I get the following error, can someone explain why?
                          [... ]
                          I believe this happens when Stata tries to save the file but can't be sure because of the loop. Thanks for your help!
                          I have explained this above. destring, although not an r-class program, calls r-class programs internally and, thus, wipes out whatever was left in r(worksheet_`i') before. Perhaps the error occurs with save, perhaps it occurs with import excel the second time through the loop. Type

                          Code:
                          set more on
                          before execution to see what is happening.

                          Best
                          Daniel

                          Comment


                          • #14
                            Originally posted by daniel klein View Post

                            Concerning Aaron's approach: this is basically what xls2dta does internally. Writing the loop yourself can be the better way. Note, however, that Aaron's code will fail if one of the commands inside the loop wipes out the r() results left behind by import excel , describe. There are, obviously, ways to solve this. The code will also mess things up if it fails for one of the files. Then you will have half the files converted and the other half still in Excel format. Again, you can rerun and sort things out. The advantage of using xls2dta is that the program handles all these details for you with a (hopefully) convenient syntax.

                            Best
                            Daniel
                            Good point about the returned results messing things up (as Francois discovered as well!). I remember running into that issue myself when I did a similar thing a while back. And the mid-loop failure is another good point which I hadn't considered.

                            For completeness, the following code should solve these issues with built-in commands:

                            Code:
                            * describe sheets
                            import excel using eg.xlsx, describe
                            return list
                            
                            * store returned values
                            local N = r(N_worksheet)
                            forvalues i=1/`N' {
                                local sheet`i' "`r(worksheet_`i')'"
                            }
                            
                            * loop through all sheets, summarise data, then save tempfiles
                            forvalues i=1/`N' {
                                import excel using eg.xlsx, sheet(`"`sheet`i''"') firstrow clear
                                summarize
                                tempfile file`i'
                                save `file`i''
                            }
                            
                            * now save the dta files
                            forvalues i=1/`N' {
                                use `file`i'', clear
                                save `"`sheet`i''"', replace
                            }
                            That's obviously getting a bit more complicated than my initial code though.

                            And I hadn't come across xls2dta before. It looks pretty useful though, so I will definitely keep it in mind for the future!

                            Comment


                            • #15

                              Hi Aaron McMillan, I tried your code in #7. However, I got the following error message:

                              Code:
                              invalid file specification
                              r(198);
                              I tried to fix the error but could not.

                              Thanks a lot for your help!

                              Comment

                              Working...
                              X