Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop over subfolders

    Dear all,
    I am trying to loop through different subfolders. I have specified part of the directory with "cd".
    `x' specifies the respective subfolder and "merged data" is the dataset I want to open (there is one dataset in each folder).

    Code:
    foreach x in forward_cites tech_similar tech_different tech .... {
    cd "K:/Gruppen/indoek/Florian/US EP/data/publ_level/"
    use "`x'/merged data.dta", clear
    ----commands---
    }
    --------------------------------------------------------------------

    After each iteration, Stata stops with an error such as "tech_similar (the subfolder) not found".

    Then, I write the loop just with the subfolder that could not be found and run the code again - e.g.,

    Code:
    foreach x in tech_similar {
    ....
    -------------------------------------

    Now, it works.

    From other posts I know that the loops might be sensitive to quotation marks, backslashes etc.
    However, I cannot figure out the source of the error in this particular case.

    Any ideas?

    Best,
    Florian

    Last edited by Florian Seliger; 11 Aug 2017, 01:51.

  • #2
    Hi Florian,

    without seeing your (sub)folder structure, we can only guess what the problem is. Additionally, please consider reading the FAQ (also linked at the top of this page) carefully; your syntax is easier to read when put in CODE-delimiters.

    Anyways, I have two recommendations from my experiences of handling directory names in Stata:
    1. only -cd- to your (parent) directory once (there is no need to nor gain from -cd- into the same directory repeatedly)
    2. put the subfolders into a local macro, and make each and any of them quoted there; then loop over this local's contents
    Code:
    cd "K:/Gruppen/indoek/Florian/US EP/data/publ_level/"
    local subfolders `""forward_cites" "tech_similar" "tech_different" "tech""'
    foreach x of local subfolders {
        use "`x'/merged data.dta", clear
    }
    Quotation when defining the local macro is essential here: The whole list should be enclosed in compound double quotes, and each element in regular double quotes (help quotes explains the difference). This, from my experience, is the only way to keep things sound in case someone comes up with the terrific idea of using space characters inside a file path.

    (You could even put each element of the list inside compound double quotes, to make everything robust against someone coming up with the even more terrific idea of putting double quotes in directory names.)

    Regards
    Bela
    Last edited by Daniel Bela; 11 Aug 2017, 01:49. Reason: formatting

    Comment


    • #3
      Originally posted by Daniel Bela View Post
      your syntax is easier to read when put in CODE-delimiters.
      edited!

      Originally posted by Daniel Bela View Post
      Quotation when defining the local macro is essential here: The whole list should be enclosed in compound double quotes, and each element in regular double quotes (help quotes explains the difference).
      Thank you for this suggestion.

      Originally posted by Daniel Bela View Post
      This, from my experience, is the only way to keep things sound in case someone comes up with the terrific idea of using space characters inside a file path.
      yes, I often do these terrible things as you can see from the cd



      Comment


      • #4
        You could also take a look at filelist (from SSC). It will create a dataset with partial paths from Stata's current directory. You can fine tune the list using the full complement of Stata's data management tools. The help file shows how to construct a loop to process each file.

        Comment


        • #5
          A few things: (1) the cd command can be brought out of the loop since it doesn't change, (2) inthe "use" command try using the "./" to signal to Stata that it should start at the current directory. For example:

          Code:
          cd "K:/Gruppen/indoek/Florian/US EP/data/publ_level/"
          
          local dir_list = "forward_cites tech_similar tech_different tech"
          
          foreach x of local dir_list {
                use "./`x'/merged data.dta", clear
                ----commands---
          }
          If you want to make this less dependent on individual editing, you should look into Stata's directory commands to automate the creation of the list of subdirectories in the main directory. For example, with a directory structure of "K:/Stuff/" in which directory there are 20 subdirectories called "Sub1", "Sub2", ... , "Sub20"

          Code:
          local dir_list : dir "K:/Stuff" dirs "*"  // Create a list of all subdirectories in the file path "K:/Stuff"
          
          foreach x of local dir_list {
               ... do stuff looping of list of subdirectories ...
          }
          Note, that you can also use the extended functions of macros to makes lists of all, or specific, files in a directory. I've used this structure when working with large datasets that come in many files and subdirectories.

          Happy looping!

          Comment


          • #6
            I am a Beginner in stata, I want to append different excel files from different subfolders .

            My current working directory is

            Code:
            cd "S:\06 Data\Data\Sector information\CSM\CSM\"
            The working directory contains 4 subfolders with the name

            CSM A1 2014 - RI
            CSM A1 2015 - RI
            CSM A1 2016- RI
            CSM A1 2017 - RI


            In each subfolder there are 4 excel files named
            subfolder (CSM A1 2014 - RI )
            CSM-Q1 2014 FR
            CSM-Q1 2014 DE
            CSM-Q1 2014 UK
            CSM-Q1 2014 DK


            Same for all the other three subfolders

            I want to import all these into single stata file. How can I do it ?.

            The discussions here made me to understand to append files from single subfolder. I have 4 subfolders. So please help me out.

            Thanks in Advance

            Comment


            • #7
              If those subfolders are the only ones and all Excel files can indeed be appended as they are, it could be as simple as

              Code:
              xls2dta , clear recursive : append using .
              This requires xls2dta and filelist, both from SSC.

              Best
              Daniel

              Comment


              • #8
                I found Daniel's code to be very useful. I have a main folder that has three levels: main folder > subfolders > subfolders > excel files

                Daniel's code :
                Code:
                xls2dta , clear recursive : append using .

                But if an excel file has multiple sheets/tabs, the data from these extra tabs is not appended. Is there a way that data from all tabs in all excel files is appended?
                Last edited by Romil Pandey; 18 Jan 2023, 19:52.

                Comment


                • #9
                  I found Daniel's code to be very useful. I have a main folder that has three levels: main folder > subfolders > subfolders > excel files

                  Daniel's code :
                  Code:
                  xls2dta , clear recursive : append using .
                  But if an excel file has multiple sheets/tabs, the data from these extra tabs is not appended. Is there a way that data from all tabs in all excel files is appended?

                  Comment


                  • #10
                    Code:
                    xls2dta , clear recursive allsheets: append using .

                    Comment


                    • #11
                      When I use the allsheets option, I run into a string/numeric mismatch error which doesn't happen without the allsheets option.
                      I am not sure why this is happening. The data type is "General" in all the excel sheets.
                      Attached Files

                      Comment


                      • #12
                        Clearly there is a data type mismatch in one of the sheets that was not being read originally. The fact that Excel thinks of the data type as "General" is probably immaterial. Stata infers the data type from what it sees as the cell contents. One specific assumption is that when it encounters an empty column, it defaults to regarding it as a numeric data type. This could be one reason for the problem you are facing -- assuming there is no genuine bad stuff going on, such as unexpected non-numeric characters in columns that should contain only numbers.

                        You might want to try
                        Code:
                        xls2dta , clear recursive allsheets: append using . , importopts(allstring)
                        which will import all columns as strings. You can then use the destring command later to convert the variables that ought to be numeric.

                        Comment


                        • #13
                          That worked perfectly for a my test subfolder but for the main folder, Stata keeps running code, files keep appending but it doesn't come to an end.

                          Besides that, I have one last thing to account for. This is the overview of my project:

                          Main folder < subfolders < subfolders < excel files
                          Note: Over 500 xls files and a lot of Excel files have multiple tabs/sheets so want to capture that data also.
                          1. Convert from xls to dta
                          2. In my final appended dataset I want to keep track of the excel file name that each observation came from. So somehow create a new variable FILENAME. As each file is converted, input the name of the excel file in the variable FILENAME for each observation in that file.
                          3. Append all dta files (all excel files have identical variables)
                          Code:
                          xls2dta , clear recursive allsheets: append using . , importopts(allstring)
                          This code converts all files and appends them including all data from tabs in excel files. Is there a to modify this code to achieve goal #2?

                          Comment


                          • #14
                            In the output of
                            Code:
                            help xls2dta
                            you will see documentation for the full syntax of the xls2dta command, including numerous options, one of which will do what you seek for goal #2.

                            When you are working with a command you are not fully familiar with, it is advisable to read the fine material presented in the help output for the command.

                            Comment

                            Working...
                            X