  • Append all .dta files in directory

    Many people have asked this question about 'batch appending' in various ways. Some of the answers are obscure, so I tried the simpler ones, including fs (SSC), but I'm not getting anywhere. Does anyone have a concrete solution for this?

    Problem: I have 39 datasets in a directory and want to 1) append them all in one shot, and 2) keep only selected variables from each data file.

    Thanks in advance...
    Last edited by Sonnen Blume; 28 Dec 2019, 21:32.

  • #2
    Something like this?

    Code:
        local files : dir "" files "*.dta"
        foreach file in `files' {
            append using `file', keep(var1 var2)
        }



    • #3
      Originally posted by N Sonntag
      Something like this?

      Code:
      local files : dir "" files "*.dta"
      foreach file in `files' {
      append using `file', keep(var1 var2)
      }
      Hi Sonntag,
      I tried this, but it's importing one and the same dataset every time:

      Code:
      local files : dir "C:\Users\Administrator\Desktop\gam" files "*.dta"
          foreach file in `files' {
              append using `file', keep(ID Country Age Income)
          }



      • #4
        Originally posted by Sonnen Blume

        Hi Sonntag,
        I tried this, but it's importing one and the same dataset every time:

        Code:
        local files : dir "C:\Users\Administrator\Desktop\gam" files "*.dta"
        foreach file in `files' {
        append using `file', keep(ID Country Age Income)
        }
        Got it! On the third line, `file' should be "`file'".

        Thanks a lot for the code!



        • #5
          Two points for those who might find this thread at a later date:
          1. Since Stata 12 or so, you can append as many files as you would like in one command, so the loop is not needed.
          2. The code in #3 and #4 will not work unless the user happens to be in the named directory given above (C:\Users\Administrator\Desktop\gam), because the -dir- macro function used in the -local files- line returns file names without any preceding path.
          So... the following would work for the given example
          Code:
          cd "C:\Users\Administrator\Desktop\gam"
          local theFiles: dir . files "*.dta"
          clear
          append using `theFiles', keep(ID Country Age Income)
          To be slicker, one could use an inline expansion of the -dir- macro function:
          Code:
          cd "C:\Users\Administrator\Desktop\gam"
          clear
          append using `: dir . files "*.dta"', keep(ID Country Age Income)
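
For readers who would rather not -cd-, here is a minimal sketch of the loop variant with the folder prepended to each file name. The scratch folder, the file names part1.dta and part2.dta, and the variables are all made up for illustration; only -dir-, -foreach-, and -append- with keep() come from the thread.

```stata
* Toy setup: two small datasets in a scratch folder (all names are hypothetical)
local folder "`c(tmpdir)'/append_demo"
capture mkdir "`folder'"
forvalues i = 1/2 {
    clear
    set obs 3
    generate ID = _n + 10 * `i'
    generate Age = 20 + _n
    generate extra = `i'
    save "`folder'/part`i'.dta", replace
}

* dir returns bare file names, so the folder is prepended when appending.
* A forward slash is used as separator because a backslash placed directly
* before a macro reference would block its expansion.
local theFiles : dir "`folder'" files "*.dta"
clear
foreach f of local theFiles {
    append using "`folder'/`f'", keep(ID Age)
}
count
```

The forward slash works as a path separator in Stata on Windows as well, which is why it is the safer choice in front of a macro.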



          • #6
            Originally posted by Bill Rising (StataCorp)
            Two points for those who might find this thread at a later date:
            1. Since Stata 12 or so, you can append as many files as you would like in one command, so the loop is not needed.
            2. The code in #3 and #4 will not work unless the user happens to be in the named directory given above (C:\Users\Administrator\Desktop\gam), because the -dir- macro function used in the -local files- line returns file names without any preceding path.
            So... the following would work for the given example
            Code:
            cd "C:\Users\Administrator\Desktop\gam"
            local theFiles: dir . files "*.dta"
            clear
            append using `theFiles', keep(ID Country Age Income)
            To be slicker, one could use an inline expansion of the -dir- macro function:
            Code:
            cd "C:\Users\Administrator\Desktop\gam"
            clear
            append using `: dir . files "*.dta"', keep(ID Country Age Income)
            Thanks, Bill, for sharing the code. But the last line is throwing an error:

            Code:
            . append using `theFiles'
            invalid file specification
            r(198);



            • #7
              Bill's code works for me as is. Are you sure that you've both entered the local macro definition and run the append command from the Command window? A quick way to check is by typing

              Code:
              di `" `theFiles' "'
              The surrounding compound quotes are to allow for the fact that each file name is wrapped in double quotes.



              • #8
                Originally posted by Leonardo Guizzetti
                Bill's code works for me as is. Are you sure that you've both entered the local macro definition and run the append command from the Command window? A quick way to check is by typing

                Code:
                di `" `theFiles' "'
                The surrounding compound quotes are to allow for the fact that each file name is wrapped in double quotes.
                Still not working... I tried on both Windows and Mac systems, with the same error on both.



                • #9
                  When running the code in #6, the most likely way to get the 'invalid file specification' error is if there are no datasets in the directory "C:\Users\Administrator\Desktop\gam".

                  So, be sure you are working in the directory holding your datasets. To see if you are in the right place before you start appending, try
                  Code:
                  dir *.dta
                  to see if you see your datasets. If you don't, be sure to -cd- to the proper directory first.
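
The same check can be done inside a do-file before appending. This is only a sketch: it assumes you have already used -cd- to reach the folder of interest, and the macro name nFiles is made up for illustration.

```stata
* Run from the folder that should hold the datasets (after -cd-)
local theFiles : dir . files "*.dta"
local nFiles : word count `theFiles'

if `nFiles' == 0 {
    * Nothing matched, so -append- would be handed an empty file list
    display as error "no files matching *.dta in `c(pwd)'"
}
else {
    display as text "`nFiles' file(s) found; appending"
    clear
    append using `theFiles'
}
```

When the count is zero, the usual suspects are the working directory and the pattern, including the case of the extension.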



                  • #10
                    Originally posted by Bill Rising (StataCorp)
                    When running the code in #6, the most likely way to get the 'invalid file specification' error is if there are no datasets in the directory "C:\Users\Administrator\Desktop\gam".

                    So, be sure you are working in the directory holding your datasets. To see if you are in the right place before you start appending, try
                    Code:
                    dir *.dta
                    to see if you see your datasets. If you don't, be sure to -cd- to the proper directory first.
                    Thanks a lot, Bill, for the clue! It turned out that the extensions were capitalised (.DTA instead of .dta). The code is working now.



                    • #11
                      This is strange.

                      The dir macro function is not case-sensitive on Windows and should give identical results regardless of whether you use *.dta or *.DTA in the pattern.

                      But Sonnen Blume has written earlier
                      Still not working... I tried on both Windows and Mac systems, with the same error on both.
                      In Stata on Windows this is only possible if, according to the manual, an additional option was specified:

                      In Windows only, the respectcase option specifies that dir respect the case of filenames when performing matches. Unlike other operating systems, Windows has, by default, case-insensitive filenames. respectcase is ignored in operating systems other than Windows.
                      But Bill's code didn't include the respectcase option.

                      So, what fixed it??
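
To see how the pattern's case behaves on a given machine, a small comparison along these lines may help. It is a sketch only: the macro names are made up, and it lists whatever .dta files happen to be in the current directory. The respectcase option is the one described in the manual passage quoted in this post; on systems other than Windows it is ignored, so the last two lists there simply reflect how that system matches the pattern.

```stata
* Default call: on Windows, matching ignores the case of file names
local anyCase : dir . files "*.dta"

* With respectcase (Windows only), the pattern's case must match the file name
local lowerOnly : dir . files "*.dta", respectcase
local upperOnly : dir . files "*.DTA", respectcase

* Compound quotes are needed because each file name is wrapped in double quotes
display `"default:    `anyCase'"'
display `"lower only: `lowerOnly'"'
display `"upper only: `upperOnly'"'
```

If files show up under "upper only" but not under "lower only", the extensions are capitalised, which is the situation reported in #10.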



                      • #12
                        PS: there was a similar confusion a few years ago:
                        https://www.stata.com/statalist/arch.../msg01156.html
                        I don't think there was any reply/explanation at that time.



                        • #13
                          Originally posted by Sergiy Radyakin
                          This is strange.

                          The dir macro function is not case-sensitive on Windows and should give identical results regardless of whether you use *.dta or *.DTA in the pattern.

                          But Sonnen Blume has written earlier

                          In Stata on Windows this is only possible if, according to the manual, an additional option was specified:

                          But Bill's code didn't include the respectcase option.

                          So, what fixed it??
                          Thanks for the question. And how detective of you! It never occurred to me that Mac and Windows differ in how they handle the case of file extensions. For me it never worked on Windows; I got it done on the Mac.



                          • #14
                            Originally posted by Bill Rising (StataCorp)
                            Two points for those who might find this thread at a later date:
                            1. Since Stata 12 or so, you can append as many files as you would like in one command, so the loop is not needed.
                            2. The code in #3 and #4 will not work unless the user happens to be in the named directory given above (C:\Users\Administrator\Desktop\gam), because the -dir- macro function used in the -local files- line returns file names without any preceding path.
                            So... the following would work for the given example
                            Code:
                            cd "C:\Users\Administrator\Desktop\gam"
                            local theFiles: dir . files "*.dta"
                            clear
                            append using `theFiles', keep(ID Country Age Income)
                            To be slicker, one could use an inline expansion of the -dir- macro function:
                            Code:
                            cd "C:\Users\Administrator\Desktop\gam"
                            clear
                            append using `: dir . files "*.dta"', keep(ID Country Age Income)
                            Bill, the command ends in an error if any of the datasets does not contain one of the selected variables. Is there any way to prevent this, such as forcing the command to continue and ignore the missing variables?

                            Code:
                            variable sp025 not found
                                You specified the keep(varlist) option.  The above error was the result of parsing the list of variables you supplied against the
                                using data.
                            r(111);



                            • #15
                              Originally posted by Sonnen Blume

                              Bill, the command ends in an error if any of the datasets does not contain one of the selected variables. Is there any way to prevent this, such as forcing the command to continue and ignore the missing variables?

                              Code:
                              variable sp025 not found
                              You specified the keep(varlist) option. The above error was the result of parsing the list of variables you supplied against the
                              using data.
                              r(111);
                              I am new here, but wouldn't it work if you simply left out the keep() option and ran the -keep- command after appending? I think the result would be the same as forcing the append, with missing values wherever a dataset lacks a variable.
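
If loading every variable first is too heavy, one possible alternative is to check each file's variable list before appending and pass only the variables that file actually has. The sketch below is an assumption-laden illustration, not something proposed in the thread: the scratch folder, the files a.dta and b.dta, and the macro names wanted, have, and common are all made up. It relies on -describe using- with the varlist option, which returns the file's variable names in r(varlist) without loading the data.

```stata
* Toy setup: b.dta lacks Income (folder and file names are hypothetical)
local folder "`c(tmpdir)'/keep_demo"
capture mkdir "`folder'"
clear
set obs 2
generate ID = _n
generate Income = 100 * _n
save "`folder'/a.dta", replace
drop Income
replace ID = ID + 2
save "`folder'/b.dta", replace

local wanted ID Income
local theFiles : dir "`folder'" files "*.dta"
clear
foreach f of local theFiles {
    * Read the variable names stored in the file without loading it
    quietly describe using "`folder'/`f'", varlist
    local have `r(varlist)'
    * Intersect the wanted list with what this file contains
    local common : list wanted & have
    if "`common'" != "" {
        append using "`folder'/`f'", keep(`common')
    }
}
list
```

Observations from files that lack a variable end up with missing values for it, which matches what appending everything and then running -keep- would give.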
