Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import multiple CSV files

    I have a problem importing multiple CSV files (located in the same folder) to Stata (dta.) files. I've tried to use the following code:

    ** cd "D:\Data\Workdata\703987\CSV-filer"
    clear
    local myfilelist : dir . files"*.csv"
    foreach file of local myfilelist {
    drop _all
    insheet using `file'
    local outfile = subinstr("`file'",".csv","",.)
    save "`outfile'", replace
    }

    **

    But when I run the do-file, Stata gives an "invalid syntax" error-message?

    Does anyone know what to do?

    /Jonas

    Last edited by Jonas Meyer; 16 Jul 2014, 06:56.

  • #2
    Jonas,

    I was able to run your code without error on a directory with some CSV files in it. I would suggest doing a set trace on and then running your code again to see exactly where the error is occurring. Let us know if you still can't figure out the problem.

    Regards,
    Joe

    Comment


    • #3
      It's likely because you have a file with spaces (or other special characters) in the name. Change
      insheet using `file'
      to
      insheet using `"`file'"'
      and it should fix the problem.

      N.B. When you don't have control over the filenames you're referencing, it's good practice to enclose all references in compound double quotes (i.e., `" and "').

      Comment


      • #4
        Thank's Phil - problem solved!

        Comment


        • #5
          You said you want to "import" multiple CSV files, but from what I understand, what this code actually does is "convert" each of your CSV files into individual DTA files. At least, that's what the code did when I tested with my CSV files. So, on a related note, how can I import multiple CSV files into the same file, since I want them to form a single database?

          Comment


          • #6
            Once you have the separate .dta files, you can -append- or -merge- them, depending on what the relationships among them are.

            Comment


            • #7
              Code:
              **Set working directory
              
              clear
              tempfile building
              save `building', emptyok
              
              local myfilelist : dir . files"*.csv"
              foreach file of local myfilelist {
              drop _all
              insheet using `"`file'"'
              local outfile = subinstr("`file'",".csv","",.)
              append using "`building'"
              save `building', replace
              }
              Does that work for you?

              Comment


              • #8
                Re #7:

                That's fine if the nature of the files is that each contains data on new "entities." But if the files contain different information on the same entities, then they should be -merge-d rather than -append-ed. I grant that the -append- situation is more common in practice, but the -merge- situation does sometimes arise.

                Sometimes both are needed, actually. Some surveys publish their data sets separately by wave, but also parcel out different variables into different subfiles. So then you need to merge the subfiles within wave, and append the waves together.

                Comment


                • #9
                  Good point.

                  Comment


                  • #10
                    That hasn't worked for me because the files are separated by semicolons, not commas. Hence, I cannot just mass convert then "use" them, because I just get a database whose only variable is "v1" with a value of 1. It seems I have to really "import" the files in order to tell Stata that the separator is the semicolon. However, when I import a file, it always says I cannot import another one because the current base will be lost no matter what. In short, is there a way to mass import multiple TXT (or CSV files) even though they are separated by semicolons?

                    Comment


                    • #11
                      Actually, -insheet- is now an outdated command. It has been superseded by -import delimited-, which, in turn, has a -delimiters()- option that allows you to specify the separator. And since the loop includes a -save- command at the bottom, there should be no hesitation on Stata's part to then read in the next file.

                      If you use the code in the loop shown in #1, the files will all be separately imported. Then you can run another loop over the .dta files to -merge- or -append- them, whichever is appropriate. Once they are .dta files, Stata will neither know nor care what the separator in the original CSV files was.

                      Comment


                      • #12
                        See option delimiter() of command insheet. Or its equivalent in the import delimited.

                        Comment


                        • #13
                          hi all
                          I want to import 1460 csv file in stata, turning these csv in dta files and append in a one file. All my files are in the same folder and they have the same variables.
                          I am trying with four files csv, running this command:

                          Code:
                          clear all
                          cd "Mydirectory"
                          
                          global filenames: dir . files "*.csv"
                          
                          foreach file of global filenames {
                          import delimited "`file'" , delimiter(";") bindquote(nobind)
                          save temp, replace
                          append using temp, force
                          }
                          However, I get that the only one file from csv becomes dta.

                          Can someone help me?

                          Comment


                          • #14
                            It's because you have code in the wrong order. You save the file you have just read into temp, overwriting any earlier files that might have been there. So the files do not accumulate. You need to -append- first and then -save-.

                            Also, I strongly recommend you use a local macro for filenames, not a global. Global macros are an inherently unsafe programming practice, particularly when you use a common name like filenames. Most of the time you won't get into trouble, but when you do, the debugging will be a living nightmare. There are no advantages to using a global macro in this context that might make this chance worth taking.

                            So:

                            Code:
                            clear all
                            cd "Mydirectory"
                            
                            local filenames: dir . files "*.csv"
                            
                            foreach file of local filenames {
                                import delimited "`file'" , delimiter(";") bindquote(nobind)
                                append using temp
                                save temp, replace
                            }
                            Finally, you are really asking for trouble by using the -force- option on -append-. The fact that you seem to think you need it suggests that you anticipate that the various files you are putting together contain inconsistent, incompatible representations of the same variables. When you use -force-, Stata does not reconcile these inconsistencies. It simply obliterates part of the data. So unless you are 100% certain that the variables that are not consistent across files contain no information you will ever need, you are setting up an incorrect data file that will subvert all your later analyses.

                            Now when you say "All my files are in the same folder and they have the same variables." you seem to be expressing confidence that, in fact, the files are all created in a way that makes them consistent and compatible. If that is true, then there is no need to use the force option--it will have no effect. So do it without the -force- option. If you are correct that the data sets are all compatible, everything will go smoothly. If not, Stata will confront you with the problem so you can solve it instead of pretending it doesn't exist.

                            In my experience, even the best curated data sets contain errors. The likelihood that 1,460 data sets have been created in precisely uniform fashion and are completely compatible is pretty small. I wouldn't bet on it.

                            Comment


                            • #15
                              Thanks Clyde,
                              after your suggest I found in this forum another similar command.

                              Code:
                              clear all
                              local satafiles: dir . files "*.csv"
                              
                              foreach file of local satafiles {
                              preserve
                              insheet using `file',delimiter( ; ) names clear
                              save temp,replace
                              restore
                              append using temp
                              }
                              Now, if I use this code, nothing happens and I do not get any error from Stata. while, if I use the same code changing -local- with -global- I have no problems and all is well.
                              do you have an explanation about this?

                              Comment

                              Working...
                              X