Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding many csv files into one Stata dta file

    Hello Everyone,

    I have about 30 csv files, each of them having about 3000 observations, yet all the files have the same variable list. I want to create one Stata dta file that will contain all those 30 different csv files. I know how to append several datasets with an option called "append", but it will be extremely tedious to do it for 30 csv files.
    I would highly appreciate if you could help me with a shorter way to create the dataset.

    Thank you very much in advance and best regards,
    Nick

  • #2
    I know how to append several datasets with an option called "append", but it will be extremely tedious to do it for 30 csv files.
    That is why loops are used. To avoid doing repetitive tasks. But with CSV files, you first need to import them. If there is a pattern to the names, you can easily define a loop which imports and appends. If there is no pattern, then it gets more challenging. Here is some pseudo code, assuming the files are named file1, file2, ..., file30.

    Code:
    import delimited file1.csv, clear
    gen file=1
    save mydata, replace
    forval i=2/30{
        import delimited file`i'.csv
        append using mydata
        replace file=`i' if missing(file)
        save mydata, replace
    }
    use mydata, clear
    sort file

    Comment


    • #3
      Originally posted by Andrew Musau View Post

      That is why loops are used. To avoid doing repetitive tasks. But with CSV files, you first need to import them. If there is a pattern to the names, you can easily define a loop which imports and appends. If there is no pattern, then it gets more challenging. Here is some pseudo code, assuming the files are named file1, file2, ..., file30.

      Code:
      import delimited file1.csv, clear
      gen file=1
      save mydata, replace
      forval i=2/30{
      import delimited file`i'.csv
      append using mydata
      replace file=`i' if missing(file)
      save mydata, replace
      }
      use mydata, clear
      sort file
      Thank you very much, Andrew, this is super helpful. However, I am having another difficulty with saving the file in the location where I want. I would again highly appreciate if you direct me here as well. Below is the code that I use:
      Code:
      
               global Announcements                     "C:\Users\user\Announcements data"
              global final_data                            "C:\Users\user\Final data"
      
      
                  import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                  gen file=1
                  save "$final_data/Announcements_merged", replace
                  forval i=2/86{
                      import excel `i'.xls, sheet("Screening") cellrange(A8:O10008) firstrow clear
                      append using Announcements_merged
                      replace file=`i' if missing(file)
                      save Announcements_merged, replace
                  }
                  use Announcements_merged, clear
                  sort file
      The 86 excel file that I want to merge are in the "Announcements" folder, whereas I want to save the final merged dataset in the "Final data" folder. When run the code, Stata reports that "file /1.xls not found". I have tried several other ways, but could not save the dataset in the desired folder. Any help would be highly appreciated!

      Comment


      • #4
        How are the Excel files named within "C:\Users\user\Announcements data"?

        Comment


        • #5
          Originally posted by Andrew Musau View Post
          How are the Excel files named within "C:\Users\user\Announcements data"?
          It is not the excel file, it is the folder where the excel files are stored.

          Comment


          • #6
            #4 asks for filenames.

            Comment


            • #7
              Originally posted by Andrew Musau View Post
              #4 asks for filenames.
              All files are named with numbers from 1 to 86, like 1.xls, 2xls, ... , 86.xls.

              Comment


              • #8
                If "C:\Users\user\Announcements data" is not your current directory, you need

                Code:
                import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                in place of

                import excel `i'.xls, sheet("Screening") cellrange(A8:O10008) firstrow clear
                [.] Otherwise, add

                Code:
                cd "C:\Users\user\Announcements data"
                on top of the do-file code.

                Comment


                • #9
                  Originally posted by Andrew Musau View Post
                  If "C:\Users\user\Announcements data" is not your current directory, you need

                  Code:
                  import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                  in place of



                  [.] Otherwise, add

                  Code:
                  cd "C:\Users\user\Announcements data"
                  on top of the do-file code.
                  Thank you very much, Andrew, highly appreciate your support. Sorry for bothering you this much, I just wanted to know if there is a way to save the final output in the $final_data location, rather than in $Announements. The code and everything works fine, but it saves in a location ($Announcements) where I don't want, I would very much want to save it in $Final_data. I spent good amount of time trying different ways, but in the end I was running into other issues with code. If you dont mind, I would really be grateful if you could advise me how to save it in the $Final_data location. To remind you the code, it is

                  Code:
                  global Announcements                     "C:\Users\user\Announcements data"        
                  global Final_data                            "C:\Users\user\Final data"
                  
                  import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                  gen file=1
                  save Announcements_merged, replace
                  forval i=2/86{
                            import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                            append using Announcements_merged
                            replace file=`i' if missing(file)
                            save Announcements_merged, replace
                  }
                  
                  use Announcements_merged, clear
                  
                  sort file
                  Last edited by Nick Baradar; 14 Feb 2023, 03:45.

                  Comment


                  • #10
                    Code:
                    global Announcements "C:\Users\user\Announcements data"        
                    global Final_data "C:\Users\user\Final data"
                    
                    import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                    gen file=1
                    save"$Final_data\Announcements_merged", replace
                    forval i=2/86{
                              import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                              append using "$Final_data\Announcements_merged"
                              replace file=`i' if missing(file)
                              save "$Final_data\Announcements_merged", replace
                    }
                    use "$Final_data\Announcements_merged", clear
                    sort file

                    Comment


                    • #11
                      Originally posted by Andrew Musau View Post
                      Code:
                      global Announcements "C:\Users\user\Announcements data"
                      global Final_data "C:\Users\user\Final data"
                      
                      import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                      gen file=1
                      save"$Final_data\Announcements_merged", replace
                      forval i=2/86{
                      import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                       append using "$Final_data\Announcements_merged"
                      replace file=`i' if missing(file)
                      save "$Final_data\Announcements_merged", replace
                      }
                      use "$Final_data\Announcements_merged", clear
                      sort file
                      Thanks a lot for the clarification, I tried that code before, and I receive a message that "file C:\Users\user\Announcements data`i'.xls not found".

                      Comment


                      • #12
                        Also, see xls2dta (SSC).

                        Comment


                        • #13
                          Originally posted by Nick Baradar View Post
                          I receive a message that "file C:\Users\user\Announcements data`i'.xls not found".
                          Show the exact commands that you ran and Stata output.

                          Comment


                          • #14
                            Originally posted by Andrew Musau View Post

                            Show the exact commands that you ran and Stata output.
                            The code is the following:
                            Code:
                                        
                                                global Announcements                     "C:\Users\User\Announcements data"
                                                global Final_data                            "C:\Users\user\Final data"
                                                
                                                
                                        
                                        import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                                        gen file=1
                                        save"$Final_data\Announcements_merged", replace
                                        forval i=2/86{
                                        import excel "$Announcements\`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                                         append using "$final_data\Announcements_merged"
                                        replace file=`i' if missing(file)
                                        save "$Final_data\Announcements_merged", replace
                                        }
                                        use "$Final_data\Announcements_merged", clear
                                        sort file
                            And the output is "
                            file C:\Users\User\Announcements data`i'.xls not found
                            r(601);"

                            Comment


                            • #15
                              Okay, I forgot about the behavior of locals that follow a backslash. You need double backslashes or a forward slash.

                              Code:
                              global Announcements "C:\Users\user\Announcements data"        
                              global Final_data "C:\Users\user\Final data"
                              
                              import excel "$Announcements\1.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                              gen file=1
                              save Announcements_merged, replace
                              forval i=2/86{
                                        import excel "$Announcements/`i'.xls", sheet("Screening") cellrange(A8:O10008) firstrow clear
                                        append using "$Final_data\Announcements_merged"
                                        replace file=`i' if missing(file)
                                        save "$Final_data\Announcements_merged", replace
                              }
                              
                              use "$Final_data\Announcements_merged", clear
                              
                              sort file

                              Comment

                              Working...
                              X