Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing and appending multiple files

    Hello group,

    I have approximately 90 datasets. The datasets contain costumer data for each day (3 months data). The datasets have the same variables and are saved as cvs files. They are saved in the same directory (location) and have consistent names (ex. APRIL 1 2016, APRIL 2 2016, etc). We do not have daily data for holidays so some dates can be skipped. Each dataset file is label with the corresponding date but inside the dataset there is not a date variable. My goal is to append the datasets but first I need to generate a date variable for each dataset. I can import the dasets and generate the date variable individually for each dataset but I was wondering if there is a more efficient way to do this, perhaps a loop. I would appreciate any help. Thank you!


    Code:
    clear
    cd "H:\Daily Data" // select directory 
    import delimited "H:\Daily Data\APRIL 1 2016.csv", bindquote(strict) case(preserve) clear  // import data 
    gen str str_TodaysDate="1April2016" // gen date variable
    gen double TodaysDate=date(str_TodaysDate,"DMY")
    format TodaysDate %td
    drop str_TodaysDate
    save 4_1_2016.dta, replace


  • #2
    Code:
    clear
    // get a list of all the relevant files in your directory
    cd "H:\Daily Data"
    local flist: dir "." files "*.csv"  // see -help extended_fcn-
    // import each file in the list and save it to a tempfile
    local nfile 0
    foreach fname of local flist {
        local ++nfile     
        import delimited "`fname'", bindquote(strict) case(preserve) clear   
        local datestring = subinstr("`fname'", ".csv", "", .) // no need for a variable
        gen double TodaysDate=date("`datestring'","DMY")
        format TodaysDate %td
        //
        tempfile temp`nfile'
        save using "`temp`nfile''"
    }    
    // append allof the nfile files
    clear
    forval i = 1/`nfile' {
      append using "`temp`nfile''"
    }
    Barring minor errors, this should work.

    Comment


    • #3
      Mike Lacy Thank you so much!!! I will try it and let you know how it went!

      Comment


      • #4
        Hi Mike Lacy ,

        Your code run fine until the last part when I run each part at a time.

        Code:
        . clear
        
        . forval i = 1/`nfile' {
          2.   append using "`temp`nfile''"
          3. }
        invalid syntax
        r(198);
        
        end of do-file
        
        r(198);
        When I select all the code and run it I got this. This codes are very advance and I haven't used them. I hope you can help me with this. Thanks

        Code:
        . // get a list of all the relevant files in your directory
        . cd "H:\data test"
        H:\data test
        
        . local flist: dir "." files "*.csv"  // see -help extended_fcn-
        
        . // import each file in the list and save it to a tempfile
        . local nfile 0
        
        . foreach fname of local flist {
          2.     local ++nfile     
          3.     import delimited "`fname'", bindquote(strict) case(preserve) clear   
          4.     local datestring = subinstr("`fname'", ".csv", "", .) // no need for a variable
          5.     gen double TodaysDate=date("`datestring'","MDY")
          6.     format TodaysDate %td
          7.     //
        .     tempfile temp`nfile'
          8.     save using "`temp`nfile''"
          9. }    
        (46 vars, 1,592 obs)
        invalid '"C:\Users\Maliaga\AppData\Local\Temp\ST_0h000001.tmp' 
        r(198);
        
        end of do-file
        
        r(198);

        Comment


        • #5
          Local macros, such as the ones used to hold the names of the various tempfiles, and the file list, only exist while the program that contains them is running. If you run the first part of the code, up to (say) the part where I have the comment "// append all of the nfile files", Stata will have imported and saved as a temporary file each of your files. However, Stata "forgets" the names of all the temporary files that were created (`temp1', `temp2' ...) as soon as those lines of code are executed. (And, it drops those files as well.) So, neither the names nor the files exist, and any further code that needs that will no longer exist. This is a feature of Stata, namely that tempfiles and the content of local macros only exist until the current set of code that is running has finished. This code, and other Stata codes using locals, must be run all at once.

          However, that aside, there are two mistakes in my code:

          save using "`temp`nfile''" should be save "`temp`nfile''"
          and
          append using "`temp`nfile''" should beappend using "`temp`i''"
          Sorry for the confusion.

          Comment


          • #6
            Mike Lacy Thank you so much!!!! When I think Stata can't do anything more, it always surprises me! I haven't used macros at all. I think it is time to start understanding and studying it. I tried your coded with a sample of my datasets and it worked perfectly. I will run it using all my datasets tomorrow, I don't think I should have a problem.

            Just to understand, we first generate a list of my cvs files in my folder. Then we import each dataset and create the date variable using the name of my file(this is awesome)???. Then we save each dataset in a temp file and finally we merge these temp files. Is this correct?

            Thank you again!

            Comment


            • #7
              Yes, your description is right. And, as to learning this kind of programming, an excellent place to start is with an older article of Nick Cox's. "Speaking Stata: How to face lists with fortitude," which is available free at http://www.stata-journal.com/sjpdf.h...iclenum=pr0005

              Comment


              • #8
                Mike Lacy Sorry to come back to this post. Just to refresh, I need to import and create a date variable for each file and the append them. Before my datasets were saved in csv format but now they are saved in xlsx. I changed your code (from cvs to xlsx) but got some warning messages and my data does not look right. Below you can see my codes:

                Code:
                clear 
                * Get a list of all the relevant files in your directory
                cd "H:\PMA\Population Research\Daily Cross Section (Co-mingle YA)"
                local flist: dir "." files "*.xlsx"  
                
                * Import each file in the list and save it to a tempfile
                local nfile 0
                foreach fname of local flist {
                    local ++nfile     
                    import delimited "`fname'", bindquote(strict) case(preserve) clear   
                    local datestring = subinstr("`fname'", ".xlsx", "", .) 
                    gen double TodaysDate=date("`datestring'","MDY")
                    format TodaysDate %td
                    //
                    tempfile temp`nfile'
                    save "`temp`nfile''"
                }    
                * Append all of the nfile files
                clear
                forval i = 1/`nfile' {
                  append using "`temp`i''" , force 
                }
                And the error message
                Code:
                Note:    18,441 binary    zeros    were    ignored    in    the    source    file.    The    first    instance    occurred    on    line    1.    Binary    zeros    are    not    valid    in    text    data.    Inspect    your    data
                    carefully.
                The appended data is virtually empty and some cells contains weird symbols. My xlsx files are named using the following structure April 25, 2016 Perhaps the "," comma is affecting the code?

                Any ideas? I would greatly appreciate any help.

                Marvin

                Comment


                • #9
                  You can't use -import delimited- to import .xlsx files. -import delmited- is for delimited text files only. The "binary zeroes" that Stata is warning you about are part of the compressed encoding used in .xlsx files. Whatever Stata ended up bringing in from those files is not usable data.

                  To bring in .xlsx files you need to use -import excel-. Note that -import excel- has different options from -import delimited-, so you will have to change those as well. See -help import excel- for more information to enable you to complete your modification of the code.

                  Comment


                  • #10
                    Sorry about that! I didn't catch that! I change to import excel.... and the code is running now.

                    Thank you!

                    Comment


                    • #11
                      Hi Clyde Schechter and Mike Lacy,

                      This is related to my original post, just excluding one file.

                      I have approximately 6 data-sets. The data-sets (in csv format) contain costumer data for different time periods: ex. 2011to2012, 2013to2014, 2015to2016 2017topresent. They are saved in the same directory (location) and have consistent names . I need to import these data-sets into Stata and append them. I would like to have a appended dataset for all the data before the current file (2017topresent). That is, I want to append 2011to2012, 2013to2014, 2015to2016. How can I do this? The original code won't work since it appends all the files in the folder.

                      Thanks,
                      Marvin
                      Last edited by Marvin Aliaga; 31 Aug 2017, 12:56.

                      Comment


                      • #12
                        The end of lines got eaten in the code you posted, which impairs a better demo, but a brute force approach like this would work:
                        Code:
                        local i = 0
                        foreach f in anylist 2001to2012 2013to2014 ..... {
                            import delimited  using "`f'"...................
                            local ++i
                            tempfile temp`i'
                            save "`temp`i''"
                        }
                        clear
                        forval i = 1/`i' {
                           append "`temp`i''"
                        }

                        Comment


                        • #13
                          Thank you so much! It worked perfectly. I just removed "anylist" form the code.
                          Original foreach f in anylist 2001to2012 2013to2014 { New foreach f in 001to2012 2013to2014 { Thank you again! Marvin

                          Comment


                          • #14
                            I need to import and append *csv files. I run the following code:

                            Code:
                            clear
                            * Get a list of all the relevant files in your directory
                            cd "G:\TWSE\2014_extracted\2014_01_03"
                            local flist: dir "." files "*.csv"  
                            
                            * Import each file in the list and save it to a tempfile
                            local nfile 0
                            foreach fname of local flist {
                                local ++nfile     
                                import delimited "`fname'", bindquote(strict) case(preserve) clear   
                                tempfile temp`nfile'
                                save "`temp`nfile''"
                            }    
                            * Append all of the nfile files
                            clear
                            forval i = 1/`nfile' {
                              append using "`temp`i''" , force
                            }
                            For some reason "temp.dta" file is empty.

                            Where is an error?

                            Comment


                            • #15
                              Olena, This is the code that I use use to import and append csv files and other text files (depending on what the delimiter is set to). It uses a different method than the foreach, but it gets the job done. Perhaps it will be useful.

                              Code:
                              // specify path to the folder with all the datafile
                                cd /data
                              // create list of data files in the folder and save it into a file
                                  ! ls *.csv > list_of_datfiles.txt
                              
                              // Open files one by one and append to the master dataset
                              file open myfile using list_of_datfiles.txt, read
                              file read myfile line
                              insheet using `line', delimiter(",")  
                              save master_data.dta, replace
                              
                              drop _all
                              
                              file read myfile line
                              while r(eof)==0 {
                                  display " OPENING `line'"
                                  insheet using `line', delimiter(",")
                              
                                  append using master_data.dta
                                  save master_data.dta, replace
                                  drop _all
                                  file read myfile line
                              
                              }
                              
                              file close myfile

                              Comment

                              Working...
                              X