  • Appending Large Number of Files

    I have 30+ .dta files that I would like to combine into a single dataset. They are all saved in the same folder. What is the fastest way to achieve this?

  • #2
    You don't say in what way you wish to combine them. Merge? Append?

    Either way, it is simple to do in a loop. I'm going to start by assuming that the 30+ .dta files of interest are the only .dta files in the folder. The core of the solution is a loop over the filenames:

    Code:
    clear*
    * list every .dta file in the current folder
    local filenames: dir "." files "*.dta"
    * start with an empty temporary dataset to build on
    tempfile building
    save `building', emptyok
    
    foreach f of local filenames {
        use `"`f'"', clear
        /* INSERT HERE EITHER
            append using `building'
        OR
            merge 1:1 or perhaps 1:m  key_variables using `building'
        */
        save `"`building'"', replace
    }
    
    * keep the result under a permanent name
    use `building', clear
    save combined_data_set, replace
    That said, in practice one often runs into problems doing this. In an append loop, it often turns out that the same variable has subtly different names in different data sets, or is a string variable in some and a labeled numeric variable in others. In a merge loop, one sometimes finds unexpected mismatches on the merge key variables. If your data are all very clean, the above will work as is. But be prepared for problems along the way, not all of which will cause Stata to complain and break the loop. So it is wise both to include appropriate -assert- statements in the loop to check for trouble as you go along, and to check the final combined data set carefully for problems.
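
    As a rough sketch of what such checks might look like in an append loop: the folder demo_append, the file names, and the variables id and name below are all made up for illustration, and the example first builds two tiny files so that it can run on its own. Which checks are appropriate depends entirely on your data.

    ```stata
    clear all

    * build two small demo files in a scratch folder (hypothetical names)
    capture mkdir demo_append
    forvalues k = 1/2 {
        clear
        set obs 3
        generate id = _n + 10*`k'
        generate name = "file`k'"
        save "demo_append/part`k'.dta", replace
    }

    * start from an empty temporary dataset, as in the loop above
    clear
    local filenames : dir "demo_append" files "*.dta"
    tempfile building
    save `building', emptyok

    foreach f of local filenames {
        display as text "checking `f'"
        use "demo_append/`f'", clear
        * stop at once if an expected variable is missing or has the wrong type
        confirm numeric variable id
        confirm string variable name
        * id is assumed to be non-missing and unique within each file
        assert !missing(id)
        isid id
        append using `building'
        save `"`building'"', replace
    }

    * check the combined result as well, not just each piece
    use `building', clear
    isid id
    describe
    ```

    Because -confirm-, -assert-, and -isid- all exit with an error when they fail, the loop stops at the first offending file, and the -display- line shows which file that was.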


    • #3
      I agree with Clyde's caution about combining datasets this way. Clyde, there's no need to save at each pass when appending. I would also suggest adding a variable to track the source of each observation:

      Code:
      clear
      
      * get a list of datasets in the current directory
      local flist : dir . files "*.dta"
      dis `"`flist'"'
      
      * loop over each dataset and append to the data in memory;
      * it's OK to append if we start with no data;
      * create a numeric source variable and build labels as we go
      gen source = .
      local i 0
      foreach f in `flist' {
          append using "`f'"
          replace source = `++i' if mi(source)
          label def source `i' "`f'", add
      }
      
      * attach the value labels to the source variable
      label value source source
      tab source
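
      A shorter variant, offered only as a sketch: -append- accepts several filenames at once and has a generate() option that records where each observation came from. As far as I know it numbers the using files 1, 2, ... in the order listed and reserves 0 for observations already in memory. The folder demo_source and the variable x are made up so that the example runs on its own.

      ```stata
      clear all

      * build two small demo files in a scratch folder (hypothetical names)
      capture mkdir demo_source
      forvalues k = 1/2 {
          clear
          set obs 3
          generate x = _n
          save "demo_source/part`k'.dta", replace
      }

      * collect the file names and prefix each with its folder
      local flist : dir "demo_source" files "*.dta"
      local paths
      foreach f of local flist {
          local paths `"`paths' "demo_source/`f'""'
      }

      * append everything in one call; source marks the originating file
      clear
      append using `paths', generate(source)
      tab source
      ```

      The trade-off is that source here holds only a running number, so the filename labels built in the loop above are still worth having if you need to trace observations back to a specific file.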


      • #4
        bnuss89: As a member (not me) asked earlier this month (edited)

        Please see item 6 in the FAQ, which explains the preference for real user names.
        Please re-register with a real name. It's easy: use the CONTACT US below right to email the administrators.
