  • Importing all files containing specific names in subfolders of a folder

    Hello everyone,

    I am trying to import several CSV files into Stata, retain specific variables, and then append all the files together into one dataset. However, I am encountering an error where Stata can't find the directory. I suspect that the issue might be with the way I'm referencing the folder paths, but I am not sure what the exact problem is. Could anyone help me understand what might be causing the error and suggest a solution?
    1. The main folder is named "TEST".
    2. Inside the "TEST" folder, there are two subfolders: "BaseA" and "BaseB".
    3. Each subfolder contains 35 CSV files.
    4. The files in BaseA have names that begin with "CR_Base_A_" and end with a date (e.g., CR_ADV_Base_A_20220101.csv).
    5. I want to:
      • Import all these files into separate Stata files and save them.
      • Then keep the variables "Name", "Ownership", and "CIK" from each file and then append all into one combined dataset.
    I have been using a loop to import these files, but I keep getting an error stating that the directory cannot be found. Below is the code I have been trying.


    global main_directory "C:\Users\fimi\Desktop\TEST"

    local subfolders BaseA BaseB
    clear
    foreach subfolder of local subfolders {
        local path "$main_directory\\`subfolder'"
        local files: dir "`path'" files "CR_Base_A_*.csv"
        foreach file of local files {
            di "Importing file: `file'"
            import delimited "`path'\\`file'", clear
            keep Name Ownership CIK
            append using "`path'\\`file'", force
            di "Appended file: `file'"
        }
    }

    save "$main_directory\\Appended_Data.dta", replace

  • #2
    The files in BaseA have names that begin with "CR_Base_A_" and end with a date (e.g., CR_ADV_Base_A_20220101.csv).
    I want to:
    • Import all these files into separate Stata files and save them.
    Your code loops through BaseA and BaseB. I’m unsure if that’s intentional because your description leads me to believe you’re only interested in BaseA.

    I find filelist (SSC) makes tasks like these much more manageable. You might need to adjust the code below, since I can't test it on my end, but perhaps it will give you some direction.

    Code:
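    clear all
    tempfile temp
    save `temp', emptyok

    * filelist is user-written; install it once with: ssc install filelist
    filelist, dir("C:/Users/fimi/Desktop/TEST") pattern("CR_Base*.csv")

    * dirname has no trailing separator, so add one when building the full path
    gen fname = dirname + "/" + filename
    keep if regexm(lower(fname), "base[a-b]")

    levelsof fname, local(files)
    foreach x in `files' {
        * case(preserve) keeps names such as Name and CIK; the default lowercases them
        import delimited using "`x'", clear stringcols(_all) case(preserve)
        keep Name Ownership CIK
        append using `temp'
        save `temp', replace
    }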
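
    If installing filelist is not an option, the same task can be sketched with Stata's built-in dir extended macro function. This is an untested sketch under several assumptions: the folder layout is the one described in #1, every CSV contains columns named Name, Ownership, and CIK, and the "CR_*.csv" pattern (a guess, since the stated prefix and the example file name in #1 differ) matches the files of interest. The local names main, sub, f, stem, and combined, and the variable source, are introduced here purely for illustration. Forward slashes are used throughout because a backslash directly before a macro can interfere with macro expansion.

    ```stata
    * Assumed layout from #1: TEST/BaseA and TEST/BaseB, each holding CSV files
    local main "C:/Users/fimi/Desktop/TEST"

    clear
    tempfile combined
    save `combined', emptyok              // empty file to append onto

    foreach sub in BaseA BaseB {
        * The pattern is an assumption; adjust it to the real file names
        local files : dir "`main'/`sub'" files "CR_*.csv"

        foreach f of local files {
            * case(preserve) keeps the capitalisation of Name, Ownership, CIK
            import delimited using "`main'/`sub'/`f'", clear case(preserve) stringcols(_all)

            * Save the full file as its own .dta next to the CSV
            local stem = subinstr("`f'", ".csv", "", .)
            save "`main'/`sub'/`stem'.dta", replace

            * Keep the three variables, tag the source, and add to the running total
            keep Name Ownership CIK
            gen source = "`sub'/`f'"
            append using `combined'
            save `combined', replace
        }
    }

    * After the loop, memory holds every file appended so far
    save "`main'/Appended_Data.dta", replace
    ```

    Reading everything as strings with stringcols(_all) avoids type clashes on append; variables such as CIK can be converted afterwards with destring if needed. If no file matches the pattern, the dataset in memory is empty and the final save will stop with an error, which is a useful signal that the path or pattern needs another look.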
