Importing and Appending multiple .csv files - ignoring certain rows and variable names not on 1st row

Josh Culverhouse

Join Date: Oct 2022

Posts: 2
#1

Importing and Appending multiple .csv files - ignoring certain rows and variable names not on 1st row

02 May 2023, 06:15

I am trying to import and append multiple .csv files - and add the individual file names as an ID variable. The code below was taken from various Google searches (apologies, I can't remember the exact source). The code runs and imports/appends the files correctly, and works for files with the first row as variable names. However, the first 13 rows of the .csv files I have are unwanted information, followed by the 14th row which is the variable names - and therefore the variable names do not get identified and all data is imported as string.

I have come unstuck trying to add in any additional commands to ignore the first X-number of rows of each file (13 rows in this case) - and also use the X-row as the variable names (14th row in this case).

It is possible to just drop these unwanted rows, rename the variables, and destring - but I'd like to see if there is a 'proper' way to do this.

Notes:
- The Excel .csv format I have uses a semi-colon (";") delimiter.
- I am using Stata v17.0.

Code:

cd "C:\csv_files_to_append" local filepath = "`c(C:\csv_files_to_append)'" local files : dir "`filepath'" files "*.csv" tempfile name save `name', replace empty foreach x of local files { qui: import delimited "`x'", delimiter(";") case(preserve) clear qui: gen id = subinstr("`x'", ".csv", "", .) append using `name' save `name', replace }

Many thanks
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

02 May 2023, 07:09

Are any of these CSVs public? If so, give two URLs: one that imports without a problem, and the other with many rows of unwanted info.
Comment
Josh Culverhouse

Join Date: Oct 2022

Posts: 2
#3

02 May 2023, 08:03

Hello Jared - I have to correct my original post - I forgot that I had found a workaround for the files that have var names on row 1. So I actually have the same issue when using the code in the original post for both the 1st-row varnames and for the unwanted info/14th-row varnames version CSVs.

Unfortunately, the CSVs are not public. I could send a dummy example - but I think my institution is preventing me from sharing the URL.
The unwanted info is only in column 1 and rows 1 to 13. The varnames then sit under this across row 14.

The workaround involved converting the CSV to .UTF8 format - which successfully imports the first row as variables using the code in the original post. But this isn't ideal (as I'd have to convert lots of files) - and it doesn't work for the other with unwanted info in rows 1/13.

(Just to clarify - I am not trying to import a combination of these two types of CVSs I have mentioned).

Thanks
Comment

Announcement

Importing and Appending multiple .csv files - ignoring certain rows and variable names not on 1st row

Comment

Comment