Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing and Appending multiple .csv files - ignoring certain rows and variable names not on 1st row

    I am trying to import and append multiple .csv files - and add the individual file names as an ID variable. The code below was taken from various Google searches (apologies, I can't remember the exact source). The code runs and imports/appends the files correctly, and works for files with the first row as variable names. However, the first 13 rows of the .csv files I have are unwanted information, followed by the 14th row which is the variable names - and therefore the variable names do not get identified and all data is imported as string.

    I have come unstuck trying to add in any additional commands to ignore the first X-number of rows of each file (13 rows in this case) - and also use the X-row as the variable names (14th row in this case).

    It is possible to just drop these unwanted rows, rename the variables, and destring - but I'd like to see if there is a 'proper' way to do this.

    Notes:
    - The Excel .csv format I have uses a semi-colon (";") delimiter.
    - I am using Stata v17.0.


    Code:
    cd "C:\csv_files_to_append"
    
    local filepath = "`c(C:\csv_files_to_append)'"
    
    local files : dir "`filepath'" files "*.csv"
    
    tempfile name 
    save `name', replace empty
    
    foreach x of local files {
    
        qui: import delimited "`x'", delimiter(";")  case(preserve) clear
        
        qui: gen id = subinstr("`x'", ".csv", "", .)
    
        append using `name'
        
        save `name', replace
        }
    Many thanks

  • #2
    Are any of these CSVs public? If so, give two URLs: one that imports without a problem, and the other with many rows of unwanted info.

    Comment


    • #3
      Hello Jared - I have to correct my original post - I forgot that I had found a workaround for the files that have var names on row 1. So I actually have the same issue when using the code in the original post for both the 1st-row varnames and for the unwanted info/14th-row varnames version CSVs.

      Unfortunately, the CSVs are not public. I could send a dummy example - but I think my institution is preventing me from sharing the URL.
      The unwanted info is only in column 1 and rows 1 to 13. The varnames then sit under this across row 14.

      The workaround involved converting the CSV to .UTF8 format - which successfully imports the first row as variables using the code in the original post. But this isn't ideal (as I'd have to convert lots of files) - and it doesn't work for the other with unwanted info in rows 1/13.

      (Just to clarify - I am not trying to import a combination of these two types of CVSs I have mentioned).

      Thanks

      Comment

      Working...
      X