Taking strings out of file names

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

Taking strings out of file names

27 Mar 2022, 17:10

I scraped together some xls files using the almighty Python. Before I can/want to work with these files I must rename them, because this code

Code:

foreach f of loc files {
    
import delim "`f'", clear

sa "`f'", replace
}

ruins the files.

This code gives me

Code:

cls


local files : dir "`c(pwd)'" files "*Dados*"

foreach f of loc files {
    
di "`f'"

}
dadosbo_2018_1(homicÍdio doloso).xls
dadosbo_2018_10(homicÍdio doloso).xls
dadosbo_2018_11(homicÍdio doloso).xls
dadosbo_2018_12(homicÍdio doloso).xls
dadosbo_2018_2(homicÍdio doloso).xls
dadosbo_2018_3(homicÍdio doloso).xls
dadosbo_2018_4(homicÍdio doloso).xls
dadosbo_2018_5(homicÍdio doloso).xls
dadosbo_2018_6(homicÍdio doloso).xls
dadosbo_2018_7(homicÍdio doloso).xls
dadosbo_2018_8(homicÍdio doloso).xls
dadosbo_2018_9(homicÍdio doloso).xls

What I want, is to mass rename all of these by removing "dadosbo_" and "(homicÍdio doloso)" from the file names. I know how to rename them individually, but how could I only remove some strings from these file names?

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

27 Mar 2022, 17:18

One thing confuses me. These are .xls files but you are trying to import them with -import delim-? That's not right. I'll assume you meant -import excel-. I wold do this as

Code:

forvalues i = 1/12 { import excel dadosbo_2018_`i'(homicÍdio doloso).xls, clear // perhaps other options save 2018_`i', replace }
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#3

27 Mar 2022, 17:32

When I try import exc, Stata tells me "It is not as xlsx file. Use the xlCreateBook() for an xls file." No idea what this means or even what it refers to, but Import delim however works on my end. For an example, when I try this code block, I get a 603 error.

Code:

import exc "E:\Papers\Brazil\Raw Data\DadosBO_2019_7(HOMICÍDIO DOLOSO)", clear file E:\Papers\Brazil\Raw Data/DadosBO_2019_7(HOMICÍDIO DOLOSO).xls could not be loaded

You don't have to of course, but you can go to the link, click on "homicidios dolosos", and download a random file. Never seen this error before.

Either way though, I'll have files for multiple years and months. Python's still grabbing them, but I'll have files which are named things like "dadosbo_2019_9(homicÍdio doloso).xls", "dadosbo_2021_2(homicÍdio doloso).xls". So, I would either need to use a double loop for the years too, or rename the files and save them. To me, the cleanest solution is to rename them, but I'll use the double loop presuming there's no easy way to get rid of the unneeded strings.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

27 Mar 2022, 18:57

Let's share more of the details for the benefit of others. When you import your xls-suffixed file as text Stata tells us

Code:

. import delimited "/Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÃDIO DOLOSO).xls" (encoding automatically selected: UTF-16LE) (54 vars, 709 obs)

(Note that the non-ASCII character in the filename is not rendered correctly in the forum software,)

The file is not in fact in Excel XLS format - it is a text delimited file encoded in UTF-16LE with a meaningless suffix of xls.

Now, in post #1 you tell us your code ruins the files. What exactly does that mean?

Stata saves the imported Stata dataset with the specified xls suffix rather than a dta suffix, and thus overwrites the xls input file.

You need to strip off the .xls from the end of the filename before saving it.

Code:

. local f "/Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÃDIO DOLOSO).xls" . import delimited "`f'", clear (encoding automatically selected: UTF-16LE) (54 vars, 709 obs) . local o : subinstr local f ".xls" "" . save "`o'", replace (file /Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÃDIO DOLOSO).dta not found) file /Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÃDIO DOLOSO).dta saved .

And the xls file is left intact.

The same macro subinstr function will allow you to remove other components of the input file names when creating the output file names.
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

27 Mar 2022, 18:58

I ended up doing

Code:

foreach x of loc files {
loc a: di strlen("`x'")

if `a' == 37 {
    
local subfile = substr("`x'", 9, 14) + ".xls"

    !rename "`x'" "`subfile'"
}

else if `a'==38 {
    
local subfile = substr("`x'", 9, 15) + ".xls"

    !rename "`x'" "`subfile'"
    
    
}
}

It still keeps the "(HOMICÍDIO" string, but it's an improvement from before. Although I'm sure there's a better way to do this.

Comment

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#6

27 Mar 2022, 20:21

Here is a more robust way to handle the file renaming within Stata, rather than say using the shell to handle this (which in Linux and mac would be maybe more efficient, but I'm not aware of a clean command line-based solution for Windows).

The code presupposes that you have one root directory where your files are located. Inside this directory are two key folders: one to hold the existing (old) files, and one the renamed (new) files. The old file names are first pattern matched using regex and new file names derived based on the pattern (plus whatever other processing you desire). The the files are copied to the new file directory with the new name. Optionally, you may erase the old file.

Here I adopt the unimaginative, but easy, pattern of renaming files with the pattern old_*.txt to new_*.csv

Code:

cd "/path/to/your/file/root" local oldfiles : dir "./old/" files "*.txt" , respectcase mac list _oldfiles foreach fn of local oldfiles { local oldfn `"`fn'"' local res = ustrregexm(`"`oldfn'"', "old_(.*)\.txt", .) if `res'==1 { local newfn = "new_" + ustrregexs(1) + ".csv" di `"Renaming `oldfn' to `newfn'"' copy `"./old/`oldfn'"' `"./new/`newfn'"', public * erase `"./old/`oldfn'"' // CAUTION: optionally, erase the old files } }

After this point, you can go on to import each file into Stata using -import delimited-.

Last edited by Leonardo Guizzetti; 27 Mar 2022, 20:23.
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

27 Mar 2022, 20:54

The solution I ended up using was

Code:

local files : dir "`c(pwd)'" files "*dadosbo_*"
cls

foreach x of loc files {
local subfile = subinstr("`x'", "(homicÍdio doloso)", "", .)

    !ren "`x'" "`subfile'"
}

local files : dir "`c(pwd)'" files "*dadosbo_*"
cls
foreach x of loc files {
local subfile = subinstr("`x'", "dadosbo_", "", .)

    !ren "`x'" "`subfile'"
}

It's sort of unclean, but it does the job. Presumably, there's a more elegant solution which exists. I imagine I could simplify it by using the ustrregexs function..

Announcement

Taking strings out of file names

Comment

Comment

Comment

Comment

Comment

Comment