Loop over list of files and append but first item in list is empty

Adithya Raajkumar

Join Date: Nov 2020

Posts: 7
#1

19 Sep 2022, 09:15

Hello,

I am trying to obtain a list of files matching a wildcard expression and append them all together. The list of files is long, so I would like to use a loop.
I have figured out two ways of getting a macro containing the list of files:

Code:

local filelist: dir dirname files "*pattern*.dta" //1
ssc install fs
fs dirname/*pattern*.dta //2

The first produces a local macro named filelist, and the second returns the list in r(files). I can print the entire list using -di ""`filelist'""-

I can print the list of files one by one successfully:

Code:

. foreach file in ""`filelist'"" {
  2.     di "`file'"
  3. }
file1.dta
file2.dta
file3.dta

However, -append- fails whether or not I put quotes around `file':

Code:

. clear
. foreach file in ""`filelist'"" {
  2.     append using ""`file'""
  3. }
file not found
r(601);

. foreach file in ""`filelist'"" {
  2.     append using `file'
  3. }
invalid file specification
r(198);

What am I doing wrong? I suspect that the first entry in `filelist' is actually an empty string, but I don't know how to get rid of this. For now I have found a workaround using

Code:

if "`file'" != "" {
    append using "`file'"
}

inside the loop, but this seems rather inelegant.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1320
#2

19 Sep 2022, 09:56

I think your problem is with the use of two sets of double-quotes. When Stata sees "", it thinks you mean an empty string. So the first element of the loop becomes the null string, and you have no file with that name. This causes no problem with the display command (the first line of the display is just empty), but it does create a problem with append, when it actually looks for a file whose name is just the empty string.

Try this instead:

Code:

foreach file in `"`filelist'"' {
    append using `"`file'"'
}

Last edited by Hemanshu Kumar; 19 Sep 2022, 09:59.
Adithya Raajkumar

Join Date: Nov 2020

Posts: 7
#3

08 Oct 2022, 18:33

Hi Hemanshu, I tried using a single set of double quotes, but the problem persists:

Code:

. foreach file in "`filelist'" {
  2.     di "`file'"
  3. }
file1.dta
file2.dta
file3.dta

.

So the problem of nulls still exists. Moreover, when I use -set trace on-, it actually seems that Stata is inserting extra quotes into the files which are stripped out by -di-. Unfortunately, this continues to cause headaches for me.
Using your solution, the macro puts all the files into a single string, which strips out the empty results but is not what I want:

Code:

. foreach file in `"`filelist'"' {
  2.     di `"`file'"'
  3. }
"file1.dta" "file2.dta" "file3.dta"

.
Adithya Raajkumar

Join Date: Nov 2020

Posts: 7
#4

08 Oct 2022, 18:37

Using no quotes at all seems to have worked, which is exceedingly strange given that usually Stata will interpret that as a reference to a variable:

Code:

. foreach file in `filelist' {
  2.     di "`file'"
  3. }
file1.dta
file2.dta
file3.dta

.

I swear, it seems like Stata has 2459874508435 different hiccups when it comes to strings.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

08 Oct 2022, 18:46

There is a difference between a string as a variable and a string as a macro.
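
To illustrate that distinction, here is a minimal sketch rather than a definitive fix. A macro is replaced by its contents as text before Stata parses the command, so a bare `filelist' in -foreach- turns into a list of separately quoted file names, not a reference to a variable. The folder dirname and the pattern are the placeholders from post #1. Prefixing dirname/ in -append- is an assumption that the files are not in the working directory.

```stata
* Each file name returned by the dir macro function is already in double quotes
local filelist : dir "dirname" files "*pattern*.dta"

* Show the raw contents of the macro; compound quotes keep the inner quotes intact
display `"`filelist'"'

clear
* "of local" reads the macro contents directly, one quoted name per iteration
foreach file of local filelist {
    * Assumption: the files sit in dirname, so the folder is prefixed here
    append using `"dirname/`file'"'
}
```

Wrapping `filelist' in a further set of quotes changes the text that -foreach- has to split into elements, which fits the single-string and empty-element results reported earlier in the thread.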
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#6

08 Oct 2022, 19:15

A couple of asides:

1. You don't need a loop to append all these files together. If you have verified that they are suitable for append (see 2. below), just a single -append using `filelist'- command will do the job. And if you want to keep track of which file each observation in the result came from, -append- has a -generate- option for that purpose.

2. That said, mass -append- often ends in tears. Even if your data sets are coming from reliable sources that curate their data sets carefully and expertly, if the data sets number more than a handful, the probability is high that there will be incompatibilities among them that result in a malformed result or appreciable loss of data when you mass append them (whether through a single command as I suggested above, or through a loop). What should be the same variable may have slightly different names in different data sets (differences in case are common, as are digit reversals, and spelling errors are hardly rare). Or two variables with the same name in different data sets may be string in one data set and numeric in another. Or the variable may be numeric in both but have different value labels.

I highly recommend that you install Mark Chatfield's -precombine-, available from SSC, and use it to screen your data sets for these problems ahead of time. It will point out any potential problems, and then you can fix those problems before you run -append-. I can tell you from painful experience that this preventive approach is far more effective and efficient than trying to clean up the grotesque and defective dataset that can result from just blindly going ahead with mass -append-. Worse still, the results of mass -append- can be seriously wrong in ways that are not obvious, and you may not realize you are working with incorrect data until far down the line, necessitating a lot of rework, or possibly not even until you have presented your results to somebody who will take actions relying on them.
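
As a rough sketch of point 1, assuming the working directory is the folder that holds the files and that they have already been screened for compatibility, the loop-free version might look like this. The variable name source is hypothetical; any unused name would do.

```stata
* Assumption: the working directory is the folder that holds the files
local filelist : dir "." files "*pattern*.dta"

clear
* One call appends every file in the list;
* generate() records which file each observation came from
append using `filelist', generate(source)

* source is a hypothetical name; tabulating it shows how many
* observations each appended file contributed
tabulate source
```

Because the names in `filelist' carry no folder prefix, this form only works when the files are in the working directory. Otherwise, change directory first or fall back to a loop that adds the prefix.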