Loops and saving files

Giorgio Depolli

Join Date: Jan 2021

Posts: 21
#1

29 Jan 2021, 11:36

Dear fellow Stata users,

I am trying to apply the same commands to several datasets. I am currently working with 4 quarterly datasets, using:

Code:
cd "yourfolder"
local files: dir . files "*dta"
foreach file of local files {
    use "`file'", clear
    // do something
}
However, when I save after this, Stata only saves the results from the last quarter. I can see from the deleted observations, real changes, etc. that the commands are applied to all 4 quarters, but I am not sure how to save them all. What should I add to my code?
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#2

29 Jan 2021, 12:07

Your description isn't entirely clear to me. But I think I know what you are trying to do.

Here's what you've actually done: You have read in all the *dta files in "yourfolder", done some calculations with them, and then, in every case but the last, you have overwritten all of that work by reading in the next file. Only the last file does not get overwritten, so its results are subsequently saved when you "press saving."

What I think you want to do is build up a file that retains all the results along the way. Now, it is not entirely clear whether you want to build that file up "side by side" or "vertically." I'm going to assume you want to stack the results from the four files "on top of each other."

Code:

cd "yourfolder"
local files: dir "." files "*dta"
tempfile building
clear
save `building', emptyok
foreach file of local files {
    use `"`file'"', clear
    // do something
    append using `building'
    save `"`building'"', replace
}

At the end of this code you will have in memory the results for all of the files you read in, stacked "vertically." You can then do further operations on this data, or you can now save it with a real filename. (Do not now save it as `building', because `building' will automatically self-destruct after the do-file finishes running. Give it a real, permanent, file name.)

Note: Not tested, beware of typos or other errors.

If you need to build this file "side by side" instead of vertically stacked, the code is different. It will involve using -merge- instead of -append-, but it also will require some other modifications because -appending- an empty file is legal in Stata, but -merge-ing an empty file is impossible. If you are in this position and can't figure out the modifications needed, post back and I'll be happy to show you.
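
For reference, a minimal sketch of the "side by side" version might look like the following. It assumes that every file contains a variable that uniquely identifies observations (called id here, a hypothetical name) and that the remaining variable names differ from file to file. The first file is simply saved as the starting point, so nothing is ever merged with an empty file. Like the code above, it is not tested.

```stata
cd "yourfolder"
local files: dir "." files "*dta"
tempfile building
local first = 1
foreach file of local files {
    use `"`file'"', clear
    // do something
    if `first' {
        // first file: nothing to merge yet, it just becomes the base
        local first = 0
    }
    else {
        // id is a hypothetical key variable; replace it with your own
        merge 1:1 id using `building', nogenerate
    }
    save `"`building'"', replace
}
```

If the same variable name appears in more than one file, -merge- keeps the values from the file currently in memory, so variables may need renaming inside the loop before merging.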

Last edited by Clyde Schechter; 29 Jan 2021, 12:09.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

29 Jan 2021, 12:36

Note that this topic was previously posted and responded to at

https://www.statalist.org/forums/for...tiple-datasets

which is where the code in post #1 originated.
Giorgio Depolli

Join Date: Jan 2021

Posts: 21
#4

29 Jan 2021, 12:46

Thank you, Clyde, your code worked perfectly!
Elena Draghici

Join Date: Jan 2021

Posts: 43
#5

29 Jan 2021, 13:27

Hello,

I am new to Stata (especially loops) and I am hoping someone could help me understand what I am doing incorrectly and provide some suggestions.

I work with very large datasets that require sampling by year to build a master dataset. So far, I have been doing this manually (sampling by year over 18+ years, saving, then appending, which is very time consuming). I want to learn how to use loops to do this for me: I want Stata to access each year of a dataset, sample 20%, keep a list of variables, then save it as "sample 20". Then, if possible, I would also like to have the datasets appended together after they have been saved separately. I tried to work with the code provided above, but I keep getting errors ("no variables defined").

Code:

cd "C:\Users\Elena\Desktop\practice for rdc\New folder"
local files: dir "." files "*dta"
tempfile sample
clear
save `sample'
foreach file of local files {
use `"`fullsamp*'"', clear
sample 20
keep personid year prov marst
append using `fullsamp*'
save `"`master'"', replace
}

Thank you in advance for the help!
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#6

29 Jan 2021, 14:48

Well, the error message you are getting comes from the -save `sample'- command. There is nothing in active memory at the time this command is reached, and Stata does not permit you to save an empty data file unless you specify the -emptyok- option. So if you add that to that command, this error message will go away.

But there are other errors further along in your code that will trip you up as well. -use `"`fullsamp*'"', clear- will give you an error message: you can only use one file at a time, not a whole list of files abbreviated by wildcards. Also, at least in the segment of code you show, local macro fullsamp is never defined. Similarly, -save `"`master'"', replace- is a problem because you have never defined local macro master. However, this time you will not get an error message. Instead, you will overwrite the last previously loaded file--which is not what you intend and will cause you to lose the data that was in it.

Also, your problem is ill posed. You cannot save each 20% sample separately under the same filename "sample 20." If you want each sample saved separately, you must give each one a separate filename.

Code:

cd "C:\Users\Elena\Desktop\practice for rdc\New folder"
local files: dir "." files "*dta"
tempfile combined
clear
save `combined', emptyok
foreach f of local files {
    use `"`f'"', clear
    sample 20, by(year)
    keep personid year prov marst
    save `"`f'_20"', replace
    gen source_file = `"`f'"'
    append using `combined'
    save `"`combined'"', replace
}
save combined_20_pct_samples, replace

Note: If you want this code to give reproducible results every time it is run, you should include a -set seed 1234- (or whatever seed number you like) at the top.
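
A minimal sketch of where the seed would go, shown on the per-file sampling step only. Because the local macro f holds the full file name including the extension, the saved name can end up with ".dta" in the middle; the sketch strips the extension first using the -subinstr- extended macro function. The macro name stem is my own choice, and the sketch is not tested.

```stata
set seed 1234                      // any seed you like; makes the draws reproducible
local files: dir "." files "*dta"
foreach f of local files {
    // stem is a hypothetical macro name: the file name without ".dta"
    local stem : subinstr local f ".dta" ""
    use `"`f'"', clear
    sample 20, by(year)
    keep personid year prov marst
    save `"`stem'_20"', replace    // xyz.dta is sampled into xyz_20.dta
}
```

Bear in mind that the pattern "*dta" also matches the newly saved sample files, so re-running the loop in the same folder would sample the samples. Saving the results to a different folder, or using a narrower pattern, avoids that.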
Elena Draghici

Join Date: Jan 2021

Posts: 43
#7

29 Jan 2021, 15:09

Hi Clyde,

Thank you so much for your answer! Everything worked, except the saving each year of the sample 20. You mention this above and state that setting a seed will ensure a reproducible result instead of saving each file? As in, if I needed to recreate the dataset for whatever reason, I would just re-run the code instead of appending the years together.

If I was to save each file, would that require a second loop inside the current one?

Thank you again!
Elena Draghici

Join Date: Jan 2021

Posts: 43
#8

29 Jan 2021, 15:50

Also, does the line of code 'sample 20, by(year)' refer to Stata accessing each dataset in the specified file location, which is split by year? Or is it sampling by year within a dataset?

Sorry for all the questions, I am very new at this!

Last edited by Elena Draghici; 29 Jan 2021, 15:58.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#9

29 Jan 2021, 16:04

Thank you so much for your answer! Everything worked, except the saving each year of the sample 20.

I do not see why that didn't work. If you originally had a file called xyz.dta, after the code runs there should be a new file, xyz_20.dta, which contains the 20% sample (and xyz.dta will still be there, too). Are there no such files after you ran the code?

If I was to save each file, would that require a second loop inside the current one?

No, the code is already in that loop, and I can't see any reason it wouldn't have done the job. See response just above.

As in, if I needed to recreate the dataset for whatever reason, I would just re-run the code instead of appending the years together.

Yes, that's correct.

Also, does the line of code 'sample 20, by(year)' refer to Stata accessing each dataset in the specified file location, which is split by year? Or is it sampling by year within a dataset?

It's sampling by year within a data set. Once Stata is inside that loop, all the commands apply only to the one data set that is currently being processed at the time.
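
One way to convince yourself of this is to try it on a small made-up data set. The sketch below builds 1,000 observations split evenly between two invented years and tabulates year before and after sampling; none of it comes from your data.

```stata
clear
set seed 1234
set obs 1000
gen year = 2001 + mod(_n, 2)    // two made-up years, 500 observations each
tab year                        // counts before sampling
sample 20, by(year)             // draw 20% separately within each year
tab year                        // counts after sampling
```

The second table should show about 20% of each year's observations, all taken from the single data set in memory.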
Elena Draghici

Join Date: Jan 2021

Posts: 43
#10

29 Jan 2021, 16:36

My bad, you are correct. All the sampled years are there along with the appended file

Thanks so much, Clyde!
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#11

14 Oct 2021, 16:56

Is it possible to do multiple folder operations inside a loop, like below:

Code:

foreach file of local files {
    use `"`file'"', clear
    // do something1
    save `"`building'"', replace
    // do something2
    save `"`building'"', replace
}

I have a lot of datasets for which first I have to remove some variables, and then rename, and so on.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#12

14 Oct 2021, 17:01

I don't understand the question. The code you show has nothing to do with multiple folders. You can, of course, do multiple things inside the loop, and if there is some reason to save the results part way through and then later overwrite those results after you complete the tasks, well, yes, you can do that just as you show.