Merging more than two data sets using macros and foreach loop

Kazi Aiman Udoy

Join Date: Dec 2022

Posts: 33
#1

12 Dec 2022, 18:24

Hello Everyone,

I want to merge three datasets using macros and a foreach loop. The file names are 1.dta, 2.dta and 3.dta. I'm new to Stata and have been unable to get my code to run. Could someone walk me through the process in detail?

Thank You
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

12 Dec 2022, 18:31

Please see the FAQ to learn how to ask a question. Welcome to Statalist!!
Kazi Aiman Udoy

Join Date: Dec 2022

Posts: 33
#3

12 Dec 2022, 19:42

Thanks Jared,

I have run the following code:

global "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\1.dta", clear
foreach i = 2/4 of local loc {
merge 1:1 id using "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\`i'", nogen
drop _merge

but it says "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\1.dta" invalid name

What can I do now?
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#4

12 Dec 2022, 21:15

The syntax for the -global- command requires specifying a name of the global macro immediately after the word global, and then spelling out its contents. "F:\UDOY\Graduation...\Stata\1.dta", which occurs immediately after -global- in your code, is not a legal name for a global macro. For one thing it's too long. For another a global macro name can't contain a colon, nor a backslash. Nor can the -global- command take any options like -clear-. It's really not clear what you meant to do there. Maybe you meant you wanted to -use "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\1.dta", clear- ?

That said, there is no apparent reason to create any global macro there in the first place. At least in the code you have shown, you never actually use it. And even if you do use it later in the code, I will wager that you could accomplish the same purpose with a local macro, which is a much safer coding practice. So I recommend you either get rid of that altogether, or if you really need to store that filename somewhere, do it with a local macro.

Once you fix that, you will unmask another problem in your code. Because your -merge- command contains the -nogen- option, there will not be any _merge variable in the -merge-d result, unless there was already a _merge variable in one of the files being combined. And if there is no _merge variable, then -drop _merge- will fail and the code will break there.
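A minimal sketch of the corrected pattern described above, not the original poster's actual files: the three toy datasets, the variables x1, x2 and x3, and the tempfile names f1, f2 and f3 are all assumptions made so the example runs on its own. It loads the first file with -use-, keeps names in local macros only, and relies on -nogenerate- so that no -drop _merge- is needed. With real files on disk, one detail to watch: in Stata a backslash immediately before a macro reference such as `i' suppresses the expansion, so a path ending in \`i' is safer written with a forward slash.

```stata
* Toy stand-ins for 1.dta, 2.dta and 3.dta (assumed: one row per id in each)
clear
set obs 3
generate id = _n
generate x1 = 10 * _n
tempfile f1
save "`f1'"

clear
set obs 3
generate id = _n
generate x2 = 20 * _n
tempfile f2
save "`f2'"

clear
set obs 3
generate id = _n
generate x3 = 30 * _n
tempfile f3
save "`f3'"

* -use- (not -global-) loads the first file as the master data
use "`f1'", clear

* Loop over the remaining files; ``f'' first expands `f' to f2 or f3,
* then expands that local macro to the temporary file name
foreach f in f2 f3 {
    * -nogenerate- suppresses _merge, so nothing has to be dropped afterwards
    merge 1:1 id using "``f''", nogenerate
}

describe
```

After the loop the data in memory should hold id together with x1, x2 and x3, and no _merge variable.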
Kazi Aiman Udoy

Join Date: Dec 2022

Posts: 33
#5

13 Dec 2022, 05:44

Thanks for your suggestion, Clyde Schechter.

I ran the following code:

cd "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\Merging multiple files"
use "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\Merging multiple files\1.dta"
local files : dir "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\Merging multiple files" files "*.dta"

foreach file in `files' {
merge id using `file'
}

It doesn't work this time either.

It shows the following:
(note: you are using old merge syntax; see [D] merge for new syntax)
master data not sorted
r(5);

Last edited by Kazi Aiman Udoy; 13 Dec 2022, 06:19.
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#6

13 Dec 2022, 09:48

When using -merge- you must specify whether the -merge- is 1:1, 1:m, or m:1. As I do not know your data, I cannot advise you which is appropriate for your situation. The simplest case is if all of the files contain exactly one observation for each id, in which case it should be -merge 1:1-. If some of your files contain multiple observations of the same id, however, then you need to carefully order the sequence in which you combine the files, and you will not be able to just use a simple loop like the one you show, since at some point you might have multiple observations per id in both the merged results so far and the using data set. At that point, -merge- is no longer suitable for combining the files at all, and other approaches will need to be taken.
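One way to find out whether -merge 1:1- is appropriate is to test, file by file, whether id uniquely identifies the observations. The sketch below uses a made-up dataset (the variable wave and its values are assumptions for illustration) in which id 1 appears twice, so the check fails for id alone but passes for id and wave together.

```stata
* Toy data: id 1 occurs twice, so id alone is not a unique key
clear
input id wave
1 1
1 2
2 1
end

* -isid- exits with an error when the key is not unique;
* -capture- traps the error so the return code can be inspected
capture isid id
display "return code for id alone: " _rc

* -duplicates report- shows how many ids occur more than once
duplicates report id

* The pair (id, wave) does identify each observation uniquely
capture isid id wave
display "return code for id and wave: " _rc
```

A nonzero return code for a file means that file cannot take part in a 1:1 merge on that key.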

There is another problem that will emerge after you fix that: the -merge- command will (attempt to) create a variable, called _merge, that designates in each observation which data set it came from, or whether it was found in both. But after you get past the first -merge-, the presence of that _merge variable will block the creation of another. So you either need to add the -nogenerate- option to your -merge- command, or you need to follow the -merge- with -drop _merge-.
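To illustrate the second of those two options, here is a small sketch in which _merge is tabulated and then dropped inside the loop, so each pass can create it afresh. The three toy files, their variables a, b and c, and the tempfile names fa, fb and fc are assumptions for the example; the files deliberately cover different ids so that the tabulation shows something.

```stata
* Three toy files with partly overlapping ids
clear
input id a
1 10
2 20
end
tempfile fa
save "`fa'"

clear
input id b
2 200
3 300
end
tempfile fb
save "`fb'"

clear
input id c
1 1000
3 3000
end
tempfile fc
save "`fc'"

use "`fa'", clear
foreach f in fb fc {
    merge 1:1 id using "``f''"
    * Inspect where each observation came from before discarding the indicator
    tabulate _merge
    * Without this line the second pass stops because _merge already exists
    drop _merge
}

list
```

Looking at _merge before dropping it is often worthwhile, since unmatched observations can signal a problem with the ids.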

Finally, although it is not causing you any errors, I recommend you make the code cleaner and clearer. Once you have changed to a given directory, there is no reason to keep mentioning the directory by name when you do I/O operations. So:

Code:

cd "F:\UDOY\Graduation\ECON 7th Semester_UDOY\Research Methodology\RM Class Lectures\NH Stata\Merging multiple files"
use 1.dta
local files : dir "." files "*.dta"
foreach file in `files' {
    merge 1:1 id using `file'
    drop _merge
}

Beyond the code itself, let me caution you about doing this at all, for several reasons.

When I see a series of files like this, with names like file1, file2, etc., they are typically files containing the same variables with information about the same ids coming from different time periods or something like that. -merge-ing those files will be a waste of time, as you will end up with just a copy of the first file. When -merge- encounters the same variable in both the master and using files, by default, the values of the variable in the master file are retained and those in the using file are ignored. You can override this default with some options, but if you do that, then you will just end up with a copy of the final file. Files like this are appropriately combined using -append-, not -merge-.
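The difference can be seen in a small sketch. The two toy files below are assumptions for illustration: they hold the same variables (id, income, wave) for the same ids at two time points. After -merge- only the first file's values survive, whereas -append- keeps both waves.

```stata
* Wave 1 and wave 2: same variables, same ids
clear
input id income
1 100
2 200
end
generate wave = 1
tempfile w1
save "`w1'"

clear
input id income
1 110
2 220
end
generate wave = 2
tempfile w2
save "`w2'"

* -merge- keeps the master's values of shared variables: wave 2 is lost
use "`w1'", clear
merge 1:1 id using "`w2'", nogenerate
list

* -append- stacks the files instead: both waves are kept
use "`w1'", clear
append using "`w2'"
sort id wave
list, sepby(id)
```

The first listing has two observations, both from wave 1; the second has four, two per id.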

Even if the files are supposed to contain the same variables on the same ids, if there are more than a handful of them, it is likely that there will be some incompatibilities among them: misspelled variable names, different value labels applied, different coding schemes, etc. I recommend using Mark Chatfield's -precombine- program, available from SSC, to search for potential problems before attempting to combine such a series of files, and fix those problems first.

Sometimes the series of files contain "the same variables" but the variables are deliberately differently named so as to reflect the different time periods (or other characterization) that they represent. In that case, if you -merge- them, you end up with a wide file which is difficult to work with in Stata. You can then -reshape- it. But usually it is simpler to never create the wide file in the first place: rename the variables in each file so that they are the same across files and then combine them with -append-, giving you a crisp, long data set that is readily amenable to analysis in Stata.
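A rough sketch of that rename-then-append route, under assumed names: two toy files carry income2020 and income2021, and the loop renames each to income, records the period in a new variable year, and stacks the results in a temporary file called long. None of these names come from the original poster's data.

```stata
* Two toy files whose variable names encode the year
clear
input id income2020
1 100
2 200
end
tempfile y2020
save "`y2020'"

clear
input id income2021
1 110
2 220
end
tempfile y2021
save "`y2021'"

* Build the long data set one file at a time
tempfile long
local first 1
foreach yr in 2020 2021 {
    use "`y`yr''", clear
    * Give the variable the same name in every file
    rename income`yr' income
    * Keep the period as data instead of as part of a variable name
    generate year = `yr'
    if !`first' {
        append using "`long'"
    }
    save "`long'", replace
    local first 0
}

sort id year
list, sepby(id)
```

The result has one observation per id and year, which is the long layout most Stata commands expect.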

The ideal situation for a series of -merge- commands is when you are putting together files that have different variables about the same ids. But, in my experience, in most cases such a series of files is short enough that just writing out two or three separate -merge- commands does the job, and no loop seems warranted. (There are exceptions: NHANES data sets consist of dozens of files with different variables about the same ids. But, in my experience, that kind of data organization is unusual.)
Nick Cox

Join Date: Mar 2014

Posts: 35451
#7

13 Dec 2022, 10:04

People interested in this -- especially in contributing -- also need to keep an eye on a thread opened later

https://www.statalist.org/forums/for...-file-in-stata

Kazi: Please don't run multiple threads at once without a really good reason. If you do feel it is a good idea, then post a cross-reference like that above.
Kazi Aiman Udoy

Join Date: Dec 2022

Posts: 33
#8

13 Dec 2022, 19:08

Clyde Schechter, I appreciate your effort. Thanks a lot for your help, especially for the detailed explanation. It worked. You have also mentioned other issues that have to be considered. I will keep them in mind in my future analysis.
Kazi Aiman Udoy

Join Date: Dec 2022

Posts: 33
#9

13 Dec 2022, 19:12

Nick Cox, Dear Nick, it was very urgent for me. I was running up against my submission deadline. Also, I was unable to post the question appropriately the first time. That's why I posted a second time, in detail and following the procedure. Finally, I got my answer. I won't do this again. Thanks for your cooperation.
Nick Cox

Join Date: Mar 2014

Posts: 35451
#10

14 Dec 2022, 03:41

Thanks for the explanation, but we do have advice that you were asked to read before posting. Let's hope your next thread is not urgent and allows a less frantic approach.

https://www.statalist.org/forums/help

We have set off on one side another document https://www.statalist.org/forums/help#adviceextras covering

Bumping, closing threads, and starting new threads

...

so please also look there if any of those headings looks applicable to your queries.

https://www.statalist.org/forums/help#adviceextras #1 applies, especially

It is better to improve a question than just to ask the same one over again. Start a new thread if and only if you have a different question.