Issues with Merging Datasets

Alex Everett

Join Date: Feb 2022

Posts: 20
#1

Issues with Merging Datasets

14 Feb 2022, 10:36

Hi all,

I'm dealing with the Tanzanian National Panel Survey which has a lot of data sets I need to merge. However I have one key issue when merging. Using the household identification number (and the individual id number where appropriate) I merged m:m. The issue I came across was because of a specific question in the questionnaire that asked about livestock owned.

As shown below the merge created a duplication of an individual id 11 times. As we can see the data represents one household with 6 members who own 14 chickens. How can I avoid this? I need all this info and more in a single data set to run the regressions I want. From this I also want to calculate the Tropical Livestock Unit but that's another story.

The problem shown in the data:

If you need any more info just let me know.

Thanks,
Alex
Attached Files
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

14 Feb 2022, 12:26

You should not have used merge m:m - that is the source of your problem.

The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu.

m:m merges

m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.

So even the Stata documentation asserts that merge m:m is almost never appropriate. If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

4. You actually need to append your datasets rather than merge them.

5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

14 Feb 2022, 12:35

It seems to me that your animal ownership data had 14 observations of animals for each household, and your example household had 5 individuals.

Is the idea that you want each individuals observation to reflect the ownership of each of these 14 kinds of animals? If so, then what you should do is first use the reshape wide command on your animal data so that it has one observation for each household, with 14 variables indicating the number of each type of animal owned, and perhaps 14 more variables indicating whether each type of animal was owned. Then you could use merge m:1 to merge each individual in each household to the households single observation giving its animal ownership data.
Comment
Alex Everett

Join Date: Feb 2022

Posts: 20
#4

15 Feb 2022, 09:42

Thank you for your reply William, took me a while to implement but it's working now thank you.
Comment

Announcement

Issues with Merging Datasets

Comment

Comment

Comment