Observations not appearing when merging datasets

Juan Gonzalex

Join Date: May 2022

Posts: 59
#1

18 Jan 2023, 10:11

Hi, I'm doing a research project on M&A and cumulative abnormal returns, and I'm trying to combine two datasets. The master dataset is as follows:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str9 acquirerticker float(ann_date car3)
"KO"    20306 .
"KORI"  19680 .
"KORS"  21025 .
"KR"    19906 -.00834288
"LBRDK" 22095 -.008002553
"LBTYK" 19750 .
"LBTYK" 20383 .
end
format %td ann_date

The using dataset is:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5 acquirerticker float(ann_date car3)
"AT"   21713  .016792996
"KO"   20306  .006925078
"CELG" 20132 -.033690315
"NXPI" 20149  .1653411
"UAN"  20310 -.01679787
end
format %td ann_date

In the sample above, car3 is missing in the master dataset, so I want to merge the two to pick up car3 from the using dataset. I use the "merge m:m" command because many observations share the same ticker in both datasets; ann_date, however, is unique. When I merge them, the tickers and dates are matched, but car3 is still missing. I have done the same for another variable and it worked, so I do not understand what I may be doing wrong.

What could be an explanation for this?

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

18 Jan 2023, 10:46

Two issues.

First, and far more important, NEVER use -merge m:m-. It produces a meaningless pairing of observations in one data set with the other in almost all circumstances. You probably didn't notice this, but sooner or later the data salad that you got would lead to some ridiculous results. Hopefully that would happen before you presented your findings to somebody else to rely on and you would have the chance to go back and fix it. All -merge-s should be 1:1, m:1, or 1:m. Whenever you think you need m:m, it means you do not understand your data or you do not understand what you are trying to do with your data.
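
As a rough sketch of how one might confirm which kind of merge applies, the snippet below loads the using-data excerpt from post #1 and checks whether ann_date uniquely identifies its observations. It relies only on the posted excerpt; the full data sets may behave differently, so the same check should be repeated on both complete files.

```stata
* Using-data excerpt from post #1
clear
input str5 acquirerticker float(ann_date car3)
"AT"   21713  .016792996
"KO"   20306  .006925078
"CELG" 20132 -.033690315
"NXPI" 20149  .1653411
"UAN"  20310 -.01679787
end
format %td ann_date

* isid exits with an error if the listed variables do not uniquely
* identify the observations; capture noisily lets a do-file continue.
capture noisily isid ann_date
display _rc        // 0 when ann_date is a valid key for a 1:1 merge

* duplicates report counts surplus copies of each key value
duplicates report ann_date
```

If -isid- fails in one data set but not the other, that points to m:1 or 1:m; if it fails in both, the key needs more variables or the data need cleaning before any merge.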

Second, there is the question of why the values of car3 in your second data set are not showing up in the first. That is because, by default, when Stata performs a -merge-, any non-key variables that appear in both data sets keep the values they have in the master data set, missing or not. If you want to fill in missing values in the master data set with those found in the using data set, you have to specify the -update- option. And if you want to overwrite non-missing values as well, then you need to specify -update replace-.

Putting these together, the correct command would be:

Code:

use dataset1
merge 1:1 ann_date using dataset2, update
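
For anyone who wants to try this on the excerpts from post #1, here is a self-contained sketch. The temporary file name using_data is made up for illustration, and the code assumes it is run as a do-file so the local macro from -tempfile- stays in scope. It only reflects the posted excerpts, not the full data.

```stata
* Using-data excerpt, saved to a temporary file (hypothetical name)
clear
input str5 acquirerticker float(ann_date car3)
"AT"   21713  .016792996
"KO"   20306  .006925078
"CELG" 20132 -.033690315
"NXPI" 20149  .1653411
"UAN"  20310 -.01679787
end
tempfile using_data
save `using_data'

* Master-data excerpt
clear
input str9 acquirerticker float(ann_date car3)
"KO"    20306 .
"KORI"  19680 .
"KORS"  21025 .
"KR"    19906 -.00834288
"LBRDK" 22095 -.008002553
"LBTYK" 19750 .
"LBTYK" 20383 .
end
format %td ann_date

* update fills in missing master values only; non-missing car3 is kept
merge 1:1 ann_date using `using_data', update

* _merge == 4 marks observations whose missing values were updated
tabulate _merge
list acquirerticker ann_date car3 _merge if ann_date == 20306
```

With -update-, _merge can also take the value 5 (nonmissing conflict), which flags matched observations where master and using disagree; those are worth inspecting before deciding whether -update replace- is appropriate.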