  • Observations not appearing when merging datasets

    Hi, I'm doing a research project about M&A and cumulative abnormal returns. I'm trying to combine two datasets. The master dataset is as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 acquirerticker float(ann_date car3)
    "KO"    20306           .
    "KORI"  19680           .
    "KORS"  21025           .
    "KR"    19906  -.00834288
    "LBRDK" 22095 -.008002553
    "LBTYK" 19750           .
    "LBTYK" 20383           .
    end
    format %td ann_date
    The using dataset is:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 acquirerticker float(ann_date car3)
    "AT"   21713  .016792996
    "KO"   20306  .006925078
    "CELG" 20132 -.033690315
    "NXPI" 20149    .1653411
    "UAN"  20310  -.01679787
    end
    format %td ann_date
    In the sample above, car3 is missing in the master dataset, so I want to merge the two and pick up car3 from the using dataset. I use the "merge m:m" command because I have many observations with the same ticker in both datasets; the ann_date, however, is unique. When I merge them, the tickers and dates are matched, but car3 is still missing. I have done the same for another variable and it worked, so I do not understand what I may be doing wrong.

    What could be an explanation for this?

    Thanks.

  • #2
    Two issues.

    First, and far more important, NEVER use -merge m:m-. It produces a meaningless pairing of observations in one data set with the other in almost all circumstances. You probably didn't notice this, but sooner or later the data salad that you got would lead to some ridiculous results. Hopefully that would happen before you presented your findings to somebody else to rely on and you would have the chance to go back and fix it. All -merge-s should be 1:1, m:1, or 1:m. Whenever you think you need m:m, it means you do not understand your data or you do not understand what you are trying to do with your data.
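
    One way to find out which kind of merge applies is to check whether the intended key uniquely identifies observations in each data set. The following is only a sketch on a few rows of the sample from #1; treating ticker plus announcement date as the key is an assumption about the data, not something the post confirms.

```stata
* Sketch: check whether a candidate merge key is unique in a data set.
clear
input str9 acquirerticker float(ann_date car3)
"KO"    20306           .
"LBTYK" 19750           .
"LBTYK" 20383           .
end
format %td ann_date

* Ticker alone repeats (LBTYK appears twice), so it cannot be a 1:1 key.
duplicates report acquirerticker

* isid exits with an error when the variables do not uniquely identify
* the observations; capture stores the return code in _rc instead.
capture isid acquirerticker
display "return code from isid acquirerticker: " _rc

* Assumed key: ticker plus announcement date. Unique in this sample.
isid acquirerticker ann_date
```

    If -isid- passes in both data sets, 1:1 is appropriate; if it passes in only one, that data set is the "1" side of an m:1 or 1:m merge.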

    Second, there is the question of why the values of car3 in your second data set are not showing up in the first. That is because, by default, when Stata performs a -merge-, any variable that appears in both data sets (other than the key variables) keeps the master data set's values, missing or not. If you want to fill in missing values in the master data set with those found in the using data set, you have to specify the -update- option. And if you want to overwrite non-missing values as well, then you need to specify -update replace-.

    Putting these together, the correct command would be:
    Code:
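    * Load the master data, then fill missing car3 from the using data.
    * Matching 1:1 on ann_date assumes dates are unique in both data sets.
    use dataset1
    merge 1:1 ann_date using dataset2, update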
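
    To see the effect of -update- on data like that in #1, here is a self-contained sketch using a few of the posted rows. It matches on both acquirerticker and ann_date, which is an assumption on my part (a more conservative key than ann_date alone); the tempfile name using_data is purely illustrative.

```stata
* Sketch: fill missing car3 in the master from the using data.
* Using data, saved to a temporary file (name is illustrative).
clear
input str5 acquirerticker float(ann_date car3)
"AT"   21713  .016792996
"KO"   20306  .006925078
"CELG" 20132 -.033690315
end
format %td ann_date
tempfile using_data
save `using_data'

* Master data: car3 is missing for KO and LBTYK.
clear
input str9 acquirerticker float(ann_date car3)
"KO"    20306           .
"KR"    19906  -.00834288
"LBTYK" 19750           .
"LBTYK" 20383           .
end
format %td ann_date

* Assumed key: ticker plus date. isid stops with an error if not unique.
isid acquirerticker ann_date

* update: missing master values are filled from matching using observations.
merge 1:1 acquirerticker ann_date using `using_data', update

* With update, _merge == 4 marks matches where a missing value was filled.
tabulate _merge
list acquirerticker ann_date car3 _merge, sepby(_merge)
```

    After the merge, KO should carry the car3 value from the using data, while KR and the LBTYK observations, which have no match there, stay as they were.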
