Matching 2 databases on industry (SIC 2-digit code) and size (total assets) for a study on audit fees

Robin Verboom

Join Date: Nov 2018

Posts: 2
#1

Matching 2 databases on industry (SIC 2-digit code) and size (total assets) for a study on audit fees

22 Nov 2018, 04:31

Hallo, I currently have a sample of 143 public firms that were involved in Merger and Acquisition (M&A) behavior in 2016. I am studying the effect of this merger and acquisition behavior on Audit Fees.
I have created 2 databases of listed firms, one with the 143 public firms that were associated with M&A behavior and one with 2478 public firms NOT associated with M&A behavior. Both of these databases have Ticker Symbol as company identifier.

I would like to match the 2 databases, so that every public firm with M&A behavior has 1 uniquely "matched" (without replacement) observervation, ultimately doubling my M&A database with non-M&A observervations.
I want them to match having equal 2-digit SIC codes (Standard Industry Classification Codes) and as close as possible Total assets while still keeping all my variables from my databases (like audit fees, auditor, audit opinon, total inventories etc).

Is there any way this can actually be done?
Someone suggested I tried using "Calipmatch" which I have installed using ssc install calipmatch, but I can't figure out how it actually works.

I've tried the following code: (Note AT = assets total, acquirortic is the company identifyer/tickler)

Code:

use "C:\Users\Gebruiker\Desktop\Thesis samples\2016 M&A final.dta", clear rename at case_at rename acquirortic case_acquirortic joinby sic2 using "C:\Users\Gebruiker\Desktop\Thesis samples\matching sample 2017.dta" gen delta = abs(at - case_at) drop if delta >= 500 drop if delta < 1 set seed 15 gen double shuffle1 = runiform() gen double shuffle2 = runiform() by acquirortic (delta shuffle1 shuffle2), sort: keep if _n == 1 by case_acquirortic (delta shuffle1 shuffle2), sort: keep if _n == 1 drop delta shuffle1 shuffle2

This does appear to be matching the 2 databases, but gets rid of the majority of my secondary database's variables and does not create new observations but just places them in the columns behind the current observations.

Any suggestions/help/code which helps me improve this? I hope my question is clear, any help would be appreciated.

Thank you in advance for any help!

Robin

Last edited by Robin Verboom; 22 Nov 2018, 04:35. Reason: Terrible layout
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

22 Nov 2018, 10:22

Starting a new thread is not a good way to get help. I did not give more specific advice before because you did not offer any sample data using the -dataex- command, as the FAQ strongly suggests. That would have made it easy to help you out. (Not very many people can explain their problems very clearly without sample data.)

That difficulty being recognized, here's a suggestion of how -calipmatch- would work in your context. This code presumes you have *one* data file for both merger and nonmerger observations ("cases" and "controls" in the language of the help for -calipmatch-), with a variable MACase = 1 for merger observations, MACase = 0 for nonmergerd, and SIC being the SIC code. You can put together one file from your two files using the built-in Stata command -append-.

The following usage of -calipmatch- will create a new variable "pairid" that will identify pairs of observations that: 1) Differ on MACase; 2) Have the same SIC code; 3) Have total assets that differ by no more than +/- 500.

Code:

calipmatch, generate(pairid) casevar(MACase) exactmatch(SIC) maxmatches(1) calipermatch(at) calpiperwidth(500)
Comment
Abdul Ghafoor

Join Date: May 2017

Posts: 15
#3

30 Sep 2020, 08:16

Dear Mike Lacy, May I know how do we get a dummy variable of matches after calipmatch matching? thanks
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#4

30 Sep 2020, 08:47

In the future please don't address questions on StataList to particular individuals. StataList is an open forum, with questions to be addressed to anyone who might answer them. Your way of asking a question treatts people who answer questions as though they were employees of a commercial help service.

Setting that issue aside: I don't understand what you want this "dummy" variable to indicate about the matching process. I should think that, in this case, the variable "pairid" has all the information about matches. Perhaps someone else in the StataList community will understand what you want and be able to answer your question.
Comment

Announcement

Matching 2 databases on industry (SIC 2-digit code) and size (total assets) for a study on audit fees

Comment

Comment

Comment