Is -matchit- just really slow? And can anybody see anything wrong with my reclink syntax?

Julio Raffo

#16

24 Aug 2021, 12:03

Hi Marc,

I'm a little confused by your question. Why do you need to include the year and GVKEY in the fuzzy match? Do you think there might be typos in these variables?

If this is not the case, I suggest the following strategy:

1. Keep only the unique GVKEY and name pairs from each file, then join them by GVKEY.
2. Run matchit using the column syntax.
3. Drop the "bad" matches (manual inspection is recommended).
4. Merge the resulting file back with the master (or smaller) file.
5. Merge back with the using (or larger) file, adding the year as an additional merge condition.

I think this is the fastest way to do it. The code below implements this strategy, assuming your files are named master.dta and using.dta:

Code:

tempfile masterunique
use master.dta, clear
keep CFO_Name Acq_ID_Compustat
rename Acq_ID_Compustat GVKEY
duplicates drop
save `masterunique'

use using.dta, clear
keep GVKEY Director_Name
duplicates drop
joinby GVKEY using `masterunique'

matchit Director_Name CFO_Name, score(minsimple)
// matchit Director_Name CFO_Name, w(log) g(similwgt) score(minsimple) // if the data are not too big, use this one instead (the score is then stored in similwgt, not similscore)

// use this to check the data
gsort GVKEY -similscore
br if similscore>.2

keep if similscore>.7 // check first whether this threshold makes sense for your data

// merge back with master file
gen Acq_ID_Compustat=GVKEY
joinby Acq_ID_Compustat CFO_Name using master.dta  

// merge back with using file (BUT ONLY ON THE MATCHING YEARS)
gen YEAR=Deal_Announced // assumes Deal_Announced (from master.dta) holds the year
joinby GVKEY Director_Name YEAR using using.dta  

save megamerge.dta
