    #16
    Hi Marc,

    I'm a bit confused by your question. Why do you need to include the year and GVKEY in the fuzzy match? Do you think there might be typos in these variables?

    If this is not the case, I suggest the following strategy:

    1. Keep just the unique GVKEY and name pairs from both files and join them by GVKEY.
    2. Run matchit using the column syntax.
    3. Drop the "bad" matches (manual inspection is recommended).
    4. Merge the resulting file back with the master (or smaller) file.
    5. Merge back with the using (or larger) file, adding the year as an additional merge condition.

    I think this is the fastest way to do it. The code below implements this approach, assuming your files are named master.dta and using.dta:
    Code:
    // keep unique GVKEY-CFO name pairs from the master file
    tempfile masterunique
    use master.dta, clear
    keep CFO_Name Acq_ID_Compustat
    rename Acq_ID_Compustat GVKEY
    duplicates drop
    save `masterunique'
    
    // keep unique GVKEY-director name pairs and combine them with the CFO names by GVKEY
    use using.dta, clear
    keep GVKEY Director_Name
    duplicates drop
    joinby GVKEY using `masterunique'
    
    matchit Director_Name CFO_Name, score(minsimple)
    // matchit Director_Name CFO_Name, w(log) g(similwgt) score(minsimple) // if the dataset is not too big, use this one
    
    // inspect the candidate matches before choosing a cutoff
    gsort GVKEY -similscore
    br if similscore>.2
    
    keep if similscore>.7 // check first whether this threshold makes sense for your data
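    
    // optional sanity check (a sketch, reusing the variable names above): make sure the
    // threshold leaves at most one CFO candidate per GVKEY-director pair, otherwise the
    // joinby steps below will multiply rows
    bysort GVKEY Director_Name: gen byte n_candidates = _N
    tab n_candidates
    list GVKEY Director_Name CFO_Name similscore if n_candidates > 1, sepby(GVKEY)
    drop n_candidates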
    
    // merge back with master file
    gen Acq_ID_Compustat=GVKEY
    joinby Acq_ID_Compustat CFO_Name using master.dta  
    
    // merge back with the using file (BUT ONLY ON THE MATCHING YEARS)
    gen YEAR=Deal_Announced // assumes Deal_Announced already holds the year; use year(Deal_Announced) if it is a daily date
    joinby GVKEY Director_Name YEAR using using.dta
    
    save megamerge.dta
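
    As a side note: if you want to see which rows fail to merge back in steps 4 and 5, joinby's unmatched() option keeps the unmatched observations and flags them in _merge, so you can inspect them before dropping anything. A minimal variant of the last merge above (same variable names, purely illustrative):
    Code:
    joinby GVKEY Director_Name YEAR using using.dta, unmatched(both)
    tab _merge // _merge==3 marks rows matched on firm, director and year
    keep if _merge==3
    drop _merge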
