Hello all,
For my Master's thesis research, I need to fuzzy match two datasets based on company names. To make this process easier, I am cleaning up the names, e.g. by removing generic terms such as 'Limited', 'Ltd.' and 'Co.' from the end of the company names. I am using the following code to do so:

Code:
* generic terms to strip from the end of the (lower-cased, punctuation-free) names
local to_remove ltd limited inc llc co corp corporation gmbh ag nv bv international int holding pjsc sa se spa plc incorporated holdings aktiengesellschaft coltd as group groep groupe sa/nv
* work on the reversed name, so that the last word of the name becomes its first word
gen rcompanynamelow_clean = reverse(companynamelow_clean)
foreach t of local to_remove {
    local trev = reverse(`"`t'"')
    * if the reversed name starts with the reversed term, remove its first whole-word occurrence and reverse back
    replace companynamelow_clean = reverse(subinword(rcompanynamelow_clean, `"`trev'"', "", 1)) ///
        if strpos(rcompanynamelow_clean, `"`trev'"') == 1
}
drop rcompanynamelow_clean

I find that some companies have multiple generic words at the end of their names:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str100 companyname
"China Oriental Group Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
"West Fraser Timber Co. Ltd."
end

Before running the code above, I convert all strings to lower case and remove all punctuation. After I apply the code, only the last generic term is removed, whereas I would like all generic terms at the end of the string to be deleted: "china oriental group co ltd", for instance, loses "ltd" but keeps "group co". My question is: how can I change the above code so that it removes all of these terms from the end of the string, and not only the very last one?
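A minimal sketch of one possible approach, assuming Stata 14 or later and that it is run from a do-file (for the `///` continuations). In the loop above, `rcompanynamelow_clean` is created once before the loop and never updated, so every `replace` starts again from the original reversed name, and only a term that matches the start of that reversed string (the last word of the name) can ever be removed. The sketch instead looks at the current last word with Stata's `word(..., -1)` and keeps cutting it off while it is in the list, so "co ltd" disappears over two passes. The lower-casing and punctuation step (`ustrlower()`, `ustrregexra()` with `[[:punct:]]`) is only an assumption about the preprocessing described above, not the poster's actual code, and the two example names are taken from the -dataex- excerpt.

```stata
* Sketch: strip every trailing generic term, not only the last one.
* Assumes Stata 14+ and that this is run from a do-file (for the /// continuations).
clear
input str100 companyname
"China Oriental Group Co. Ltd."
"West Fraser Timber Co. Ltd."
end

* assumed preprocessing: lower case, drop punctuation, tidy up blanks
gen companynamelow_clean = ustrlower(companyname)
replace companynamelow_clean = ustrregexra(companynamelow_clean, "[[:punct:]]", "")
replace companynamelow_clean = stritrim(strtrim(companynamelow_clean))

local to_remove ltd limited inc llc co corp corporation gmbh ag nv bv international int holding pjsc sa se spa plc incorporated holdings aktiengesellschaft coltd as group groep groupe sa/nv

* repeat until no name ends in a listed term
tempvar strip
local more 1
while `more' {
    * flag names whose last word is in the list (padding with blanks forces whole-word matches)
    * and that still have more than one word, so no name is reduced to an empty string
    gen byte `strip' = strpos(" `to_remove' ", " " + word(companynamelow_clean, -1) + " ") > 0 ///
        & wordcount(companynamelow_clean) > 1
    quietly count if `strip'
    local more = r(N)
    * cut the last word off the flagged names and trim the blank left behind
    quietly replace companynamelow_clean = strtrim(substr(companynamelow_clean, 1, ///
        strlen(companynamelow_clean) - strlen(word(companynamelow_clean, -1)))) if `strip'
    drop `strip'
}
list companyname companynamelow_clean
```

On these two names the loop runs until "china oriental" and "west fraser timber" are left ("group" is in the list, "timber" is not). The `wordcount() > 1` guard is a design choice: a name made up only of listed words keeps its first word instead of becoming empty. With the punctuation removal assumed here, "SA/NV" turns into "sanv", so the `sa/nv` entry can never match and that form would have to be added to the list if it matters.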
With kind regards, and thank you in advance,
Christian Spek