I wouldn't recommend keeping only the first 3-5 letters of the drug names, because that is likely to reduce two different drug names to the same abbreviation. Instead, I would first save a few characters by replacing all double underscores with single underscores, and then truncate the whole thing at 31 characters. It is far less likely that this will collapse two different names into the same one. (But I would check whether it does!)
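To illustrate the risk with two made-up examples (these names are my own illustration, not taken from your data), both of the following display hydro, so a 5-letter abbreviation could not tell the two drugs apart:
Code:
* Illustrative drug names only; substitute names from your own data
display substr("hydrocodone", 1, 5)
display substr("hydroxyzine", 1, 5)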
If you get an error on the -assert- command in the code below, then you have some drug names that, after cleaning, agree on their first 31 characters. In that case, a different approach will be needed. Just how to do that would depend on the details of the offending drug names.
Code:
* drug_variable is assumed to hold the original, unmodified drug names
replace rxddrug = strtoname(rxddrug)
replace rxddrug = subinstr(rxddrug, "__", "_", .)
replace rxddrug = substr(rxddrug, 1, 31)
* Verify that each shortened name corresponds to exactly one original name
by rxddrug (drug_variable), sort: assert drug_variable[1] == drug_variable[_N]
drop drug_variable
drop if missing(rxddrug)
quietly levelsof rxddrug, local(drugs) clean
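If the -assert- does fail, it helps to see which names collide before deciding what to do about them. Here is a rough sketch of one way to list them. It is meant to be run in place of the -assert- line, before drug_variable is dropped, and it assumes, as above, that drug_variable holds the original names. The variable names collision and first_pair are just ones I made up for this purpose.
Code:
* Flag every shortened name that maps to more than one original name
by rxddrug (drug_variable), sort: generate byte collision = drug_variable[1] != drug_variable[_N]
* Tag one observation per distinct (shortened, original) pair among the collisions
egen byte first_pair = tag(rxddrug drug_variable) if collision
* Show the offending pairs, grouped by shortened name
list rxddrug drug_variable if first_pair == 1, sepby(rxddrug) noobs
drop collision first_pair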