The need for name matching has moved far beyond de-duplicating mailing lists to save a few dollars by mailing one fewer catalog. Today, in financial compliance, law enforcement, and homeland security, the stakes are much higher. The costs of a false positive (a wrong match) are the time and money spent reviewing erroneous matches, the inconvenience to innocent people, and the embarrassment of customers. The costs of a false negative (a missed match) are an increased risk of reputational damage, regulatory fines, and known risky individuals going unchecked.
This whitepaper explores four name matching methods that are applied in these high-stakes situations: common key (e.g., Soundex), lists of name variations, edit distance, and statistical similarity. We will examine the strengths and weaknesses of each method with respect to matching names written in English and in other languages, and conclude with the current “best practice” among name matching technologies.