Accurate fuzzy name matching in many languages
Names are the linchpin that connects data points in financial compliance, anti-fraud, government intelligence, law enforcement, and identity verification. Yet names are hard to connect because they vary so widely: misspellings, nicknames, initials, and titles all change how the same name is written. In international databases, a single name may also appear in many languages!
Rosette® Name Indexer (RNI) solves these challenges with a linguistic, knowledge-based system that compares and matches names of people, places, and organizations despite their many variations. RNI is unrivaled in its ability to match names because of its intelligent approach.
As linguistics experts with deep understanding at the intersection of language and technology, Basis Technology continually improves the Rosette product family with language additions, feature updates, and the latest innovations from the academic world. Find out how your organization can put RNI's entity name matching to work.
- Component of the Rosette SDK
- Simple API
- Fast and scalable
- Industrial-strength support
- Easy installation
- Flexible and customizable
- Java, C++, or web services
- Unix, Linux, Mac, or PC (64-bit or 32-bit)
- Matches names of people, places, and organizations
- Increases name search accuracy
- Ranks results by relevancy with a similarity score
- Built to work with Apache Solr™ and Elasticsearch
The Rosette Advantage
Our knowledge-based system applies the latest advances in natural language processing (NLP) to match names intelligently, based on their linguistic and cultural structures and norms.
Unlike expensive and less accurate legacy solutions driven by thousands of spelling variants from known names, RNI analyzes the intrinsic structure of each name component and performs an intelligent comparison using advanced linguistic algorithms.
Our approach is not limited to a particular list of variants and reduces the likelihood of both “false positives” (wrong matches) and “false negatives” (missed matches).
List-driven systems cannot equal RNI at matching never-before-seen names or mis-segmented names (Mary Ellen vs. MaryEllen), because a name that is missing from the list has nothing to match against.
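The difference between looking a name up in a variant list and computing a similarity can be shown with a toy sketch. This is not RNI's algorithm: the variant list is hypothetical, and Python's `difflib.SequenceMatcher` is only a stand-in for a real linguistic comparison.

```python
from difflib import SequenceMatcher

# Hypothetical variant list of the kind a list-driven system depends on.
KNOWN_VARIANTS = {
    "mary ellen": {"mary ellen", "mary-ellen", "marie ellen"},
}


def list_lookup(query: str, target: str) -> bool:
    """Match only if the query is stored as a known variant of the target."""
    return query.lower() in KNOWN_VARIANTS.get(target.lower(), set())


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits, so segmentation is ignored."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def computed_similarity(query: str, target: str) -> float:
    """Character-level similarity in [0, 1], computed on the normalized forms."""
    return SequenceMatcher(None, normalize(query), normalize(target)).ratio()


# "MaryEllen" is not in the list, so the lookup misses it.
missed = list_lookup("MaryEllen", "Mary Ellen")
# The computed comparison needs no list entry for the same pair.
score = computed_similarity("MaryEllen", "Mary Ellen")
```

Here the lookup fails because "MaryEllen" was never added as a variant, while the computed comparison treats the two spellings as identical once spaces are ignored.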
Rosette® Name Indexer integrates easily into Apache Solr™ as a plug-in or into applications as a Java library to support its main use cases. RNI can also be adapted to match the needs of each application.
Apache Solr™-based search systems can easily add high-quality fuzzy name matching to every search by simply adding name fields. RNI provides a special Solr field type for names. This mechanism means Solr can index documents with multiple name fields, each with multiple values (e.g., an “alias” field may contain more than one name). Each document could also contain non-name fields like dates or plain text.
<field name="primary">Muhammad Ali</field>
<field name="alias">Cassius Clay Jr</field>
<field name="alias">The Greatest</field>
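As a rough sketch, fields like these can be wrapped in Solr's standard XML update message (`<add><doc>…</doc></add>`), where repeating a field name yields a multi-valued field. The `id` and `born` fields below are hypothetical additions for illustration, and whether a field is handled by the RNI name field type depends on the Solr schema, which is not shown here.

```python
import xml.etree.ElementTree as ET


def build_solr_add(doc_fields):
    """Build a Solr XML <add> message from (field name, value) pairs.

    Repeating a field name produces a multi-valued field.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields:
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_solr_add([
    ("id", "person-001"),              # hypothetical unique key
    ("primary", "Muhammad Ali"),
    ("alias", "Cassius Clay Jr"),      # first value of the alias field
    ("alias", "The Greatest"),         # second value of the same field
    ("born", "1942-01-17T00:00:00Z"),  # hypothetical non-name date field
])
```

The resulting string is what would be posted to a Solr update handler; sending it is left out of the sketch.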
A single query can then be constructed that gives different weight to the various fields. For example, a single query can find movies starring “Binedict Cumberbund” with screenplays by “Giyermo Diltoro” that were released around 2014.
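A query of that shape might be assembled as follows. This sketch uses only standard Solr query syntax (fielded phrases, `^` boosts, and a date-range filter query). The field names `star`, `writer`, and `release_date` are hypothetical, and how an RNI name field interprets the query text is defined by RNI, so its documentation remains the authority on the exact form.

```python
from urllib.parse import urlencode


def build_weighted_query(star, writer, year, window=1,
                         star_boost=2.0, writer_boost=1.0):
    """Return Solr request parameters for a boosted two-name search."""
    # Quote each name so it is treated as one phrase; ^N applies a boost.
    q = f'star:"{star}"^{star_boost} writer:"{writer}"^{writer_boost}'
    # Keep only documents released within `window` years of the target year.
    fq = (f"release_date:[{year - window}-01-01T00:00:00Z TO "
          f"{year + window}-12-31T23:59:59Z]")
    # Ask Solr to return every stored field plus the relevance score.
    return {"q": q, "fq": fq, "fl": "*,score"}


params = build_weighted_query("Binedict Cumberbund", "Giyermo Diltoro", 2014)
query_string = urlencode(params)  # ready to append to a /select URL
```

Raising `star_boost` relative to `writer_boost` makes a close match on the actor's name count for more in the ranking than a close match on the screenwriter's name.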
Any application that needs name matching can integrate RNI directly as a Java library, which takes care of storing watchlists and avoids the overhead of a web-service call.
Financial institutions use RNI to manage and update watchlists that block terrorist access to funds, which helps them avoid compliance violations and protect their reputation. Applications also include fraud detection, anti-money laundering, and document triage.
Identity Verification in the Sharing Economy
Trust is foundational to the sharing economy. Whether booking room rentals, rides, or odd jobs, it is important to establish ways to connect the online and offline worlds to reinforce that trust and confidence.
Name matching is a key component of verifying online identities against real-world documentation (passports, driver’s licenses). Sharing-economy companies such as Airbnb rely on RNI to match names from all over the world, including names written in scripts other than the Roman A-to-Z alphabet.
- Set the minimum threshold of the similarity score to manage the precision and recall of the returned search results.
- Ignore a given list of words (“stopwords”) with respect to matching (e.g., titles, honorifics).
- Force two name words to always match with a given score (e.g., “Elizabeth” and “Lisbeth” always match at 90%).
- Force two names to always match with a given score (e.g., “John Doe” and “Joe Bloggs” always match at 95%).
- Link multiple names to a single individual (e.g., queries for “Marilyn Monroe” and “Norma Jeane Mortensen” include the same person).
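To make the options above concrete, here is a toy matcher that exposes the same kinds of controls. It is a sketch under stated assumptions, not RNI's API or algorithm: `MatchConfig`, `name_score`, and `search` are hypothetical names, the stopword list is illustrative, and `difflib.SequenceMatcher` stands in for the linguistic comparison.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class MatchConfig:
    threshold: float = 0.80  # minimum score for a hit to be returned
    stopwords: frozenset = frozenset({"mr", "mrs", "ms", "dr", "sir"})
    token_overrides: dict = field(default_factory=dict)  # {frozenset({a, b}): score}
    name_overrides: dict = field(default_factory=dict)   # {frozenset({a, b}): score}


def tokens(name, cfg):
    """Lowercase the name, strip trailing punctuation, and drop stopwords."""
    words = (w.strip(".,").lower() for w in name.split())
    return [w for w in words if w and w not in cfg.stopwords]


def token_score(a, b, cfg):
    """Score two name words, honoring any forced word-pair score."""
    forced = cfg.token_overrides.get(frozenset((a, b)))
    return forced if forced is not None else SequenceMatcher(None, a, b).ratio()


def name_score(query, target, cfg):
    """Score two full names, honoring any forced name-pair score."""
    q, t = tokens(query, cfg), tokens(target, cfg)
    forced = cfg.name_overrides.get(frozenset((" ".join(q), " ".join(t))))
    if forced is not None:
        return forced
    if not q or not t:
        return 0.0
    # Average, over the query words, of the best-scoring target word.
    best = [max(token_score(a, b, cfg) for b in t) for a in q]
    return sum(best) / len(best)


def search(query, index, cfg):
    """index maps an entity id to every name linked to that entity."""
    hits = []
    for entity_id, names in index.items():
        score = max(name_score(query, n, cfg) for n in names)
        if score >= cfg.threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


cfg = MatchConfig(
    token_overrides={frozenset(("elizabeth", "lisbeth")): 0.90},
    name_overrides={frozenset(("john doe", "joe bloggs")): 0.95},
)
index = {
    "p1": ["Marilyn Monroe", "Norma Jeane Mortensen"],  # two names, one person
    "p2": ["Lisbeth Taylor"],
}
```

With this setup, queries for either "Marilyn Monroe" or "Norma Jeane Mortensen" return entity `p1`, and raising `threshold` trades recall for precision by discarding lower-scoring hits.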