Entity disambiguation in dissimilar data sets: A new approach

Better insight with less effort, at reduced cost. It’s what every organization wants from its data. New BasisTech research may help organizations obtain that insight. The research charts innovative ways to more efficiently connect entities appearing in dissimilar data streams.

The goal of the research, conducted by BasisTech Senior Research Engineer Philip Blair and Chief Scientist Kfir Bar, is to connect outside entities with entities that already appear in an organization’s existing knowledge base. These connections are vitally important to anti-fraud efforts, government intelligence, law enforcement, and general business processes.

Blair and Bar’s research will be presented at the prestigious 2022 Conference on Empirical Methods in Natural Language Processing, to be held December 7-11 in Abu Dhabi. Their paper will also appear in the ACL Anthology, the publication archive of the Association for Computational Linguistics.

In their work, Blair and Bar applied research on pattern-based low-resource text classifiers to the problem of entity linking. Using this approach, they demonstrated a system that can be trained on generalized news data and then tuned to work on mental health news using only a very small amount of additional data.

Their results were encouraging. The pair found that while their system performed about as well as expected when aligning entities in generalized news data sets, it demonstrably outperformed baseline expectations when matching entities in the medical domain.

“Our research has uncovered potential ways to reduce the amount of manual effort and monetary resources that are required to precisely identify entities across a variety of disparate types of text,” Blair said.

About BasisTech

Data analytics and machine learning are critical to verifying identity, understanding customers, anticipating world events, and uncovering crime. BasisTech provides businesses and governments with advanced analytics and AI-powered solutions for deriving insights from multilingual text, connecting data silos, and discovering digital evidence. Our Rosette text analytics platform employs classical machine learning and deep neural nets to extract meaningful information from unstructured data. Autopsy, our digital forensics platform, and Cyber Triage, our incident response tool, serve the needs of law enforcement, national security, and legal technologists. KonaSearch delivers deep search across Salesforce and other data sources.

Company headquarters are in Somerville, Mass., with offices in Washington, D.C., London, Tel Aviv, and Tokyo.