Modern enterprises are well acquainted with the promise of big data to revolutionize insights and decision making, but it is less well known that up to 80% of big data is Big Text. Big Text is the large quantity of “unstructured” text found in documents, web pages, and databases, and it has all the hallmarks of big data: the three Vs (Volume, Velocity, and Variety). Big Text is also multilingual, spanning many languages and scripts, each with its own complexities and challenges.
Rosette® is a suite of linguistic analysis components that integrate into applications to quickly add multilingual capability for mining unstructured data. Applications using Rosette include search and retrieval; business intelligence; e-discovery; digital forensics; and financial compliance. Rosette's capabilities include identifying the language of incoming text; providing a normalized representation in Unicode; locating names, places, and other key concepts in a body of unstructured text; and matching and translating names in foreign languages and scripts.
Highlight was created by linguistics and text analytics experts at Basis Technology to simplify IC-compliant workflow and report generation, greatly reducing the number of name inconsistencies in translator and intelligence analyst reports. Highlight currently supports five languages: Arabic, Dari, Farsi, Mandarin, and Pashto.
Odyssey is the launchpad for creating complex solutions that analyze “Big Text” (difficult and messy data sources such as financial records, electronic medical records, case law, social media, and intelligence documents). Discovery and analysis are built upon the extraction of named entities from unstructured, multilingual content, accurate name translation, and full resolution of those entities against a knowledge base.