Modern enterprises are well acquainted with the promise of big data to revolutionize insights and decision making. It is less well known that up to 80% of big data is Big Text: large quantities of “unstructured” text found in documents, web pages, and databases, with all the hallmarks of big data, namely the three Vs (Volume, Velocity, and Variety). Big Text is also multilingual, spanning many languages and scripts with all of their complexities and challenges.
Rosette® is a suite of linguistic analysis components that integrate into applications to quickly add multilingual capabilities for mining unstructured data. Applications using Rosette include search and retrieval, business intelligence, e-discovery, digital forensics, and financial compliance. Rosette's capabilities include identifying the language of incoming text; providing a normalized representation in Unicode; locating names, places, and other key concepts in a body of unstructured text; and matching and translating names in foreign languages and scripts.
Highlight was created by linguistics and text analytics experts at Basis Technology to simplify IC-compliant workflows and report generation, greatly reducing the number of name inconsistencies in translator and intelligence analyst reports. Highlight currently supports five languages: Arabic, Dari, Farsi, Mandarin, and Pashto.