Automatically discover names of people, places, and organizations to build connections across millions of unstructured multilingual documents.

Rosette for eDiscovery

Expand Your eDiscovery Scope Beyond English

At its core, eDiscovery is about analyzing huge collections of unstructured content – documents, email, call logs, transcripts, contracts – to uncover information about people, places, and organizations.

In the age of globalization, this content may be written in different languages, using multiple scripts and character sets. The challenge is therefore how to efficiently search this multilingual text, extract entities with high accuracy and precision, and ensure that all the necessary information is revealed.

Basis Technology’s Rosette® suite of text analytics components provides a robust and scalable solution to this multilingual eDiscovery challenge. Through the combination of language identification, morphological analysis, entity extraction, and automatic name translation, Basis Technology can reveal the key information necessary to establish connections and build relationships.

“Text analytics is no longer an academic specialty. It has become a necessary component in most search and discovery software, from selling products, tracking terrorists, delivering news, or playing music to improving communication among people worldwide. Basis Technology’s new Rosette platform ups the ante with its improvements in accuracy, enabling its customers to power a new breed of intelligent workspace applications.”

Susan Feldman, Research Vice President, IDC


Entity Search White Paper

Learn how entity search is revolutionizing the decision-making and problem-solving process

Try a Product Evaluation

Request an evaluation of the complete Rosette software platform today.

True Multilingual eDiscovery

Basis Technology helps the legal community meet its multilingual discovery challenges head-on with Rosette®, a linguistics platform proven in hundreds of commercial and government environments.

eDiscovery Solution Stream: Rosette Language Identifier to Rosette Base Linguistics to Rosette Entity Extractor to Rosette Name Translator

The Rosette software components are configured as building blocks, and work seamlessly within discovery workflows and information retrieval applications, covering the major European, Asian, and Middle Eastern languages. For legal professionals, Rosette provides the ability to examine multilingual text with unparalleled accuracy and efficiency.

Code Base: Web Services, Microsoft .NET
Platform Support: Red Hat

Step 1: Language Identifier

Identify the language(s) in a document


The Rosette Language Identifier (RLI) identifies the language(s) and character encoding systems present in a document so that its textual content can be filtered and processed. Extracted text is converted to Unicode so that discovery and information retrieval applications can access a single data representation regardless of language. Using a module called the Language Boundary Locator, mixed-language documents are segmented into regions so that language-specific processing can be performed on each region.
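To make the two ideas in this step concrete (one Unicode representation, then language-specific regions), here is a minimal Python sketch. It is not RLI or its Language Boundary Locator: it assumes the encoding is already known to be UTF-8, and it uses each character's Unicode script as a crude stand-in for language. The function names `script_of` and `segment_by_script` are hypothetical and exist only for this illustration.

```python
import unicodedata

# Coarse script labels recognized by this sketch (an assumption, not a complete list).
KNOWN_SCRIPTS = ("LATIN", "CYRILLIC", "ARABIC", "CJK", "HIRAGANA", "KATAKANA", "HANGUL")


def script_of(ch):
    """Return a coarse script label taken from the Unicode character name."""
    if not ch.isalpha():
        return None  # spaces, digits, and punctuation are treated as neutral
    name = unicodedata.name(ch, "")
    for script in KNOWN_SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def segment_by_script(text):
    """Split text into (script, substring) regions.

    Neutral characters stay with whichever region is currently open.
    """
    regions = []
    current_script, buffer = None, []
    for ch in text:
        script = script_of(ch)
        # A letter in a different script closes the current region.
        if script is not None and current_script is not None and script != current_script:
            regions.append((current_script, "".join(buffer)))
            buffer = []
        if script is not None:
            current_script = script
        buffer.append(ch)
    if buffer:
        regions.append((current_script, "".join(buffer)))
    return regions


# Decode raw bytes to Unicode first, then segment the single representation.
raw = "Contract signed by Иван Петров in Москва".encode("utf-8")
for script, chunk in segment_by_script(raw.decode("utf-8")):
    print(script, repr(chunk))
```

Script alone cannot separate languages that share a writing system (English and French, for example), which is why a real language identifier relies on far richer evidence than this sketch does.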

Step 2: Base Linguistics

Apply linguistic intelligence to identify word forms, parts of speech, and sentence structure


Rosette Base Linguistics (RBL) examines documents and performs a complete morphological analysis so that text can be accurately filtered, analyzed, and searched.

RBL identifies parts of speech, sentence boundaries, word breaks, tokens, lemmas, and other linguistic components in European, Asian, and Middle Eastern languages.

Step 3: Entity Extraction

Extract the items of interest (including those you didn’t know about)


The Rosette Entity Extractor (REX) sifts through unstructured text and identifies people, places, dates, and other items that establish the true meaning of a document for further analysis.

REX locates generic terms as well as custom entities such as specific names, phone numbers, and email addresses. Statistical modeling helps determine whether an entity is present in a document, rather than simply checking against a list of possibilities and risking overlooking a variation. The result is entity extraction technology that lets you find what you know and also what you didn’t know.

Step 4: Name Translation

Automatically translate non-English names into English to enable rapid triage of multilingual content


Rosette Name Translator (RNT) uses a combination of user-supplied name dictionaries, linguistic algorithms, and statistical modeling to provide highly accurate, standardized English translations of names that originate from several non-Latin writing systems, including Chinese, Russian, and Arabic.

By combining REX and RNT, key names can be extracted and translated to help investigators rapidly identify relevant documents that need to be flagged for translation and further study.
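The dictionary-first idea behind name translation can be sketched in a few lines of Python. This is an illustration only, not RNT: it covers Cyrillic alone, it has no statistical model, and the names `NAME_DICTIONARY`, `CYRILLIC_TO_LATIN`, `romanize_word`, and `translate_name` are hypothetical. The letter table is a simplified romanization chosen for the example, not any particular standard.

```python
# Hypothetical user-supplied dictionary: names with a preferred English form.
NAME_DICTIONARY = {
    "Москва": "Moscow",
}

# Simplified Cyrillic-to-Latin table (lowercase letters only); illustrative, not a standard.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def romanize_word(word):
    """Map each letter through the table; characters not in the table pass through."""
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in word.lower())
    return latin.capitalize()


def translate_name(name, dictionary=NAME_DICTIONARY):
    """Prefer the dictionary entry; otherwise fall back to letter-by-letter romanization."""
    if name in dictionary:
        return dictionary[name]
    return " ".join(romanize_word(word) for word in name.split())


# Names as they might arrive from an entity extraction step (sample data for this sketch).
for extracted in ["Москва", "Иван Петров"]:
    print(extracted, "->", translate_name(extracted))
```

Checking the dictionary first lets a conventional English form such as "Moscow" win over the literal romanization "Moskva", while the fallback still produces a readable English rendering for names the dictionary has never seen.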

