Identify the language of your documents with the Rosette Language Identifier
Doing business in Asia?
What's your plan for tokenizing languages that have irregular spacing between words?

Rosette Base Linguistics can solve your language processing problems in 40 different languages.
Identify parts of speech with Rosette Base Linguistics
Extract people and place names from documents using the Rosette Entity Extractor
Match names across multiple languages and spelling variants using the Rosette Name Indexer.
Documents are not databases. Rosette Base Linguistics can process raw text in 40 languages.
We mean BIG Text. Rosette is used by 7 out of 10 global software companies.
Experts say that 80% of Big Data is big unstructured text.
Translate names in 10 languages using the Rosette Name Translator
Connect your unstructured text to the real-world people, organizations and places you care about with the Rosette Entity Resolver

Gain insight and deep value from unstructured text

Modern enterprises are well acquainted with the promise of big data to revolutionize insight and decision making, but it is less well known that up to 80% of big data is Big Text. Big Text consists of large quantities of “unstructured” text found in documents, web pages, and databases, and it has all the hallmarks of big data: the three Vs (Volume, Velocity, and Variety). Big Text is also multilingual, spanning many languages and scripts with all of their complexities and challenges.

Because unstructured text does not follow a predefined schema, standard enterprise data solutions have only a limited ability to understand and use this treasure trove of information.

Rosette® is a suite of software components for use in enterprise applications. It uses linguistic analysis, statistical modeling, and machine learning to accurately process Big Text, revealing valuable information and actionable data.

Individually, each component is a robust tool for processing language, documents, or names. Combined, they create powerful solutions that deliver useful information for better decisions and deep value for their users. Our customers across the globe, in government, finance, e-discovery, search, social media, and beyond, depend on Rosette to analyze and transform their Big Text.



Google integrated Basis Technology’s linguistics software to enhance Google’s pan-Chinese search engine. Chinese search is important to Google’s global audience and our goal is to help these users find the information they’re looking for quickly and easily.

Susan Wojcicki, Google’s Director of Product Management


  • Simple API
  • High scalability and throughput
  • Industrial-strength support
  • Easy installation
  • Flexible and customizable
  • Java or C++
  • Unix, Linux, Mac, Windows
  • Support for Cloudera, Solr, & Elasticsearch

Select Customers

Airbnb, Attivio, Autodesk, Google, Fujitsu, LinkedIn, Oracle, Symantec, Yelp

The Problem

Big Text

  • Represents 80% of Big Data
  • Unstructured
  • Multilingual
  • Huge Volume

The Rosette Component Solution

  • Rosette Language Identifier
  • Rosette Base Linguistics
  • Rosette Entity Extractor
  • Rosette Entity Resolver
  • Rosette Name Indexer
  • Rosette Name Translator

The Result

Sorted Languages, Better Search, Names of Entities, Matched Identities, Translated Names, Structured Text

Contact us to discuss how Rosette can help you solve your search and information discovery challenges.

Learn More

Request a Product Evaluation

Download the Rosette Overview Datasheet


Startup Program
