Supported Platforms

Windows, Linux, Solaris, and MacOS

Supported Languages

  • Albanian
  • Arabic
  • Bulgarian
  • Catalan
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Malay
  • Norwegian
  • Pashto
  • Persian (Farsi / Dari)
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Urdu

Rosette for Search-Based Applications

Highly Accurate Text Analysis for Search in Asian, European, and Middle Eastern Languages

Full-text search is ubiquitous. We use search engines daily on the Internet, in the office, on our home computers, and on portable devices. These products make it easy to find information, but the technology inside them is far from simple. Each search engine relies on sophisticated algorithms from the field of computational linguistics: software that analyzes digital text so that it can be rapidly stored, searched, and retrieved.

Since 1998, the most widely used Internet and enterprise search engines have relied on Rosette® for essential natural language processing, including segmentation, lemmatization, decompounding, part-of-speech tagging, sentence boundary detection, and noun phrase extraction. With these capabilities as the foundation, our customers are setting the pace in their own markets.

“Google selected Basis Technology to provide the Asian linguistic technology needed to create the ultimate Chinese, Japanese and Korean search engine. This marks a key milestone in establishing Google as the preferred search engine for Internet users worldwide.” — Urs Hölzle, Fellow and Vice President, Google

The Rosette Solution

Rosette is designed to use a variety of algorithms so that the best approach can be applied to each language’s specific requirements. Depending on the language, a combination of lexical data, heuristic rules, and statistical models is implemented to provide the best accuracy and speed for all applications.

[Figure: Rosette segmentation, part-of-speech (POS), and base noun phrase (BNP) sample]

Key Features

Rosette provides the most advanced capabilities commercially available, whether for searching within a language or across multiple languages. Base features include:

  • Language Identification automatically classifies documents and messages by language and encoding.
  • Segmentation/Tokenization determines the boundaries of the individual lexical tokens in input data, including punctuation and other special characters.
  • Lemmatization generates the dictionary base form for an inflected form of a verb or adjective.
  • Noun Decompounding divides compound nouns into sub-compounds for accurate information retrieval.
  • Part-of-Speech Identification tags a word’s part-of-speech such as noun, verb, or preposition.
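
To make the decompounding feature above more concrete, here is a minimal Python sketch of one common approach: recursive, dictionary-based splitting with optional linking elements. It is an illustration only and does not show Rosette's API or its actual algorithm. The names `LEXICON`, `LINKERS`, and `decompound` are hypothetical, and the tiny German word list stands in for the large lexical data a real system would use.

```python
from typing import List, Optional

# Hypothetical toy lexicon (assumption for illustration only).
LEXICON = {"donau", "dampf", "schiff", "fahrt", "arbeit", "zimmer"}

# Simplified German linking elements that may join two parts of a compound.
LINKERS = ("", "s", "es")


def decompound(word: str, lexicon=LEXICON) -> Optional[List[str]]:
    """Split a compound into lexicon words; return None if no full split exists."""
    word = word.lower()
    if not word:
        return []
    # Try the longest prefix first so that larger known words win.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            remainder = rest[len(linker):]
            # A linking element must be followed by another component.
            if linker and not remainder:
                continue
            tail = decompound(remainder, lexicon)
            if tail is not None:
                return [head] + tail
    return None


if __name__ == "__main__":
    print(decompound("Donaudampfschifffahrt"))
    print(decompound("Arbeitszimmer"))
```

Indexing the components alongside the full compound lets a query for "Schiff" match a document that only contains the longer compound, which is the retrieval benefit the feature description refers to.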

Enhanced Search Features

  • Sentence Boundary Detection – marks the boundaries of individual sentences.
  • Base Noun Phrase Analysis – identifies groups of words, including a noun, that together describe a single expression.
  • Ignores user-defined stop words.
  • Supports customer-provided dictionaries to allow an application-specific vocabulary.
  • Language Boundary Locator – identifies multiple language regions within a single document so individual languages can be processed and routed properly.
  • Chinese Script Converter – processes Chinese text and converts between Simplified and Traditional forms, handling both the character variations and the word-level differences.
  • Japanese Orthographic Normalizer – normalizes different orthographic forms of Japanese words to a standard canonical form.
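
As a rough illustration of the idea behind locating language regions in a mixed document, the Python sketch below splits text into runs of the same writing system using only the standard `unicodedata` module. This is a deliberately crude stand-in and not Rosette's method: script is only a proxy for language (it cannot tell English from French, and it splits Japanese into separate kana and kanji runs). The function names `script_of` and `split_by_script` and the list of script labels are assumptions made for this example.

```python
import unicodedata
from typing import List, Tuple

# Coarse script labels matched against the start of the Unicode character name.
SCRIPTS = ("LATIN", "CYRILLIC", "ARABIC", "HEBREW", "GREEK",
           "THAI", "HANGUL", "HIRAGANA", "KATAKANA", "CJK")


def script_of(ch: str) -> str:
    """Return a coarse script label, or '' for script-neutral characters."""
    if not ch.isalpha():
        return ""  # spaces, digits, and punctuation have no script of their own
    name = unicodedata.name(ch, "")
    for label in SCRIPTS:
        if name.startswith(label):
            return label
    return "OTHER"


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Group text into (script, segment) runs; neutral characters join the current run."""
    runs: List[Tuple[str, str]] = []
    for ch in text:
        label = script_of(ch)
        if runs and label in ("", runs[-1][0]):
            # Neutral characters and same-script letters extend the current run.
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        elif runs and runs[-1][0] == "":
            # A leading neutral run adopts the script of the first real letter.
            runs[-1] = (label, runs[-1][1] + ch)
        else:
            runs.append((label, ch))
    return runs


if __name__ == "__main__":
    for script, segment in split_by_script("Hello мир"):
        print(script, repr(segment))
```

Each run could then be routed to a language-specific analyzer. A production system would need statistical language identification within each script to separate languages that share an alphabet.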

Rosette in Your Application

Rosette is a comprehensive linguistic platform ideal for any application which must process large volumes of multilingual text, including:

Rosette Components

Rosette is a single API that provides access to the various linguistic capabilities described above. Search solutions typically use the following Rosette components:

System Specifications

Rosette is a portable and highly scalable software development kit (SDK) that runs on platforms ranging from laptop PCs to multi-CPU servers processing thousands of documents per second.

A fully documented API is provided and may be accessed from applications written in C, C++, Java, and other languages. A command-line interface is also available for testing purposes.

SDKs are available for Apple MacOS, Microsoft Windows, Sun Solaris, and multiple Linux distributions.

Learn More

For more information about our language support for search-based applications, download the Rosette for Solr-Based Applications solution brief, request a product evaluation, or browse our presentations about multilingual search.