Full-text search is ubiquitous. We access search engines daily on the Internet, in the office, on our home computers, and on portable devices. These products make it very easy to find information, but the technology they use internally is far from simple. Inside each search engine are sophisticated algorithms drawn from the field of computational linguistics: software that analyzes digital text so that it can be rapidly stored, searched, and retrieved.
Since 1998, the most widely used Internet and enterprise search engines have relied on Rosette® for essential natural language processing, including segmentation, lemmatization, decompounding, part-of-speech tagging, sentence boundary detection, and noun phrase extraction. With these capabilities as the foundation, our customers are setting the pace in their own markets.
“Google selected Basis Technology to provide the Asian linguistic technology needed to create the ultimate Chinese, Japanese and Korean search engine. This marks a key milestone in establishing Google as the preferred search engine for Internet users worldwide.”— Urs Hölzle, Fellow and Vice President, Google
Rosette is designed to use a variety of algorithms so that the best approach can be applied to each language’s specific requirements. Depending on the language, a combination of lexical data, heuristic rules, and statistical models is used to provide the best accuracy and speed for all applications.
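As a rough illustration of how lexical data and a heuristic rule can work together, the sketch below segments unspaced text by greedy longest match against a word list, falling back to single characters for anything not in the list. It is not Rosette's algorithm or API, which this page does not describe; the function `segment`, the parameter `max_len`, and the toy lexicon are hypothetical names chosen for the example, and it includes no statistical model.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (illustrative only, not Rosette's method).

    At each position, take the longest lexicon entry of up to max_len
    characters; if none matches, emit a single character as a fallback.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Heuristic fallback: a single character is always accepted.
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy lexicon; a real system would use a far larger dictionary.
TOY_LEXICON = {"東京", "東京都", "京都", "大学"}

print(segment("東京都大学", TOY_LEXICON))
```

Longest match prefers "東京都" over the shorter "東京" here. A purely greedy rule like this can still choose the wrong split on ambiguous input, which is one reason a production system might add statistical models on top of the lexicon.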
Rosette provides the most advanced capabilities commercially available, whether for searching within a language or across multiple languages. Base features include:
Rosette is a comprehensive linguistic platform ideal for any application that must process large volumes of multilingual text, including:
Rosette is a single API that provides access to the various linguistic capabilities described above. Search solutions typically use the following Rosette components:
Rosette is a portable and highly scalable software development kit (SDK) that runs on platforms ranging from laptop PCs to multi-CPU servers processing thousands of documents per second.
A fully documented API is provided and may be accessed from applications written in C, C++, Java, and other languages. A command-line interface is also available for testing purposes.
SDKs are available for Apple MacOS, Microsoft Windows, Sun Solaris, and multiple Linux distributions.
For more information about our language support for search-based applications, download the Rosette for Solr-Based Applications solution brief, request a product evaluation, or browse our presentations about multilingual search.