Supported Languages

  • Albanian
  • Arabic
  • Bulgarian
  • Catalan
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Malay
  • Norwegian
  • Pashto
  • Persian (Farsi / Dari)
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Urdu

Rosette for Lucene & Solr

Language Support for Lucene/Solr Search

Basis Technology’s Rosette brings enterprise-quality linguistic analysis to Apache Lucene and Solr. Leading organizations use it for deep linguistic processing and highly accurate search results in many languages. This linguistic plug-in delivers quality multilingual search results in over 40 Asian, European, and Middle Eastern languages.

Linguistics Chosen by Web Search Giants

Our commercially supported text analytics platform for search is used by top search engines including Google, Yahoo!, and Bing to segment Asian text, improve indexing through morphological analysis, and apply other language-specific features for better precision and greater recall in search results. With Rosette’s Apache Lucene and Solr connector, enterprise customers who use these tools for search-based applications, enterprise search, and other deployments get the same benefits.

A Solution for Every Language Challenge

Each of the world’s languages is unique, and search engines need to understand specific features of each language to deliver the best results. Rosette uses a combination of lexical data, heuristic rules, and statistical models to tokenize text, perform morphological analysis, extract entities, search for name variants, and more. We continually evaluate new approaches to linguistic analysis and update technologies or lexical data in our regular releases to enable our customers to focus on what they do best.

Many Languages – One Language Technology

There are many sources of European, Asian, and Middle Eastern language support for search with Lucene/Solr, but implementing many languages may require several vendors and modules with different performance levels and features. Rosette gives search implementers high speed and accuracy for these languages via one API, so that plugging in one or 40 languages is easy and predictable. Basis Technology has been providing support for our customers around the world for over 15 years.

Figure: Rosette Solr diagram

Dependable Speed and Accuracy

Our software has been extensively tested by major web and enterprise search providers, who adopted Rosette to provide quality search results in over 40 languages. Our technology has been tuned for high throughput and is highly scalable in the Lucene and Solr environment. Most importantly, our knowledgeable technical staff is available to support our customers, regardless of the native search language.

Evaluate and Deploy in Hours

Rosette Base Linguistics, which tokenizes text, plugs into Lucene and Solr as a Tokenizer class, allowing quick and easy setup of multilingual search. Rosette Language Identifier and Rosette Entity Extractor integrate with Solr as UpdateProcessors. The language identifier is essential because Solr applies analysis per field, so text in each language must go into a field named and configured for that language. Extracted entities enable faceted search. Connecting to Solr requires only modifications to schema.xml and solrconfig.xml. Rosette also now supports LucidWorks Enterprise version 1.5, a new Solr-based search solution development platform from LucidWorks. Request an evaluation copy of Rosette today.
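
The idea of naming fields by language can be sketched in a few lines of Python. This is only an illustration of the routing step, not Rosette's or Solr's API: the `text_<lang>` naming convention, the `route_by_language` function, and the toy detector are all assumptions made up for this example.

```python
from typing import Callable, Dict


def route_by_language(doc: Dict[str, str],
                      detect_language: Callable[[str], str],
                      source_field: str = "text",
                      fallback: str = "general") -> Dict[str, str]:
    """Move doc[source_field] into a language-suffixed field such as text_de."""
    routed = dict(doc)                       # leave the caller's document untouched
    text = routed.pop(source_field, "")
    lang = detect_language(text) or fallback  # empty result means "unknown"
    routed["language"] = lang
    routed[f"{source_field}_{lang}"] = text
    return routed


def toy_detector(text: str) -> str:
    """Hypothetical stand-in for a real language identifier (illustration only)."""
    if any("\u3040" <= ch <= "\u30ff" for ch in text):  # hiragana or katakana
        return "ja"
    if any(ch in "äöüß" for ch in text.lower()):
        return "de"
    return ""


if __name__ == "__main__":
    print(route_by_language({"id": "1", "text": "Straßenbahn"}, toy_detector))
```

In a real deployment the detection would happen inside the update chain at index time, and each language-suffixed field would be bound to an analyzer for that language in the schema.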

Rosette provides these linguistic advantages:

  • Language identification for 55 languages and 45 encodings, for indexing documents in many languages
  • Accurate segmentation in languages without spaces—Chinese, Japanese, and Korean—for greater precision
  • Decompounding words into sub-components for languages that freely create compounds—such as German, Dutch, and Korean—to boost recall
  • Lemmatization for relevant query expansion to boost recall and precision
  • Part-of-speech tagging to improve precision and recall
  • Entity extraction to enable faceted search on key names and other entities in search results
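
To make the decompounding point concrete, here is a toy sketch in Python of how splitting a compound can boost recall. It is not Rosette's method, which the text above describes as a combination of lexical data, heuristic rules, and statistical models. The `decompound` function, the tiny lexicon, and the longest-match strategy are assumptions chosen only for illustration.

```python
from typing import List, Optional, Set


def decompound(word: str, lexicon: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split a word into lexicon entries, trying the longest prefix first.

    Returns None when the word cannot be fully covered by lexicon entries.
    """
    word = word.lower()
    if not word:
        return []                            # nothing left to split: success
    for end in range(len(word), min_len - 1, -1):
        head = word[:end]
        if head in lexicon:
            rest = decompound(word[end:], lexicon, min_len)
            if rest is not None:             # the remainder also splits cleanly
                return [head] + rest
    return None


if __name__ == "__main__":
    # Hypothetical mini-lexicon; "straßen" includes the German linking "n".
    lexicon = {"straßen", "bahn", "halte", "stelle"}
    print(decompound("Straßenbahnhaltestelle", lexicon))
```

Indexing the parts alongside the full compound lets a query for "Bahn" match a document that only contains "Straßenbahnhaltestelle", which is the recall gain the list refers to.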

Figure: RBL segmentation, POS tagging, and BNP extraction sample

Learn More

For more information about our language support for Apache Lucene and Solr, download the Rosette for Solr-Based Applications solution brief, download our whitepaper “Multilingual Search with Apache Lucene & Solr”, request a product evaluation, or browse our presentations about multilingual search.