Basis Technology’s Rosette provides enterprise-quality linguistic analysis to Apache Lucene and Solr. Leading organizations use us for deep linguistic processing and highly accurate search results in many languages. This linguistic plug-in delivers quality multilingual search results in over 40 Asian, European, and Middle Eastern languages.
Our commercially supported text analytics platform for search is used by top search engines including Google, Yahoo!, and Bing to segment Asian text, improve indexing through morphological analysis, and apply other language-specific features for better precision and greater recall in search results. Rosette’s Apache Lucene and Solr connector brings the same benefits to enterprise customers who use these tools for search-based applications, enterprise search, and other deployments.
Each of the world’s languages is unique, and search engines need to understand specific features of each language to deliver the best results. Rosette uses a combination of lexical data, heuristic rules, and statistical models to tokenize text, perform morphological analysis, extract entities, search for name variants, and more. We continually evaluate new approaches to linguistic analysis and update technologies or lexical data in our regular releases to enable our customers to focus on what they do best.
There are many sources of European, Asian, and Middle Eastern language support for search with Lucene/Solr, but implementing many languages may require several vendors and modules with different performance levels and features. Rosette gives search implementers high speed and accuracy for these languages via one API, so that plugging in one or 40 languages is easy and predictable. Basis Technology has been providing support for our customers around the world for over 15 years.
Our software has been extensively tested by major web and enterprise search providers, who adopted Rosette to provide quality search results in over 40 languages. Our technology has been tuned for high throughput and is highly scalable in the Lucene and Solr environment. Most importantly, our knowledgeable technical staff is available to support our customers, regardless of the native search language.
Rosette Base Linguistics, which tokenizes text, plugs into Lucene and Solr as a Tokenizer class, allowing quick and easy set-up of multilingual search. Rosette Language Identifier and Rosette Entity Extractor integrate with Solr as UpdateProcessors. The language identifier is essential because multilingual Solr schemas typically keep each language in its own field, named by language, so that each field is analyzed with the right language-specific chain. Entities extracted from documents enable faceted search results. Connecting Rosette to Solr requires only modifications to schema.xml and solrconfig.xml. Rosette also now supports LucidWorks Enterprise version 1.5, a new Solr-based search solution development platform from Lucid Imagination. Request an evaluation copy of Rosette today.
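To make the per-language field idea concrete, here is a minimal sketch in Python of the routing step that a language-identification stage performs before indexing. It is not Rosette's API or Solr's: the function names, the `language` field, the `general` fallback suffix, and the toy detector are all assumptions made for illustration.

```python
from typing import Callable, Dict


def route_by_language(
    doc: Dict[str, str],
    field: str,
    detect: Callable[[str], str],
    fallback: str = "general",
) -> Dict[str, str]:
    """Return a copy of `doc` with `field` renamed to `<field>_<lang>`.

    `detect` is any callable that maps text to a language code and
    returns an empty string when it cannot decide (hypothetical contract).
    """
    routed = dict(doc)
    text = routed.pop(field, None)
    if text is None:
        # Nothing to route; leave the document unchanged.
        return routed
    lang = detect(text) or fallback
    routed[f"{field}_{lang}"] = text
    routed["language"] = lang
    return routed


def toy_detect(text: str) -> str:
    """Hypothetical stand-in for a real language identifier.

    It only recognizes Japanese by the presence of hiragana or katakana;
    a production identifier would use statistical models instead.
    """
    if any("\u3040" <= ch <= "\u30ff" for ch in text):
        return "ja"
    return ""


if __name__ == "__main__":
    print(route_by_language({"id": "1", "text": "これはテストです"}, "text", toy_detect))
    print(route_by_language({"id": "2", "text": "plain text"}, "text", toy_detect))
```

In a setup like this, the schema would define one field per suffix (for example `text_ja`), each with its own tokenizer and filters, so the routing step decides which analysis chain a document's text receives.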
Rosette provides these linguistic advantages:
For more information about our language support for Apache Lucene and Solr, download the Rosette for Solr-Based Applications solution brief, download our whitepaper “Multilingual Search with Apache Lucene & Solr”, request a product evaluation, or browse our presentations about multilingual search.