How-to: Index and Search Multilingual Documents in Hadoop – Cloudera Blog

Learn how to use Cloudera Search along with RBL-JE to search and index documents in multiple languages.

Basis Technology’s Rosette Base Linguistics for Java (RBL-JE) provides a comprehensive multilingual text analytics platform for improving search precision and recall. RBL provides tokenization, lemmatization, POS tagging, and de-compounding for Asian, European, Nordic, and Middle Eastern languages, and has just been certified for use with Cloudera Search.

Cloudera Search brings full-text, interactive search, and scalable indexing to Apache Hadoop by marrying SolrCloud with HDFS and Apache HBase, and other projects in CDH. Because it’s integrated with CDH, Cloudera Search brings the same fault tolerance, scale, visibility, and flexibility of your other Hadoop workloads to search, and allows for a number of indexing, access control, and manageability options.