News

How-to: Index and Search Multilingual Documents in Hadoop – Cloudera Blog

06 March 2014

Learn how to use Cloudera Search along with RBL-JE to search and index documents in multiple languages.

Basis Technology’s Rosette Base Linguistics for Java (RBL-JE) is a multilingual text analytics platform for improving search precision and recall. RBL-JE provides tokenization, lemmatization, part-of-speech (POS) tagging, and decompounding for Asian, European, Nordic, and Middle Eastern languages, and has just been certified for use with Cloudera Search.

Cloudera Search brings full-text interactive search and scalable indexing to Apache Hadoop by marrying SolrCloud with HDFS, Apache HBase, and other projects in CDH. Because it’s integrated with CDH, Cloudera Search brings the same fault tolerance, scale, visibility, and flexibility as your other Hadoop workloads to search, and allows for a number of indexing, access control, and manageability options.

http://blog.cloudera.com/blog/2014/02/how-to-index-and-search-multilingual-documents-in-hadoop/

 
