Multilingual Search

Improve the speed and accuracy of your search application with advanced linguistic analysis.

Search Many Languages with High Accuracy

Every language, including English, poses unique challenges for search applications that must deliver relevant, precise results. Rosette® Base Linguistics (RBL) provides a complete set of linguistic services that enable enterprise applications to search and process text in many languages. RBL enriches text in its original language, supporting best-in-class natural language processing that improves both speed and accuracy. As linguistics experts working at the intersection of language and technology, Basis Technology continually improves the Rosette product family with new languages, feature updates, and the latest innovations from academic research.


Solutions



Read our CareerBuilder Case Study

The CareerBuilder.com content is the very definition of Big Text: mountains of structured and unstructured text data (resumes and job listings) in many languages.


Used by the Big Players

Amazon.com, Google, Pinterest, CareerBuilder

To Solve The Hard Problems of Multilingual Search

  • Tokenization

    Identifying words, particularly in languages that do not separate words with spaces, such as Chinese.


  • Lemmatization

    Determining the root form (lemma) of each word through dictionary lookup, rather than stemming, which strips affixes by rule and can produce non-words.

  • Decompounding

    Separating compound words (common in German and Korean) into their component words, which can then be indexed independently.

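This page does not describe how RBL segments Chinese text. As a minimal illustration of why word identification needs lexical knowledge, the sketch below uses forward maximum matching, a classic greedy dictionary-based technique swapped in here purely for explanation. The toy lexicon `CHINESE_LEXICON` and the function `max_match` are hypothetical and are not part of any Rosette API.

```python
from typing import List, Set

# Hypothetical toy lexicon; real segmenters rely on far larger dictionaries.
CHINESE_LEXICON = {"北京", "大学", "北京大学", "学生", "生活"}

def max_match(text: str, lexicon: Set[str] = CHINESE_LEXICON) -> List[str]:
    """Segment text by forward maximum matching: at each position take the
    longest lexicon entry that matches, falling back to a single character."""
    max_len = max([1, *map(len, lexicon)])
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

print(max_match("北京大学生活"))  # ['北京大学', '生活']
```

Greedy matching is easy to follow but can mis-segment ambiguous strings, which is one reason production segmenters combine large dictionaries with statistical models.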
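To make the difference between lemmatization and stemming concrete, here is a minimal sketch contrasting a crude suffix-stripping stemmer with a dictionary lookup. Both functions (`naive_stem`, `lemmatize`) and the tiny `LEMMAS` table are hypothetical toys assumed only for illustration; they do not reflect how RBL is implemented.

```python
# Hypothetical toy lemma dictionary: inflected form -> lemma.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "mice": "mouse",
    "studies": "study",
}

def naive_stem(word: str) -> str:
    """Crude rule-based stemmer: strip the first matching suffix, as long as
    the remaining stem is at least three characters long."""
    for suffix in ("ing", "ies", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word: str) -> str:
    """Look the word up in the lemma dictionary; unknown words pass through."""
    return LEMMAS.get(word, word)

for w in ["running", "mice", "studies", "ran"]:
    print(w, naive_stem(w), lemmatize(w))
# running runn run
# mice mice mouse
# studies stud study
# ran ran run
```

The stemmer produces non-words such as "runn" and "stud" and cannot relate irregular forms like "mice" to "mouse"; the dictionary handles both, but only for words it contains.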
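Decompounding can be sketched as a recursive dictionary search. The example below assumes a toy German lexicon and treats a linking "s" (the Fugen-s, as in Arbeit+s+Platz) as an optional connector between parts. The names `decompound`, `GERMAN_LEXICON`, and `LINKING_ELEMENTS` are hypothetical, and this longest-head-first search is a simplification for illustration, not RBL's method.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon of lower-cased German nouns.
GERMAN_LEXICON = {"haus", "tür", "schlüssel", "arbeit", "platz"}
# Linking elements allowed between parts: none, or the Fugen-s.
LINKING_ELEMENTS = ("", "s")

def decompound(word: str, lexicon: Set[str] = GERMAN_LEXICON) -> Optional[List[str]]:
    """Split word into lexicon entries, allowing a linking element between
    parts. Returns None when no complete split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try longer head parts first so longer known words are preferred.
    for i in range(len(word) - 1, 0, -1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for link in LINKING_ELEMENTS:
            # Skip the linking element (if present) and split the remainder.
            if rest.startswith(link) and len(rest) > len(link):
                tail = decompound(rest[len(link):], lexicon)
                if tail is not None:
                    return [head] + tail
    return None

print(decompound("Haustürschlüssel"))  # ['haus', 'tür', 'schlüssel']
print(decompound("Arbeitsplatz"))      # ['arbeit', 'platz']
```

Indexing the parts separately lets a query for "Schlüssel" match a document that only contains "Haustürschlüssel".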

Google selected Basis Technology to provide the Asian linguistic technology needed to create the ultimate Chinese, Japanese and Korean search engine. This marks a key milestone in establishing Google as the preferred search engine for Internet users worldwide.

Urs Hölzle

Fellow and Vice President, Google

RLI - Rosette Language Identifier

Features

  • Language Identification
  • Script Identification
  • Language Boundary Locator
  • Encoding Conversion

RBL - Rosette Base Linguistics

Features

  • Tokenization
  • Lemmatization
  • Parts of Speech Tagging
  • Decompounding
  • Noun Phrase Extraction
  • Sentence Detection

REX - Rosette Entity Extractor

Features

  • Automatic tagging of entities
  • 16 entity types
  • Disambiguation of similar names
  • Finds unique names

Basis Technology has actively supported the development of Lucene and Solr for the past eight years. It’s no surprise that many of the largest Solr deployments have adopted Rosette for multilingual enablement and metadata enrichment.

Yonik Seeley

Creator of Solr and co-founder of Heliosearch

Contact us for more information:
