Supported Languages

  • Albanian
  • Arabic
  • Bulgarian
  • Catalan
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Malay
  • Norwegian
  • Pashto
  • Persian (Farsi / Dari)
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Urdu

Language Support for Multilingual Search in the dtSearch Engine

Basis Technology provides enterprise-quality linguistic analysis to the dtSearch Text Retrieval Engine via Rosette. Leading organizations use us for deep linguistic processing and highly accurate search results in many languages. This linguistic plug-in delivers quality multilingual search results in over 20 Asian, European, and Middle Eastern languages.

Linguistics Chosen by Web Search Giants

Our commercially supported text analytics platform for search is used by top search engines including Google, Yahoo!, and Bing to segment Chinese, Japanese, and Korean text, improve indexing through morphological analysis, and apply other language-specific features for better precision and greater recall in search results. With Rosette’s dtSearch connector, enterprise customers can access these tools for search-based applications, enterprise search, and other deployments.

Support Many Languages in dtSearch with One Language Technology

Implementing Asian, European, and Middle Eastern languages can require several vendors and modules with different performance levels and features. Rosette gives dtSearch implementers high speed and accuracy for these languages from a single source, so that plugging in one language or 24 is easy and predictable. Basis Technology has been supporting customers around the world for over 15 years.

Accurately Index Documents by Language

At index and query time, the Language Identifier component of Rosette detects the language and encoding of each document, recognizing 55 languages and 45 encodings. Its algorithms are based on statistical profiles and trained on gigabytes of hand-verified data.
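
To illustrate what identification from statistical profiles can look like, here is a minimal sketch in Python. It is not Rosette's algorithm or API: the character-trigram features, the cosine comparison, the three tiny sample sentences, and the names `ngram_profile`, `cosine`, and `identify` are all assumptions chosen for illustration. A production identifier would be trained on far more data and would also handle encodings.

```python
from collections import Counter
import math

def ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams in the text."""
    padded = f" {text.lower()} "
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical, tiny training samples; real profiles need much more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sleeps in the sun",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze schläft in der sonne",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y el gato duerme bajo el sol",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

def identify(text):
    """Return the language whose profile is most similar to the text."""
    probe = ngram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(probe, PROFILES[lang]))

print(identify("der hund schläft"))
```

Comparing a document's profile against one stored profile per language keeps identification cheap, which matters when it runs on every document at index time and on every query.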

A Solution for Every Language Challenge

Each of the world’s languages is unique, and search engines need to understand specific features of each language to deliver the best results. Rosette uses a combination of lexical data, heuristic rules, and statistical models to tokenize text, perform morphological analysis, extract entities, search for name variants, and more. We continually evaluate new approaches to linguistic analysis and update technologies or lexical data in our regular releases to enable our customers to focus on what they do best.

Dependable Speed and Accuracy

Our software has been extensively tested by major web and enterprise search providers, who adopted Rosette to provide quality search results in over 20 languages. Our technology has been tuned for high throughput and is highly scalable in the dtSearch environment. Most importantly, our knowledgeable technical staff can help you whether your problem is with searching in Japanese, Arabic, Russian, or any other language we support.

Evaluate and Deploy in Hours

Rosette comes with source code for a dtSearch-compatible language analyzer to seamlessly integrate Rosette functionality. Using the included sample build environment for Windows, the developer can start using the resulting DLL as soon as it is dropped into the appropriate dtSearch language analyzer directory. Source code for the language analyzer gives the developer maximum flexibility for customizing it to the needs of the application. Request a free evaluation copy of Rosette today.

Rosette’s language analyzer for dtSearch has full access to all the language identification and base linguistics functions of Rosette at index and query time:

  • Language identification in 55 languages and 45 encodings: for indexing documents in many languages

  • Accurate tokenization in languages without spaces—Chinese, Japanese, and Korean—for greater precision

  • Decompounding words into sub-components for languages that freely create compounds—such as German, Dutch, and Korean—to boost recall

  • Lemmatization for relevant query expansion to boost recall and precision

  • Part-of-speech tagging to improve precision and recall

  • Entity extraction to find entities, enabling faceted search results
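
As a rough illustration of how decompounding can boost recall, the following Python sketch splits a compound into lexicon entries so that a search for a component can still match the full word. It is not Rosette's implementation: the function `decompound`, the toy lexicon, and the longest-prefix-first strategy are assumptions for illustration, and the sketch ignores linking elements and other language-specific rules that a real analyzer handles.

```python
def decompound(word, lexicon, min_len=3):
    """Split a word into known lexicon parts, preferring the longest prefix.

    Returns [word] unchanged when no complete split is found.
    """
    word = word.lower()

    def split(rest):
        if not rest:
            return []
        # Try the longest candidate prefix first, then shorter ones.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = split(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no full split from this position

    parts = split(word)
    return parts if parts else [word]

# Hypothetical toy lexicon of German word stems.
LEXICON = {"hand", "schuh", "fach", "handschuh"}

print(decompound("Handschuhfach", LEXICON))
```

Indexing the components alongside the original token is one way a query for a component could retrieve documents that contain only the compound.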

For More Information

Fill out the form below, and we’ll contact you about your Rosette for dtSearch questions.

Learn More

For more information about our language support for dtSearch, request a product evaluation.