Rosette® Base Linguistics for Arabic is a multi-platform, high-performance linguistic engine that facilitates the analysis of documents written in Arabic. Designed to plug into mainstream search engines and data mining products, it performs orthographic and lexical normalization of Arabic text.
Traditionally an oral language, Arabic is not well suited to standard automatic analysis techniques that look at a language’s written form. Arabic words frequently incorporate grammatical elements indicating attributes such as verb aspect, object, conjugation, person, number, and gender. For example, the definite article and possessive pronouns are not separate words as they are in languages like English but are attached to the words to which they refer (thus “their houses” is written as a single token, بُيُوتُهُمْ). Additional ambiguity arises because short vowels are written inconsistently or omitted altogether. Arabic text therefore requires significant pre-processing before it can be accurately indexed, searched, or otherwise processed.
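To make the vowel problem concrete, here is a minimal sketch of one common orthographic normalization step: removing the optional short-vowel marks and the decorative elongation character so that vocalized and unvocalized spellings of a word compare equal. It is an illustration only and is not Rosette's implementation or API. The function name `normalize_arabic` and the choice of code points to strip are assumptions made for this example, and a production system would do considerably more (for instance, morphological analysis of attached articles and pronouns).

```python
import re
import unicodedata

# Assumption for this sketch: strip the Arabic short-vowel and related marks
# U+064B..U+0652 (fathatan through sukun) and the superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Tatweel (kashida) U+0640 only stretches a word visually and carries no meaning.
_TATWEEL = "\u0640"


def normalize_arabic(text: str) -> str:
    """Return text with optional vowel marks and tatweel removed (hypothetical helper)."""
    # Put the input into a canonical composed form first, so that equivalent
    # code point sequences are treated alike.
    text = unicodedata.normalize("NFC", text)
    text = _DIACRITICS.sub("", text)
    return text.replace(_TATWEEL, "")


if __name__ == "__main__":
    # "their houses", fully vocalized, written with escapes to avoid
    # bidirectional display issues in source code.
    vocalized = "\u0628\u064F\u064A\u064F\u0648\u062A\u064F\u0647\u064F\u0645\u0652"
    print(normalize_arabic(vocalized))
```

After this step, the vocalized token above and its unvocalized spelling map to the same six-letter string, so an index built on normalized text can match either form. The attached possessive suffix is still part of the token, which is why lexical analysis is needed on top of orthographic cleanup.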
Rosette Base Linguistics also supports the Farsi (Persian) and Urdu languages.
For more information about our Rosette Base Linguistics software, download the product datasheet, request a product evaluation, or browse our presentations about linguistic analysis and full-text search.