Basis Technology Introduces Rosette Arabic Language Analyzer

—  First Commercially Available Analyzer for Arabic Developed Entirely in the United States  —

—  Addresses the Needs of US Government Agencies for Search and Retrieval of Arabic Documents  —

CAMBRIDGE, MA, March 4, 2003 — Basis Technology, the leading provider of globalization software and services, today introduced the Rosette® Arabic Language Analyzer (ARLA), the first commercially available analyzer for Arabic text developed entirely in the United States. ARLA is the latest addition to Basis Technology’s suite of Rosette Language Analyzers, which also includes products for Chinese, Japanese, and Korean. Developed in response to the needs of the US Intelligence Community, the new product is designed to plug into mainstream search engines and data mining products to facilitate search and retrieval of information written in Arabic.

“One of the most pressing issues facing the Intelligence Community today is the need to quickly and accurately identify, analyze, and extract information in foreign languages and scripts,” said Glenn Nordin, Assistant Director Intelligence Policy (Language), Department of Defense. “Because US Government computer systems are largely designed to work with the Latin alphabet and US character sets, processing information in Arabic is a difficult undertaking. In the absence of universal transliteration standards, human transcription of foreign text into the Latin alphabet can result in significant corruption of the data and mismatches in searches. Finding solutions that enable intelligence analysts to extract and disseminate information in the original language and script could be of critical importance.”

ARLA is a multi-platform, high-performance linguistic engine for analyzing Arabic documents. It performs orthographic and lexical normalization of text, including removal of grammatical affixes (such as conjunctions, prepositions, and pronouns) that complicate search and retrieval. ARLA utilizes advanced computational linguistics and specialized lexica to convert plural nouns, including broken plurals, to their singular forms.
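The kind of normalization described above can be illustrated with a minimal Python sketch. It is not ARLA's implementation: the rule set, the prefix list, and the function names (normalize, strip_prefix, analyze) are assumptions chosen for illustration. The sketch removes short-vowel marks and the tatweel elongation character, folds common alef variants into a bare alef, and strips a few frequent prefixes while keeping at least three letters of stem.

```python
import re

# Toy rules for illustration only; a production analyzer would use a far
# richer rule set and a lexicon to avoid stripping letters that belong to the stem.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")    # short-vowel marks and tatweel
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda or hamza
BARE_ALEF = "\u0627"

# Longest prefixes first: wa+al, bi+al, al (definite article), wa (conjunction).
PREFIXES = (
    "\u0648\u0627\u0644",
    "\u0628\u0627\u0644",
    "\u0627\u0644",
    "\u0648",
)


def normalize(token):
    """Orthographic normalization: drop diacritics, unify alef forms."""
    token = DIACRITICS.sub("", token)
    return ALEF_VARIANTS.sub(BARE_ALEF, token)


def strip_prefix(token, min_stem=3):
    """Remove one known prefix if at least min_stem letters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return token[len(prefix):]
    return token


def analyze(token):
    """Normalize a token, then strip a leading prefix."""
    return strip_prefix(normalize(token))


if __name__ == "__main__":
    # "wa-al-kitaab" (and the book) and "kitaab" (book) reduce to the same form.
    print(analyze("\u0648\u0627\u0644\u0643\u062A\u0627\u0628") == analyze("\u0643\u062A\u0627\u0628"))
```

The minimum stem length is a crude guard: without it, a word such as "walad" (boy), which merely begins with the letter waw, would lose its first letter.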

The new product is a component of the Rosette Globalization Platform, a comprehensive software suite which enables multilingual information processing. Other components include the Rosette Core Library for Unicode (RCLU), a portable framework for implementing Unicode, and the Rosette Language Identifier (RLI), which automatically identifies the language and encoding of incoming documents. RLI now supports over forty written languages, including Arabic, Farsi, transliterated Arabic, and transliterated Farsi.

“Linguistics technology is beginning to play an increasingly important role when it comes to ensuring national security,” said Everette Jordan, Director of the National Virtual Translation Center, an organization jointly sponsored by the FBI and CIA under the USA Patriot Act. “Because of the enormous volume of multilingual intelligence information that must be analyzed with limited human resources, technologies that can assist in sifting, sorting, and finding critical information are essential in ensuring that threats are detected as quickly as possible. Whereas the US Government cannot endorse any one product over another, we are pleased to see that companies are responding to the government’s call for solutions to these difficult issues.”

“Search and retrieval of information in Arabic documents is highly complex,” explains Glenn Adams, Technical Director Emeritus of the Unicode Consortium, and co-author of the Unicode Standard. “For example, Arabic incorporates affixes and infixes indicating grammatical elements such as conjugation, prepositions, and pronouns. Searching through documents for an exact match to a particular search term will miss many relevant hits. Searching for “book” (“kitaab”) will not return the Arabic term for “the books” (“alkutub”). ARLA solves this problem and many others like it, resulting in a more accurate and comprehensive search that doesn’t miss relevant terms because of slight grammatical variations.”
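The "kitaab" and "alkutub" example can be sketched in a few lines of Python. This is a hypothetical illustration working on romanized text, not ARLA's method: the two-entry lexicon, the article-stripping rule, and the function names (lemma, search) are assumptions. The idea is to reduce both the query and each document word to a singular base form before comparing them.

```python
# Hypothetical mini-lexicon mapping romanized broken plurals to singulars.
# A real analyzer would rely on a large, curated lexicon.
BROKEN_PLURALS = {
    "kutub": "kitaab",   # books -> book
    "rijaal": "rajul",   # men -> man
}


def lemma(term):
    """Strip the definite article 'al', then map a known plural to its singular."""
    term = term.lower()
    if term.startswith("al") and len(term) > 4:
        term = term[2:]
    return BROKEN_PLURALS.get(term, term)


def search(query, documents):
    """Return documents containing a word whose lemma equals the query's lemma."""
    target = lemma(query)
    return [doc for doc in documents
            if any(lemma(word) == target for word in doc.split())]


if __name__ == "__main__":
    docs = ["qara'tu alkutub", "bayt kabiir"]
    print("kitaab" in docs[0].split())  # exact matching misses the document
    print(search("kitaab", docs))       # lemma matching finds it
```

Because the plural "kutub" changes the vowels inside the word rather than adding a suffix, no amount of affix stripping alone would recover "kitaab"; that is why the sketch falls back on a lexicon lookup.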

Together with the other language components of the Rosette Globalization Platform, ARLA enables federal law enforcement and intelligence agencies to expand their ability to detect and monitor intelligence originating in a foreign language, even when searching documents with terms which have been transcribed into the English alphabet.

“A key issue when searching Arabic text is the fact that names may be transcribed into English with many varied spellings, even though there will be far fewer ways of writing the same name in Arabic,” said Carl Hoffman, CEO of Basis Technology. “For example, there are over thirty different commonly-used English spellings for the name of Libya’s ruler, all of which correspond to the unique spelling of his name in Arabic. Our software can be used to build applications that allow users to search and retrieve information in Arabic documents using “phonetic approximation”—spelling the name the way it sounds—without having knowledge of the many varied transliteration schemes. This significantly increases the likelihood of non-Arabic speakers locating the critical information for which they are searching.”
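One simple way to picture "phonetic approximation" is a key function that collapses many English spellings of a name into a single code, which can then be mapped to the one Arabic spelling. The Python sketch below is an assumption-laden illustration, not the algorithm used in Basis Technology's software: the substitution rules, the sample spellings, the function names, and the one-entry lookup table are all hypothetical.

```python
import re


def phonetic_key(name):
    """Reduce a romanized name to a crude consonant skeleton."""
    s = re.sub(r"[^a-z]", "", name.lower())
    s = re.sub(r"kh|gh|q|g", "k", s)  # treat q, g, kh, gh as one k-like sound
    s = re.sub(r"dh|th", "d", s)      # treat dh and th as d
    s = re.sub(r"[aeiouyh]", "", s)   # drop vowels, y, and any remaining h
    return re.sub(r"(.)\1+", r"\1", s)  # collapse doubled consonants


# Hypothetical lookup from phonetic key to the single Arabic spelling
# (here "al-Qadhdhafi", written with Unicode escapes).
ARABIC_BY_KEY = {
    "kdf": "\u0627\u0644\u0642\u0630\u0627\u0641\u064A",
}


def to_arabic(name):
    """Return the Arabic spelling for a romanized name, or None if unknown."""
    return ARABIC_BY_KEY.get(phonetic_key(name))


if __name__ == "__main__":
    spellings = ["Qaddafi", "Gaddafi", "Kadhafi", "Gadhafi", "Khadafy", "Gathafi"]
    print({phonetic_key(s) for s in spellings})  # a single shared key
    print(to_arabic("Gadhafi") == to_arabic("Qaddafi"))
```

A key this coarse will also merge unrelated names that share a consonant skeleton, so a practical system would need finer rules or a ranking step to keep precision acceptable.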

ARLA is available for immediate shipment, with plug-ins either available or under development for Convera RetrievalWare®, FAST Data Search™, Microsoft® SQL Server™, and Oracle® Text/interMedia.

About Basis Technology

Basis Technology (www.basistech.com) provides software solutions for extracting meaningful intelligence from multilingual text. The company’s Rosette® Platform is a suite of high-performance, highly reliable, interoperable software components designed for applications that analyze and process all the world’s languages.

Top-tier software vendors, content providers, multinational enterprises, and government agencies rely on Basis Technology’s solutions for Unicode compliance, language identification, multilingual search, normalization, transliteration, and entity extraction. Clients include America Online, Convera, Endeca, Google, Hewlett-Packard, InQuira, Inktomi, L.L. Bean, Northrop Grumman, PeopleSoft, Siebel Systems, Verity, and Yahoo!.

Company headquarters are located in Cambridge, Massachusetts, with branch offices in San Francisco, California; Herndon, Virginia; and Tokyo, Japan. For more information, visit www.basistech.com or call 800-697-2062.