Basis Technology’s Arabic Editor provides language professionals with a powerful environment for composing, editing, and analyzing complex Arabic documents. It brings together a rich suite of analytical tools in a single framework, including automatic diacritization, automatic transliteration, online dictionaries, and syntactic analysis. It also provides technology professionals with a flexible tool for developing software applications that process Arabic text.
A unique feature of Arabic Editor is its system for entering and editing fully diacritized Arabic text from a standard PC (“QWERTY”) keyboard. The input system is based on a transcription scheme used to approximate Arabic sounds in English. It can be learned in under an hour, yet provides productivity and accuracy gains of 2x to 4x over a conventional Arabic keyboard layout.
For example, to type the Arabic name إبراهيم, the user enters “ibraahiim”. The following input box appears as the user types:
As the name is typed, the Latin spelling appears in the yellow box from left to right, while the Arabic spelling appears in the green box from right to left. The input box disappears when a word is completed, but can be reopened to edit existing text. Text may be entered with all, some, or no diacritical marks.
Arabic Editor’s input system is based upon Basis Technology’s proprietary, fully-reversible transliteration system for Modern Written Arabic (MWA). This system provides a guaranteed “round trip” for any Arabic text (in Unicode) into and out of the Latin alphabet (in ISO 8859‑1 or ASCII). It is an intuitive and meaningful representation for Arabic speakers while also being easy to learn for non-speakers.
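The round-trip guarantee can be illustrated with a minimal Python sketch. The table below is a hypothetical toy mapping, not the proprietary Basis scheme, and the `-` separator is an assumption made only for this example. It shows one way a scheme with Latin digraphs can stay reversible: when two single-letter codes would otherwise read as a digraph, a separator keeps them apart.

```python
# Hypothetical toy table: NOT the Basis scheme, just enough letters to show the idea.
TO_LATIN = {
    "\u0634": "sh",  # sheen
    "\u062E": "kh",  # khah
    "\u0633": "s",   # seen
    "\u0647": "h",   # heh
    "\u0643": "k",   # kaf
    "\u0631": "r",   # reh
    "\u0628": "b",   # beh
}
TO_ARABIC = {latin: arabic for arabic, latin in TO_LATIN.items()}
SEP = "-"  # assumed separator that keeps "s" + "h" distinct from the digraph "sh"


def to_latin(text):
    """Transliterate Arabic to Latin, inserting SEP where a digraph would be misread."""
    out = []
    for ch in text:
        code = TO_LATIN[ch]
        # If the previous code plus the next letter spells a digraph, separate them.
        if out and (out[-1] + code[0]) in TO_ARABIC:
            out.append(SEP)
        out.append(code)
    return "".join(out)


def to_arabic(latin):
    """Invert to_latin using greedy longest-match decoding."""
    out, i = [], 0
    while i < len(latin):
        if latin[i] == SEP:
            i += 1
            continue
        for size in (2, 1):  # try digraphs before single letters
            chunk = latin[i:i + size]
            if chunk in TO_ARABIC:
                out.append(TO_ARABIC[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"cannot decode {latin[i:]!r}")
    return "".join(out)


seen_heh_reh = "\u0633\u0647\u0631"   # seen + heh + reh
sheen_heh_reh = "\u0634\u0647\u0631"  # sheen + heh + reh
for word in (seen_heh_reh, sheen_heh_reh):
    assert to_arabic(to_latin(word)) == word  # the round trip holds
print(to_latin(seen_heh_reh), to_latin(sheen_heh_reh))
```

The two test words differ only in their first letter, yet their Latin forms stay distinct, which is the property a reversible scheme has to preserve.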
The following full sentence is presented in both Latin script and Arabic script:
khaTaba al-shaykh ams qaa’ila-n inna al-qaahirä madiinä `aZiimä.
خَطَبَ اَلشَّيْخ أَمْس قَائِلاً إِنَّ اَلْقَاهِرَة مَدِينَة عَظِيمَة.
Arabic Editor handles the difficult task of hamza (ء) “seating” by automatically choosing the correct “chair.” To enter a hamza, the user need only type an apostrophe (’), and the correct orthography is presented automatically:
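To give a sense of what seat selection involves, here is a minimal Python sketch of the simplified textbook rule for a hamza in the middle of a word: the stronger of the two neighbouring short vowels decides the seat, with kasra outranking damma, damma outranking fatha, and fatha outranking sukun. This is an illustration only. It ignores word-initial and word-final hamza and the exceptions after long vowels, the function name is hypothetical, and nothing here is claimed to be how Arabic Editor implements the feature.

```python
# Vowel strength in the simplified rule: kasra > damma > fatha > sukun (no vowel).
STRENGTH = {"i": 3, "u": 2, "a": 1, "": 0}

# Seat chosen by the strongest neighbouring vowel.
SEAT = {
    "i": "\u0626",  # hamza on yeh
    "u": "\u0624",  # hamza on waw
    "a": "\u0623",  # hamza on alef
    "": "\u0621",   # hamza on the line
}


def medial_hamza_seat(prev_vowel, next_vowel):
    """Return the hamza form for a word-medial hamza (simplified rule only).

    Vowels are given as "a", "u", "i", or "" for sukun.
    """
    strongest = max(prev_vowel, next_vowel, key=STRENGTH.__getitem__)
    return SEAT[strongest]


print(medial_hamza_seat("u", "i"))  # as in su'ila
print(medial_hamza_seat("u", "a"))  # as in su'aal
print(medial_hamza_seat("", "a"))   # as in mas'ala
```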
Unlike conventional Arabic keyboards, no special keystroke is required to type the lam-alif ligature. This is handled automatically:
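In Unicode, lam-alif is normally stored as two ordinary letters, with the ligature formed when the text is rendered. Dedicated ligature code points exist only as compatibility presentation forms. The short Python sketch below uses the standard `unicodedata` module to show that relationship. Whether Arabic Editor stores the ligature this way internally is an assumption.

```python
import unicodedata

LAM = "\u0644"
ALEF = "\u0627"
LAM_ALEF_ISOLATED = "\uFEFB"  # compatibility presentation form of the ligature

# Compatibility normalization maps the presentation form back to two letters.
decomposed = unicodedata.normalize("NFKC", LAM_ALEF_ISOLATED)

print(unicodedata.name(LAM_ALEF_ISOLATED))
print([f"U+{ord(ch):04X}" for ch in decomposed])
```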
Arabic Editor supports all of the major text encoding systems used in the Microsoft Windows environment, including Code Page 1256, ISO 8859‑6, Unicode UTF‑8, and Unicode UTF‑16.
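As a rough illustration of how these encodings differ, the following Python sketch encodes the same four-letter word in each one using Python's standard codecs and compares the byte counts. The codec names are Python's, not identifiers taken from Arabic Editor.

```python
# The word "kitaab" without diacritics: kaf, teh, alef, beh.
WORD = "\u0643\u062A\u0627\u0628"

# Python codec names for the encodings listed above.
ENCODINGS = ["cp1256", "iso8859_6", "utf-8", "utf-16"]


def encoded_sizes(text):
    """Return the encoded length in bytes for each supported encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


sizes = encoded_sizes(WORD)
for enc, size in sizes.items():
    # Every encoding must decode back to the same text.
    assert WORD.encode(enc).decode(enc) == WORD
    print(f"{enc:10} {size} bytes")
```

The two single-byte code pages use one byte per letter, UTF-8 uses two bytes per Arabic letter, and Python's `utf-16` codec adds a two-byte byte order mark in front of the two-byte code units.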
A built-in Unicode text inspector is also provided. Invoking this inspector on the word اَلْكِتَاب yields the display shown at right.
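The kind of information such an inspector reports can be approximated in a few lines of Python with the standard `unicodedata` module. This is a sketch of the idea, not a reproduction of Arabic Editor's display. The `inspect` helper is hypothetical, and the word is written as escapes so that each diacritic is visible in the source.

```python
import unicodedata


def inspect(text):
    """Return one (code point, Unicode name) pair per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# al-kitaab with diacritics: alef, fatha, lam, sukun, kaf, kasra, teh, fatha, alef, beh
WORD = "\u0627\u064E\u0644\u0652\u0643\u0650\u062A\u064E\u0627\u0628"

for code_point, name in inspect(WORD):
    print(code_point, name)
```

The listing makes clear that each short vowel and the sukun are separate combining code points that follow the letter they mark.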
Arabic Editor’s “Fuzzy Search” capability searches Arabic text using approximate Latin strings as input criteria. For example, searching for “Hussein”, “Husein”, “Hussain”, or any of several similar variants of the surname of the Egyptian writer Taha Hussein finds the one correct Arabic spelling, حسين, within an Arabic text.
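The source does not describe how Fuzzy Search works internally. One crude way to get a similar effect, sketched below in Python, is to reduce both the Latin query and each Arabic word to a consonant skeleton and compare the results. The mapping table, function names, and matching rule are all assumptions made for illustration, and a real system would need a far more careful phonetic model.

```python
import re

# Hypothetical, deliberately tiny consonant table (Arabic letter -> Latin consonant).
ARABIC_CONSONANTS = {
    "\u062D": "h",  # hah
    "\u0647": "h",  # heh
    "\u0633": "s",  # seen
    "\u0646": "n",  # noon
    "\u0637": "t",  # tah
}


def latin_key(query):
    """Lower-case, drop vowels, then collapse doubled letters."""
    key = re.sub(r"[aeiouy]", "", query.lower())
    return re.sub(r"(.)\1+", r"\1", key)


def arabic_key(word):
    """Keep only letters from the consonant table; long vowels and diacritics drop out."""
    return "".join(ARABIC_CONSONANTS.get(ch, "") for ch in word)


def fuzzy_find(query, text):
    """Return the words of text whose consonant skeleton matches the query's."""
    target = latin_key(query)
    return [word for word in text.split() if arabic_key(word) == target]


TEXT = "\u0637\u0647 \u062D\u0633\u064A\u0646"  # "Taha Hussein" in Arabic script
for variant in ("Hussein", "Husein", "Hussain"):
    print(variant, fuzzy_find(variant, TEXT))
```

All three spellings reduce to the same skeleton, so each finds the same Arabic word.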
Arabic Editor contains built‑in support for six widely-used transliteration systems:
|System|Origin|Properties|
|---|---|---|
|Basis|Basis Technology|reversible, phonetic|
|BGN|U.S. Board on Geographic Names|partially reversible, phonetic|
|Buckwalter|Tim Buckwalter / QAMUS|reversible, non-phonetic|
|FBIS|Foreign Broadcast Information Service (now the DNI Open Source Center)|non-reversible, phonetic|
|IC|U.S. Intelligence Community|non-reversible, phonetic|
|SATTS|Standard Arabic Technical Transliteration System|reversible (consonants only), non-phonetic|
Invoking the transliterator on the following text: نَجِيب مَحْفُوظ
yields the following display:
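For readers without the product at hand, the Buckwalter row of the table can be approximated in Python. The sketch below covers the core letters and diacritics of the widely published Buckwalter table and omits rarer characters. It is an independent illustration rather than Arabic Editor's transliterator, and it shows only this one system.

```python
# Core Buckwalter table: Arabic code points U+0621..U+063A and U+0641..U+0652.
_ARABIC = "".join(chr(c) for c in range(0x0621, 0x063B)) + \
          "".join(chr(c) for c in range(0x0641, 0x0653))
_LATIN = "'|>&<}AbptvjHxd*rzs$SDTZEg" + "fqklmnhwYyFNKaui~o"
assert len(_ARABIC) == len(_LATIN)

TO_BUCKWALTER = dict(zip(_ARABIC, _LATIN))
FROM_BUCKWALTER = dict(zip(_LATIN, _ARABIC))


def to_buckwalter(text):
    """Map each Arabic character to its Buckwalter symbol; pass others through."""
    return "".join(TO_BUCKWALTER.get(ch, ch) for ch in text)


def from_buckwalter(text):
    """Inverse mapping, which is what makes the scheme reversible."""
    return "".join(FROM_BUCKWALTER.get(ch, ch) for ch in text)


# The name from the text, written as escapes: noon fatha jeem kasra yeh beh,
# space, meem fatha hah sukun feh damma waw zah.
NAME = ("\u0646\u064E\u062C\u0650\u064A\u0628 "
        "\u0645\u064E\u062D\u0652\u0641\u064F\u0648\u0638")

print(to_buckwalter(NAME))
assert from_buckwalter(to_buckwalter(NAME)) == NAME
```

The output reads oddly as a name because Buckwalter is a one-symbol-per-character encoding rather than a phonetic spelling, which is what the table means by "reversible, non-phonetic".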
Arabic Editor’s grammatical analyzer allows the user to parse Modern Written Arabic word‑by‑word. For example, grammatical analysis of the word كتب produces the following display:
The left column of the grammatical analysis results window shows the possible vocalizations of كتب, formed by adding the appropriate combinations of short vowels to the Arabic script. Below each vocalization, the Basis transliteration of the Arabic word is presented in green text. To the right, the corresponding translation and part‑of‑speech tag are displayed for each parse.