Instantly translate many names to (and from) English
Names are an essential source of information, but most names in the world are not written in English, which makes them nearly useless to Anglocentric corporations and governments. These organizations must translate names quickly and accurately, often at very large scale. Rosette® Name Translator (RNT) can quickly process millions of names from foreign languages and produce highly accurate, standardized English translations using industry-leading technologies such as linguistic algorithms and statistical modeling. RNT can also translate an English name into its equivalent in any supported language, such as Arabic or Chinese.
As linguistics experts with deep understanding at the intersection of language and technology, Basis Technology continually improves the Rosette product family with language additions, feature updates, and the latest innovations from the academic world.
- Simple API
- High Scale and Throughput
- Industrial-strength Support
- Easy Installation
- Flexible and Customizable
- Integration: Java, C++, or Web Services
- Platform: Unix, Linux, Mac, PC (64 or 32-bit)
- Component of the Rosette SDK
A Difficult Problem
Translating names from other languages into English is quite difficult. Even the most powerful and expensive “machine translation” systems struggle when confronted with the task of accurately translating large numbers of names. Why is this so hard?
A Few Challenges:
- Which words in a name should be translated according to their spelling (i.e. transliterated) and which words according to their meaning?
- Within a language, there may be conflicting conventions for translation. Both “Fuji” and “Huzi” are accepted name spellings of the iconic Japanese volcano. Arguments over spelling the capital of Ukraine as “Kiev” vs. “Kyiv” have almost triggered diplomatic crises.
- Common practice may conflict with organizational standards. For example, the name of the former ruler of Iraq typically appears in the news media as “Saddam Hussein”, but the CIA’s official spelling is “Saddam Husayn”. Similarly, the conventional spelling of the Syrian ruler’s name is “Assad”, but CIA guidelines say “Asad”.
- A name written in a foreign language may be native to that language, such as محمود أحمدي نجاد (Mahmoud Ahmadinejad), or may be an English name written in a foreign alphabet, such as جورج دبليو بوش (George W. Bush).
How it Works
RNT combines dictionary look-ups and transliteration to find the most accurate English spelling of a name. First, the foreign name is looked up in user-supplied name dictionaries, known as gazetteers. If the name is not found, RNT transliterates it into English using linguistic algorithms and statistical modeling, then matches the result against preferred name standards. For example, names written in Chinese are converted from ideographic characters into a phonetic representation. Names written in “unvocalized” Arabic (i.e., without short vowels) are automatically vocalized to enable a phonetic translation according to any of several user-selected standard systems.
- Generate “conventional spellings” of frequently appearing foreign names
- Process “unrecognized” names, i.e., those not appearing in any known catalog of foreign names
- Incorporate complex transliteration standards (such as those of the IC or the U.S. Board on Geographic Names) for translating a name from a foreign alphabet into English
- Automatically resolve name spelling ambiguities in the source language, such as partial vocalization of Arabic, or word segmentation in Chinese
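The look-up-then-transliterate flow described under "How it Works" can be illustrated with a minimal Python sketch. This is not the RNT API: the function names, the gazetteer contents, and the two-character syllable tables are all hypothetical, and a simple table look-up stands in for the linguistic algorithms and statistical modeling a real system would use. It only shows the order of operations and how a user-selected standard changes the output.

```python
from typing import Optional

# Hypothetical user-supplied gazetteer: foreign-script names mapped to the
# organization's preferred English spellings.
GAZETTEER = {
    "جورج دبليو بوش": "George W. Bush",
}

# Toy per-character syllable tables for two romanization standards.
# A real system would need far larger tables plus statistical models.
SYLLABLES = {
    "pinyin": {"北": "bei", "京": "jing"},
    "wade-giles": {"北": "pei", "京": "ching"},
}

# Each standard joins syllables differently (illustrative simplification).
JOINERS = {"pinyin": "", "wade-giles": "-"}


def transliterate(name: str, standard: str) -> Optional[str]:
    """Spell a name phonetically under the chosen standard.

    Returns None if any character is missing from the toy table.
    """
    table = SYLLABLES[standard]
    syllables = [table.get(ch) for ch in name]
    if any(s is None for s in syllables):
        return None
    return JOINERS[standard].join(syllables).capitalize()


def translate_name(name: str, standard: str = "pinyin") -> Optional[str]:
    """Try the gazetteer first; fall back to transliteration."""
    if name in GAZETTEER:
        return GAZETTEER[name]
    return transliterate(name, standard)


print(translate_name("جورج دبليو بوش"))                # gazetteer hit
print(translate_name("北京"))                          # pinyin fallback
print(translate_name("北京", standard="wade-giles"))   # alternate standard
```

Checking the gazetteer first lets an organization's preferred spelling override whatever the transliteration step would produce, which is how a conflict such as “Hussein” vs. “Husayn” could be settled by configuration in a design like this one.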