Our Arabic chat alphabet translator software can be integrated into any application to convert words from Arabic chat alphabet to standard Arabic script. This functionality is a key step in monitoring Arabic social media.
For over two decades, web search giants—including Google, Yahoo!, and Bing—enterprise search vendors, and government agencies have turned to Basis Technology’s text analysis software to enable them to process and search text in the major languages of the world. Our products are trained on large data sets, which are refreshed and updated as we adopt new technologies for ever greater accuracy and broader capabilities.
Arabic chat, also called Arabizi, is widely used in social media (Twitter, blogs, chat) as an easy-to-type alternative to standard Arabic. However, until now, automated analysis of this writing has not been supported by commercial text analysis tools. Analysis is complicated by the fact that the Arabic chat alphabet varies widely from writer to writer: Arabic characters are replaced with numerals or Latin letters that sound like or visually resemble them.
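To make the numeral substitution concrete, here is a minimal Python sketch of a digit-to-letter lookup. The table is a small illustrative subset of commonly seen Arabizi conventions, which vary by writer and region; it is an assumption for this example, not the mapping Rosette uses, and the names `DIGIT_TO_ARABIC` and `replace_digits` are hypothetical.

```python
# Illustrative subset of common Arabizi digit conventions.
# Assumption: real usage varies by writer and region; this is not Rosette's table.
DIGIT_TO_ARABIC = {
    "2": "ء",  # hamza
    "3": "ع",  # ain
    "5": "خ",  # khah
    "6": "ط",  # tah
    "7": "ح",  # hah
    "9": "ص",  # sad
}


def replace_digits(token: str) -> str:
    """Replace Arabizi digits with Arabic letters; leave other characters as-is."""
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in token)
```

A lookup like this only handles the numerals. The Latin letters in a chat word are ambiguous and need context to resolve, which is why a fixed table alone is not enough.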
The Arabizi translator function can be integrated into any software environment as a Java class library or a web service. It is designed for high performance and scalability, and can run in multiple threads or on multiple cores.
A writer's dialect can indicate the country they come from. Arabic pronunciation and vocabulary vary from region to region, and Arabic chat words, which are often written phonetically, reflect those differences. In addition to converting dialectal chat to native Arabic script, Rosette can detect dialectal chat and infer the writer's most likely country of origin.
Arabic is used in over 25 different countries, so handling dialectal variations is key to accurate translation of Arabic chat alphabet to standard Arabic text. The single word “conspiracy,” for example, is typed differently in Arabic chat alphabet by writers in Egypt, Saudi Arabia, and Morocco.
The Rosette Chat Translator can:
Unlike machine translation systems that rely on conventional dictionaries, Rosette Chat Translator is powered by an algorithmic and statistical approach. The algorithm analyzes the morphological components of each word to pick likely translation candidates. To help the algorithm rank those candidates, the statistical model is trained on a corpus of 300 million Arabic words collected from thousands of different websites.
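The generate-then-rank idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not Rosette's algorithm: candidates come from a per-character expansion table rather than morphological analysis, and ranking uses a toy word-frequency table rather than a trained statistical model. The `CANDIDATES` table, the `FREQ` counts, and the `rank` function are all made up for illustration.

```python
from itertools import product

# Hypothetical expansion table: each chat symbol maps to possible Arabic strings.
# Short vowels are often unwritten in Arabic, hence the empty-string option.
CANDIDATES = {
    "7": ["ح"],
    "b": ["ب"],
    "a": ["ا", ""],
    "i": ["ي", ""],
}

# Toy frequency counts standing in for a model trained on a large corpus.
FREQ = {"حبيبي": 120, "حبب": 3}


def rank(token: str) -> list[str]:
    """Expand a chat token into candidate Arabic forms, most frequent first."""
    options = [CANDIDATES.get(ch, [ch]) for ch in token.lower()]
    forms = {"".join(parts) for parts in product(*options)}
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(forms, key=lambda form: (-FREQ.get(form, 0), form))
```

Calling `rank("7abibi")` expands the token into every combination the table allows and puts the form with the highest count first. The number of combinations grows quickly with word length, so a production system would need to prune candidates, for example with the morphological analysis described above.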
Combined with the full Rosette text analytics platform, the chat translator can feed the converted Arabic text into the language identifier, the Arabic linguistic analysis component, and the entity extractor. The result is a robust platform for analyzing real-world Arabic text on the web, whether written in Arabic chat alphabet or standard Arabic.
For more information about Rosette Chat Translator, download the product datasheet or request a product evaluation.