Language Support for Tagalog, Malay, and Indonesian Added to Rosette by BasisTech

The September release of Rosette® version 1.23.0 debuts entity extraction for Tagalog, which joins Malay and Indonesian, and adds significant functionality for all three. Tagalog, in its standardized form Filipino, is one of the two official languages of the Philippines (the other being English). Standard Malay and Indonesian are both standardized varieties of Malay, spoken in Malaysia and Indonesia, respectively.

This version increases base linguistics support, including part-of-speech tagging and lemmatization, for these Southeast Asian languages.

On the back end, neural networks power the part-of-speech tagging feature for the three languages. Other base linguistics functionality, such as lemmatization, uses a combination of technologies, including dictionaries, morphological rules, and abbreviation lists. These languages share interesting and challenging morphology (word forms) that you can learn more about in Challenges of Southeast Asian Languages — Tagalog, Malay, and Indonesian — for Text Analytics.
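To give a feel for how dictionaries, morphological rules, and abbreviation lists can work together in lemmatization, here is a minimal Python sketch for a handful of Indonesian words. It is an illustration only and not Rosette's implementation: the word lists, the rule tables, and the `lemmatize` function are all hypothetical and were made up for this example.

```python
# Toy resources, invented for illustration; these are NOT Rosette's data.
ABBREVIATIONS = {"yg": "yang", "dgn": "dengan"}   # abbreviation list
LEXICON = {"baca", "tulis", "jalan", "makan", "yang", "dengan"}  # dictionary of known lemmas

# Morphological rules: (prefix to strip, letters to restore at the front).
# The pair ("men", "t") models nasal assimilation, e.g. menulis -> tulis.
PREFIX_RULES = [("mem", ""), ("men", "t"), ("men", ""), ("ber", ""), ("di", "")]
SUFFIX_RULES = ["kan", "an", "i"]


def lemmatize(token: str) -> str:
    """Return a lemma for token, or the lowercased token if nothing matches."""
    word = token.lower()
    # 1. Expand known abbreviations.
    word = ABBREVIATIONS.get(word, word)
    # 2. A dictionary hit means the word is already a lemma.
    if word in LEXICON:
        return word
    # 3. Try prefix rules, accepting a candidate only if the dictionary knows it.
    for prefix, restored in PREFIX_RULES:
        if word.startswith(prefix):
            candidate = restored + word[len(prefix):]
            if candidate in LEXICON:
                return candidate
    # 4. Try suffix rules under the same dictionary check.
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in LEXICON:
                return candidate
    # 5. Fall back to the surface form.
    return word


if __name__ == "__main__":
    for w in ["membaca", "menulis", "berjalan", "makanan", "yg"]:
        print(w, "->", lemmatize(w))
```

Checking every rule-generated candidate against the dictionary keeps the affix rules from over-stripping; a production system would need far larger resources and would have to handle stacked affixes, infixes, and reduplication.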

About BasisTech

Data analytics and machine learning are critical to verifying identity, understanding customers, anticipating world events, and uncovering crime. BasisTech provides businesses and governments with advanced analytics and AI-powered solutions for deriving insights from multilingual text, connecting data silos, and discovering digital evidence. Our Rosette text analytics platform employs classical machine learning and deep neural nets to extract meaningful information from unstructured data. Autopsy, our digital forensics platform, and Cyber Triage, our incident response tool, serve the needs of law enforcement, national security, and legal technologists. KonaSearch delivers deep search across Salesforce and other data sources.

Company headquarters are in Somerville, Mass., with offices in Washington, D.C., London, Tel Aviv, and Tokyo. For more information, visit