Language Support for Tagalog, Malay, and Indonesian Added to Rosette by BasisTech

The September release of Rosette® version 1.23.0 debuts entity extraction for Tagalog, which joins Malay and Indonesian, and adds significant functionality for all three. Tagalog, in its standardized form Filipino, is one of the two official languages of the Philippines (the other being English). Standard Malay and Indonesian are both standardized varieties of Malay, spoken in Malaysia and Indonesia, respectively.

This version expands base linguistics support, including part-of-speech tagging and lemmatization, for these Southeast Asian languages.

On the back end, neural networks power part-of-speech tagging for the three languages. Other base linguistics functionality, such as lemmatization, uses a combination of technologies, including dictionaries, morphological rules, and abbreviation lists. These languages share interesting and challenging morphology (word forms), which you can learn more about in Challenges of Southeast Asian Languages — Tagalog, Malay, and Indonesian — for Text Analytics.
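To give a feel for how a dictionary and morphological rules can work together in lemmatization, here is a minimal Python sketch using a handful of Indonesian words. It is a toy illustration only, not Rosette's implementation: the names EXCEPTIONS, KNOWN_ROOTS, PREFIXES, SUFFIXES, and lemmatize are hypothetical, and the tiny word lists are assumptions chosen for the example.

```python
# Toy lemmatizer: exception dictionary first, then affix-stripping rules.
# All entries and rules below are illustrative assumptions, not Rosette data.

# Irregular forms that simple affix stripping would get wrong.
EXCEPTIONS = {
    "menulis": "tulis",  # "to write": the root's initial "t" drops after the prefix
}

# A stand-in for a real root dictionary.
KNOWN_ROOTS = {"baca", "jalan", "makan", "tulis"}

# A deliberately tiny subset of Indonesian affixes.
PREFIXES = ("mem", "ber")
SUFFIXES = ("an",)


def lemmatize(word: str) -> str:
    """Return a root for `word`, or the lowercased word if no rule applies."""
    word = word.lower()

    # 1. Dictionary lookup handles irregular forms.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]

    # 2. Words that are already roots are returned unchanged.
    if word in KNOWN_ROOTS:
        return word

    # 3. Strip a prefix, accepting the result only if it is a known root.
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in KNOWN_ROOTS:
            return stem

    # 4. Strip a suffix under the same condition.
    for suffix in SUFFIXES:
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in KNOWN_ROOTS:
            return stem

    # 5. Fall back to the surface form.
    return word
```

With these assumed word lists, lemmatize("membaca") gives "baca" and lemmatize("makanan") gives "makan" through the affix rules, while lemmatize("menulis") gives "tulis" through the exception dictionary. Checking each stripped form against a root list keeps a word such as "jalan" from being wrongly cut down to "jal".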

