Our CJK language analyzers are used in some of the world’s most transaction-heavy environments, like Google’s search engine and Amazon’s e-commerce site. Rosette Base Linguistics for Chinese, Japanese and Korean is an accurate and reliable solution that helps complex applications process unstructured CJK text by addressing some of these languages’ many challenges, such as the use of multiple scripts and the absence of spaces between words. Using advanced morphological analysis, Rosette Base Linguistics performs functions critical for analyzing CJK text, such as segmentation, lemmatization, noun decompounding, part-of-speech tagging, sentence boundary detection, and base noun phrase analysis.
Rosette Base Linguistics relies on dictionaries that are continually updated to keep pace with the evolution of each language. For further detail on the dictionaries, please download a datasheet.
The Rosette Japanese Orthographic Analyzer (JOA) is a dictionary-driven software component that normalizes different orthographic forms of a Japanese word to a single canonical form. These variants are similar to spelling variations in English, such as those seen in foreign words and names (e.g. Osama and Usama). Because purely algorithmic approaches are prone to error, the dictionary used by JOA consists of thousands of variations that lexicographers have observed in actual texts. The current JOA data set is focused on general-purpose web search, and JOA is designed to help searches find variations in Katakana orthographic notation as well as Kanji variations.
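The general idea of dictionary-driven normalization can be illustrated with a minimal Python sketch. It is not the JOA implementation or its API: the function names and the four dictionary entries are hypothetical, chosen only to show how Katakana and Kanji variants might be mapped to one canonical form before matching.

```python
# Toy variant dictionary: each observed variant maps to one canonical form.
# These entries are illustrative assumptions, not taken from the JOA data set.
VARIANT_TO_CANONICAL = {
    "コンピューター": "コンピュータ",  # Katakana: long-vowel mark variation
    "ヴァイオリン": "バイオリン",      # Katakana: "va" vs "ba" spelling
    "取扱い": "取り扱い",              # Kanji: okurigana variation
    "取扱": "取り扱い",
}


def normalize_token(token: str) -> str:
    """Return the canonical form of a token, or the token itself if unknown."""
    return VARIANT_TO_CANONICAL.get(token, token)


def normalize_tokens(tokens: list[str]) -> list[str]:
    """Normalize every token in an already segmented sequence."""
    return [normalize_token(token) for token in tokens]


# A query and an indexed document that use different spellings of the same words.
query_terms = normalize_tokens(["コンピューター", "の", "取扱い"])
indexed_terms = normalize_tokens(["コンピュータ", "の", "取り扱い"])
```

After normalization, `query_terms` and `indexed_terms` hold the same tokens, so a search for one spelling can match text written with another. The sketch assumes the text has already been segmented into words.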
Also available is the Rosette Chinese Script Converter, for automatic conversion between Simplified Chinese (SC) and Traditional Chinese (TC) script. Chinese Script Converter solves the information retrieval issues stemming from the major differences between SC and TC, including character sets, encoding methods, orthography, vocabulary, and semantics. For example, “taxi” is written as “出租汽车” in Simplified Chinese and “計程車” in Traditional Chinese.
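The taxi example shows why character-by-character substitution alone is not enough: the two words share no characters. A minimal Python sketch of one possible approach, assuming a phrase table that is consulted before a character table, is shown here. It is not the Chinese Script Converter's method or API, and the table contents and the `sc_to_tc` function are hypothetical.

```python
# Phrase-level entries capture vocabulary differences (toy data, assumed).
SC_TO_TC_PHRASES = {
    "出租汽车": "計程車",  # "taxi"
}

# Character-level entries capture simple script differences (toy data, assumed).
SC_TO_TC_CHARS = {
    "车": "車",
    "汉": "漢",
    "语": "語",
}


def sc_to_tc(text: str) -> str:
    """Convert Simplified to Traditional text: longest phrase match first,
    then fall back to per-character mapping, leaving unknown characters as-is."""
    max_phrase_len = max(len(phrase) for phrase in SC_TO_TC_PHRASES)
    output = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase starting at position i.
        for length in range(min(max_phrase_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in SC_TO_TC_PHRASES:
                output.append(SC_TO_TC_PHRASES[chunk])
                i += length
                break
        else:
            # No phrase matched here, so map the single character.
            char = text[i]
            output.append(SC_TO_TC_CHARS.get(char, char))
            i += 1
    return "".join(output)


taxi_tc = sc_to_tc("出租汽车")
car_tc = sc_to_tc("汽车")
```

Checking phrases first lets whole-word vocabulary differences take priority over script-level substitutions. A production converter would also need to handle encodings, ambiguous one-to-many character mappings, and context, which this sketch ignores.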
For more information about our Rosette Base Linguistics software, download the product datasheet, request a product evaluation, or browse our presentations about linguistic analysis and full-text search.