Software for Sophisticated Linguistic Analysis of English
Basis Technology’s Rosette® Base Linguistics for English is a software development kit for high-performance linguistic analysis. The software is designed for integration into any application that needs to analyze large volumes of unstructured English text. Combining multiple technologies, Rosette provides accurate morphological analysis of English text to maximize search accuracy and comprehensiveness.
Text Analytics Software Chosen by Web Search Giants
Our commercially supported text analytics platform for search is used by top search engines, including Google, Yahoo!, and Bing, to improve indexing through morphological analysis and to apply other language-specific features that increase precision and recall in search results. Enterprise customers achieve the same benefits in search-based applications such as enterprise search engines and e-discovery tools.
Generates Thorough and Precise Search Results
A unique advantage of Rosette Base Linguistics is its lemmatization capability. Lemmatization identifies the dictionary form, or “lemma,” of each word. Indexing and searching on these forms returns more matching documents while still maintaining precision. For instance, a search for the verb “spoke” would also find documents containing “speak” and “spoken.”
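The toy Python sketch below illustrates the idea with a small inverted index. It is not Rosette's API: the hand-written `LEMMAS` table is a hypothetical stand-in for a real lemmatizer. Both the documents and the query are normalized to lemmas, so a query for “spoke” matches documents containing “speak” and “spoken.”

```python
import re
from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer.
# Maps an inflected form to its dictionary form (lemma).
LEMMAS = {
    "spoke": "speak",
    "spoken": "speak",
    "speaks": "speak",
    "speaking": "speak",
    "animals": "animal",
}

def lemmatize(word: str) -> str:
    """Return the lemma of a lowercase word, or the word itself if unknown."""
    return LEMMAS.get(word, word)

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index from lemma to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[lemmatize(token)].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up the lemma of the query word rather than its surface form."""
    return set(index.get(lemmatize(query.lower()), set()))

docs = {
    "d1": "She will speak at noon.",
    "d2": "The speech was spoken clearly.",
    "d3": "He spoke to the press.",
    "d4": "The wheel is turning.",
}
index = build_index(docs)
print(sorted(search(index, "spoke")))  # ['d1', 'd2', 'd3']
```

Because the index is keyed by lemma, the expansion happens once at indexing time, and each query costs a single lookup.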
By contrast, many other search engines rely on a method called “stemming,” which strips affixes to reduce each word to a stem, and stems can produce irrelevant results. For example, in a search for “animals,” the stem “anim” is also shared by unrelated words such as “animate” and “animosity.” Because it uses lemmatization, Rosette Base Linguistics expands results only to related forms of the same word, meeting the needs of applications that require the highest level of search quality.
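The contrast can be sketched in a few lines of Python. The stemmer here is a deliberately crude, hypothetical one that keeps the first four letters of each word; it is not the Porter algorithm or any product's stemmer. The lemma table is likewise a hypothetical stand-in for a real lemmatizer.

```python
# Deliberately crude stemmer for illustration: keep the first four letters.
# Real rule-based stemmers are more careful but can still conflate
# unrelated words.
def truncate_stem(word: str, length: int = 4) -> str:
    return word.lower()[:length]

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"animals": "animal", "animated": "animate"}

def lemmatize(word: str) -> str:
    w = word.lower()
    return LEMMAS.get(w, w)

vocabulary = ["animal", "animals", "animate", "animated", "animosity"]

def matches(query, normalize):
    """Return vocabulary words whose normalized form equals the query's."""
    key = normalize(query)
    return [w for w in vocabulary if normalize(w) == key]

print(matches("animals", truncate_stem))  # all five words share "anim"
print(matches("animals", lemmatize))      # ['animal', 'animals']
```

The stem-based match pulls in “animate,” “animated,” and “animosity,” while the lemma-based match keeps only forms of “animal.”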
Part-of-Speech Analysis Enhances Search Accuracy and Comprehensiveness
Another advantageous feature of Rosette Base Linguistics is its part-of-speech analysis, which accurately categorizes words as nouns, proper nouns, verbs, adjectives, and so on. Part-of-speech analysis enhances lemmatization by determining whether a word like “spoke” is a verb, and therefore should be lemmatized to “speak,” or a noun, as in the spoke of a wheel. Rosette Base Linguistics also includes Base Noun Phrase Extraction, which detects complete noun phrases, including the head noun and any associated modifiers. Together, these features maximize the accuracy and comprehensiveness of results when searching complex English text.
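The toy Python sketch below shows why the tag matters for lemmatization and what a base noun phrase looks like. It assumes tokens have already been tagged (the tags are hand-written here, using a simplified Universal Dependencies–style tag set), uses a hypothetical lemma table keyed by word and tag, and approximates base noun phrase extraction with a simple `DET? ADJ* NOUN+` pattern. It is not Rosette's algorithm or API.

```python
# Tokens are (word, tag) pairs. The tags are assumed to come from a
# part-of-speech tagger; here they are written by hand for illustration.
Token = tuple[str, str]

# Hypothetical lemma table keyed by (word, part of speech): the same
# surface form can have different lemmas depending on its tag.
LEMMAS = {
    ("spoke", "VERB"): "speak",
    ("spoke", "NOUN"): "spoke",
    ("spokes", "NOUN"): "spoke",
}

def lemmatize(word: str, tag: str) -> str:
    """Return the lemma for a tagged word, falling back to the word itself."""
    w = word.lower()
    return LEMMAS.get((w, tag), w)

def base_noun_phrases(tokens: list[Token]) -> list[str]:
    """Extract non-recursive noun phrases matching DET? ADJ* NOUN+.

    The last noun in each phrase is its head; the preceding determiner,
    adjectives, and nouns act as modifiers.
    """
    phrases = []
    i = 0
    while i < len(tokens):
        j = i
        if tokens[j][1] == "DET":  # optional determiner
            j += 1
        while j < len(tokens) and tokens[j][1] == "ADJ":
            j += 1
        k = j
        while k < len(tokens) and tokens[k][1] == "NOUN":
            k += 1
        if k > j:  # at least one noun: record the phrase
            phrases.append(" ".join(word for word, _ in tokens[i:k]))
            i = k
        else:
            i += 1
    return phrases

sentence = [
    ("The", "DET"), ("mechanic", "NOUN"), ("spoke", "VERB"),
    ("about", "ADP"), ("a", "DET"), ("bent", "ADJ"),
    ("bicycle", "NOUN"), ("spoke", "NOUN"),
]
print([lemmatize(w, t) for w, t in sentence])
# ['the', 'mechanic', 'speak', 'about', 'a', 'bent', 'bicycle', 'spoke']
print(base_noun_phrases(sentence))
# ['The mechanic', 'a bent bicycle spoke']
```

The verb “spoke” is lemmatized to “speak,” while the noun “spoke” keeps its own lemma, and the extracted phrase “a bent bicycle spoke” keeps its head noun together with its modifiers.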
Part-of-speech tags and noun phrases can also be used by machine learning systems, named entity extraction, document clustering, text-to-speech, and other applications.
- Simple API
- High scalability and throughput
- Industrial-strength support
- Easy installation
- Flexible and customizable
- Java or C++
- Component of the Rosette SDK
- Customizable user dictionaries, Japanese orthographic normalization, and Chinese scripts
- Cloudera certified