Basis Technology Releases Search-Enhancing Entity Extractor
— Rosette Entity Extractor locates names, dates, places, noun phrases and other entities —
CAMBRIDGE, Mass., October 8, 2003 — Basis Technology (www.basistech.com) today introduced Rosette® Entity Extractor (REX), a software product that accurately locates and tags entities such as names, places, dates, and other words and phrases that establish the real meaning of a given body of text. Search engine and categorization performance improves when the names of people, places, organizations and other entities have been correctly identified and tagged. REX is designed for integration into software systems for information retrieval, content/knowledge management, data warehousing, business intelligence, and other information-intensive applications. It uses advanced linguistics to help these systems classify, manage, analyze and mine large amounts of unstructured text from sources such as email, document files, and the Web.
The extraction of entities is critical to any application that must process, analyze, or categorize large volumes of text. REX helps prepare text for deeper analysis by identifying entities such as:
- Names - George Bush
- Places - the White House
- Organizations - the Republican Party
- Noun Phrases - President of the United States
- Dates - October 8, 2003
REX also tags an entity’s part of speech (such as noun or adverb) and detects sentence boundaries. REX is available immediately for English, German, and Japanese, with additional languages to be introduced later this year.
“The ability to apply advanced linguistics to the problem of finding and extracting meaningful concepts from large volumes of text is important to Convera's customers in both commercial and government sectors,” said Mushtaq Khan, Vice President of Product Management and Product Marketing for Convera. “We see these capabilities as a critical part of the next stage of information retrieval, enabling more effective information discovery for functions like business analysis or threat detection. We are glad to see Basis Technology continue to advance its product offerings by delivering state-of-the-art linguistics technology to the marketplace.”
Steve Cohen, Vice President of Product Development, Basis Technology, said, “REX was developed to meet the need for flexibility among companies who are dissatisfied with the rigidity of software currently on the market. In designing REX, we took into account developers’ need for linguistics software that is extremely flexible and trainable. By drawing on our years of experience in providing our Rosette software to the world’s foremost companies in unstructured data management and information retrieval, we have created an entity extraction system that leverages the most advanced linguistic techniques to create maximum value.”
The company also introduced several new additions to its family of Rosette Language Analyzers, offering linguistic analysis of European languages including English, French, Italian, German, and Spanish. The analyzers are based on linguistic, as opposed to purely statistical, algorithms and rely on code that is unique to each particular language, resulting in a more accurate analysis.
Sue Feldman, Research Vice President of Content Technologies at IDC, said, “As the volume of unstructured data, including data in many languages, continues to grow, so too does the demand for tools that will help organizations capitalize on this information. This market is growing rapidly. Starting at only $363 million in 2000, we expect it to reach over $600 million next year, exceeding the growth rate for software overall. The next big advance in information finding will require more than today’s reliance on statistical patterns. Instead, search, categorization and text mining software must incorporate the ability to understand the actual meaning of the words. Identifying the names of people, places and things improves the performance of any content access tool. We expect that the demand for Basis Technology’s new entity extraction tool will be strong.”
About Basis Technology
Basis Technology (www.basistech.com) provides software solutions for extracting meaningful intelligence from multilingual text. The company’s Rosette® Platform is a suite of high-performance, highly reliable, interoperable software components designed for applications that analyze and process all the world’s languages.
Top-tier software vendors, content providers, multinational enterprises, and government agencies rely on Basis Technology’s solutions for Unicode compliance, language identification, multilingual search, normalization, transliteration, and entity extraction. Clients include America Online, Convera, Endeca, Google, Hewlett-Packard, InQuira, Inktomi, L.L.Bean, Northrop Grumman, PeopleSoft, Siebel Systems, Verity and Yahoo!.
Company headquarters are located in Cambridge, Massachusetts, with branch offices in San Francisco, California; Herndon, Virginia; and Tokyo, Japan. For more information, visit www.basistech.com or call 800-697-2062.
