Rosette Entity Extractor (REX)

Rosette: Big Text Analytics


Automatically find the names of people, places, and organizations in text across many languages.

Instantly tag named entities from large quantities of text

Big Text represents the vast majority of the world’s big data. Lying hidden within that text are names of people, locations, organizations, and products, frequently the most valuable information inside. Entity extraction greatly reduces the human labor of reading through text to find this information, which is doubly hard to find across multiple languages. Extracted entities—names, places, dates, and other words and phrases—establish the real meaning in the text.

Rosette® Entity Extractor (REX) instantly scans through huge volumes of multilingual, unstructured text and tags key data. REX uses multiple approaches to achieve the most accurate results: advanced statistical modeling, customizable rules, and pre-defined lists.

Key entities, such as names of people, places, and organizations, depend on statistical entity extraction, which is trained on news articles. By nature, statistically trained models are most accurate on the type of data they were trained on. REX is unique among entity extractors in that it is adaptable. The REX field training kit enables you to add your own text data to your entity extraction model to increase REX’s accuracy in your text environment.

KEY FEATURES

  • Simple API
  • High-scale and Throughput
  • Industrial-strength Support
  • Easy Installation
  • Flexible and Customizable
  • Integration: Java, C++, or Web Services
  • Platform: Unix, Linux, Mac, Windows
  • Component of the Rosette SDK

How It Works

REX Machine Learning

Statistical modeling with advanced linguistics solves two major problems:

  1. Overlap in the names of people, places, and organizations causes ambiguity. Consider the common surname Smith, compared with the business name Smith & Co., and the town of Smithfield, RI.
  2. Unique and new names appear in seemingly infinite formats and spelling variations.

Because of these problems, reliable entity extraction for people, organizations, and locations requires a statistical engine. This engine uses machine learning to analyze, annotate, and process millions of news and blog articles from the web, training a model to recognize what is, and is not, an entity in a real-world, context-rich setting.
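To make the role of context concrete, here is a minimal sketch of the kind of per-token features a statistical extractor might learn from. It is illustrative only and is not REX's actual feature set or API; the function names `word_shape` and `token_features` and the boundary markers `<s>` and `</s>` are assumptions made for this example.

```python
import re

def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, e.g. 'Smith' -> 'Xx', 'RI' -> 'X'."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Squeeze runs of the same shape character so long and short names look alike.
    return re.sub(r"(.)\1+", r"\1", shape)

def token_features(tokens: list[str], i: int) -> dict[str, str]:
    """Describe token i by its own form and its immediate neighbors (hypothetical features)."""
    return {
        "word": tokens[i].lower(),
        "shape": word_shape(tokens[i]),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

# The same surface form "Smith" gets different context features:
person_context = token_features(["Mr.", "Smith", "arrived"], 1)
company_context = token_features(["Smith", "&", "Co."], 0)
```

In this sketch, "Smith" preceded by "Mr." and "Smith" followed by "&" produce different feature sets, which is the signal a trained model could use to separate a person from an organization.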

REX Lists

Entities can simply be matched against standard lists and user taxonomies. For example, weapon names are matched with a list-based extractor. A large collection of gazetteers is included; custom lists, such as a terror watch list, can be easily added.
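List-based matching can be sketched in a few lines. The example below is a generic longest-match gazetteer lookup over whitespace tokens, not REX's implementation; the functions `build_gazetteer` and `match_gazetteer`, the `WEAPON` label, and the sample entries are all hypothetical.

```python
def build_gazetteer(entries: dict[str, str]) -> dict[tuple[str, ...], str]:
    """Index list entries by their lowercased token sequence."""
    return {tuple(name.lower().split()): label for name, label in entries.items()}

def match_gazetteer(tokens: list[str], gazetteer: dict[tuple[str, ...], str]):
    """Return (start, end, label) token spans, preferring the longest match at each position."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return spans

# Hypothetical custom list; a real deployment would load entries from a file.
weapons = build_gazetteer({"AK-47": "WEAPON", "rocket launcher": "WEAPON"})
tokens = "They seized a rocket launcher and an AK-47".split()
found = match_gazetteer(tokens, weapons)
```

Adding a custom list in this sketch amounts to passing more entries to `build_gazetteer`; matching is case-insensitive because both the entries and the tokens are lowercased.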

REX Rules

Rules use regular expressions to detect patterns such as dates, times, and email addresses. Many standard string patterns are included; customers can edit these or add their own rules to meet their specific needs.
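As a rough illustration of rule-based extraction, the sketch below applies a small table of regular expressions to raw text. The patterns, the `RULES` table, and the `apply_rules` function are assumptions made for this example and are deliberately simplified; they are not the rules shipped with REX.

```python
import re

# Hypothetical rule table: entity label -> compiled pattern.
RULES = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),            # ISO-style dates only
    "TIME": re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b"),    # 24-hour HH:MM
}

def apply_rules(text: str, rules: dict[str, re.Pattern] = RULES):
    """Return (start, end, label, matched_text) tuples sorted by position in the text."""
    hits = []
    for label, pattern in rules.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), label, m.group()))
    return sorted(hits)

sample = "Email jane.doe@example.com before 2024-03-15 at 09:30."
hits = apply_rules(sample)
```

A user-defined rule would simply be another entry in the table, for example a pattern for an in-house SKU format.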

Predefined Entity Types

REX natively supports the following entity types. User-defined entities, such as SKU numbers, are also available.

  • Person
  • Location
  • Organization
  • Title
  • Nationality
  • Religion
  • Product
  • Credit Card Number
  • Geographic Coordinate
  • Money
  • Generic Number
  • Personal ID Number
  • Phone Number
  • Email Address/URL
  • Distance
  • Date
  • Time

16 Supported Languages

  • Dutch
  • English
  • French
  • German
  • Italian
  • Portuguese
  • Russian
  • Spanish
  • Arabic
  • Hebrew
  • Pashto
  • Persian
  • Urdu
  • Chinese, Simplified
  • Chinese, Traditional
  • Japanese
  • Korean

Code Base

  • C++
  • Web Services
  • Java
  • Microsoft .NET

Platform Support

  • Windows
  • Linux
  • Red Hat
  • Mac

REX in Action

[Diagram: REX example]

REX Demonstration Video


Contact us for more information about integrating REX
into your application.

Learn More

Request a Product Evaluation

Download the Rosette Entity Extractor Datasheet

Whitepaper: Entity Extraction Enables Discovery

Entity extraction allows searchers to find relevant information even when they don't know what they're looking for.
