Rosette Entity Extractor


Automatically find names of people, organizations, locations and more across many languages

Overview

Things not strings

Entities are the key actors in your text data: the organizations, people, locations, products, dates and more that are mentioned in your content. By combining established text analysis practices with statistical machine learning, Rosette uncovers these entities and brings structure, clarity, and insight to your data.

Real world applications

Entity extraction is the foundation for applications in eDiscovery, social media analysis, financial compliance and government intelligence. Rosette allows you to:

  • Resolve a person’s identity for government security and fraud detection
  • Track customer sentiment around products and companies
  • Analyze research for patent law, legal discovery, and compliance
  • Exploit valuable information from open source intelligence
  • Provide targeted search for content publishers and recommendation engines

Customizable to your unique needs

Our tools are unique among entity recognition software in their adaptability. In addition to supervised training, our field training kits enable you to run unsupervised training on your data to create personalized entity extraction models for your use case.

Customization can mean training our entity extractor on a specific type of content, such as news articles, blogs, restaurant reviews, financial documents, medical records, legal contracts, patent filings, or short texts such as tweets. It can also mean creating new entity types beyond our pre-built list, such as disease and drug names for a medical extractor, or job titles and skills for resume evaluation.

Product Highlights

  • 20 supported languages
  • 18 entity types detected
  • Intuitive cloud API
  • Customizable SDK
  • Fast and scalable
  • Industrial-strength support
  • Constantly stress-tested and improved

How It Works

A powerful hybrid solution

Our entity extraction is a hybrid solution: advanced statistical modeling, complemented by regular-expression pattern matching and entity lists. This combination lets the entity extractor detect entities that simpler solutions miss, improving both accuracy and recall.

Statistical modeling

While regular expressions and lists are important aspects of entity extraction, it is machine learning and advanced linguistic processing that set us apart. With statistical modeling, our users avoid four major problems inherent to simplistic extraction solutions:

  • The labor required to create a comprehensive list for every necessary entity type
  • Unknown entities, which even the most exhaustive list will miss
  • Lack of context, which means that place names (Newton, MA) may be confused with people’s names (Isaac Newton)
  • Failure to extract misspelled list items

To address these problems, our entity extractor starts with a statistical model trained on millions of news and blog articles. As a result, it uses context to recognize both the common entities you expect to find, such as people, organizations, products, and locations, and new entities you were unaware of.

Pattern matching

Rules expressed as regular expressions find entities that follow a pattern, such as dates, times, and email addresses. Many standard string patterns are pre-built into our entity extractor, and on-premise customers can customize their extraction workflow by editing or adding rules to suit their specific needs.
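To make the idea concrete, here is a minimal Python sketch of rule-based extraction for dates, times, and email addresses. The three patterns and the function name find_pattern_entities are illustrative assumptions, not the rules that ship with the product, and real-world rules would need to cover many more formats.

```python
import re

# Illustrative patterns only; these are assumptions, not Rosette's built-in rules.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),           # ISO dates such as 2015-06-01
    "TIME": re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),  # 24-hour times such as 14:30
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pattern_entities(text):
    """Return (type, mention, start offset) for every pattern match, in text order."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


for hit in find_pattern_entities("Email info@basistech.com by 14:30 on 2015-06-01."):
    print(hit)
```

Adding a rule in this sketch is a matter of adding one more entry to PATTERNS, which mirrors the "edit or add rules" workflow described above.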

Gazetteers and entity lists

We extract 18 common entity types, including people, organizations, products, and locations (see the full list under Entity Types). Unlike those of a home-brew or academic extractor, our gazetteers are regularly updated and stress-tested for enterprise-level speed and performance.

Custom entity lists or gazetteers, available to on-premise customers, can be added when users know specific words or phrases that they expect to discover in their data. For example, a clothing manufacturer may add a list of basic colors they’d like to extract from tweets.
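As a rough illustration of the clothing example, the Python sketch below matches a small custom list against a tweet. The color list, the COLOR type name, and the build_gazetteer_matcher helper are all hypothetical; the on-premise product has its own format for custom gazetteers, which this does not attempt to reproduce.

```python
import re


def build_gazetteer_matcher(entries, entity_type):
    """Compile a case-insensitive, whole-word matcher for a list of known terms."""
    # Longest entries first so "navy blue" wins over "blue" when both are listed.
    ordered = sorted(set(entries), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(entry) for entry in ordered) + r")\b",
        re.IGNORECASE,
    )

    def match(text):
        return [
            {"type": entity_type, "mention": m.group(), "start": m.start()}
            for m in pattern.finditer(text)
        ]

    return match


# Hypothetical color list and entity type name for the clothing example.
find_colors = build_gazetteer_matcher(["red", "blue", "navy blue", "green"], "COLOR")

for entity in find_colors("Loving my new Navy Blue jacket, but the red one bored me"):
    print(entity)
```

The word-boundary anchors keep the list from firing inside longer words, so "bored" does not produce a "red" match.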

Tech Specs

Availability and Platform Support

Deployment Availability:
Plugins:
Bindings:

Supported Languages

Arabic                 French        Japanese    Portuguese
Chinese, Simplified    German        Korean      Russian
Chinese, Traditional   Hebrew        Malay       Spanish
Dutch                  Indonesian    Pashto      Urdu
English                Italian       Persian     Vietnamese

Entity Types

Person         Nationality    Number       Distance
Location       Religion       ID Number    Date
Organization   Money          Phone        Time
Product        Credit Card    E-Mail       Lat/Long
Title          URL

Try the Demo

Cloud API

Easy to Use API

Ideal for product evaluation, academic research, and smaller, cost-conscious businesses, our fast and powerful API is instantly accessible and free to get started. Our entity extraction endpoint is prebuilt to recognize and extract 18 entity types with coverage across 20 languages.

Try entity extraction and the rest of Rosette API’s endpoints, free up to 10,000 calls/month!
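For orientation, a call to the entity extraction endpoint might be assembled as in the Python sketch below. The endpoint URL, the key header name, and the "content" request field are assumptions here and should be confirmed against the official API documentation; the sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed values; confirm both against the Rosette API documentation.
ASSUMED_ENDPOINT = "https://api.rosette.com/rest/v1/entities"
ASSUMED_KEY_HEADER = "X-RosetteAPI-Key"


def build_entities_request(text, api_key, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a JSON POST request carrying the text to analyze."""
    body = json.dumps({"content": text}).encode("utf-8")
    headers = {
        ASSUMED_KEY_HEADER: api_key,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


request = build_entities_request("Bill Murray will appear in Boston.", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen and read the JSON reply.
```

In practice, the official bindings are likely a more convenient route than hand-built HTTP requests.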

Get an API Key

Quality Documentation and Support

Customers love our thorough and responsive support team. We also provide in-depth documentation that lists all the features and functions of the various API endpoints alongside examples in the binding of your choice.

Visit our GitHub for the bindings and documentation.

Enterprise Ready

Evaluate Rosette’s functional fit with your business and data needs on our cloud API, knowing that scalable, customizable, on-premise deployments are available if you need them.

{
  "entities": [
    {
      "type": "PERSON",
      "mention": "Bill Murray",
      "normalized": "Bill Murray",
      "count": 1,
      "entityId": "Q29250",
      "confidence": 0.9990000128746033
    },
    {
      "type": "PRODUCT",
      "mention": "Ghostbusters",
      "normalized": "Ghostbusters",
      "count": 1,
      "entityId": "Q108745"
    },
    {
      "type": "TITLE",
      "mention": "Dr.",
      "normalized": "Dr.",
      "count": 1,
      "entityId": "T2",
      "confidence": 0.9990000128746033
    },
    {
      "type": "PERSON",
      "mention": "Peter Venkman",
      "normalized": "Peter Venkman",
      "count": 1,
      "entityId": "Q2483011",
      "confidence": 0.9990000128746033
    },
    {
      "type": "LOCATION",
      "mention": "Boston",
      "normalized": "Boston",
      "count": 1,
      "entityId": "Q100"
    },
    {
      "type": "IDENTIFIER:URL",
      "mention": "http://dlvr.it/BnsFfS",
      "normalized": "http://dlvr.it/BnsFfS",
      "count": 1,
      "entityId": "T5"
    }
  ]
}
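A response shaped like the sample above can be consumed with a few lines of Python. The sketch below groups mentions by entity type; it assumes only the fields visible in the sample and treats "confidence" as optional, since some entities in the sample omit it. The function name mentions_by_type and the min_confidence parameter are illustrative.

```python
import json
from collections import defaultdict


def mentions_by_type(response_text, min_confidence=0.0):
    """Group mentions by entity type, skipping scored entities below the threshold."""
    grouped = defaultdict(list)
    for entity in json.loads(response_text)["entities"]:
        confidence = entity.get("confidence")  # absent on some entities in the sample
        if confidence is not None and confidence < min_confidence:
            continue
        grouped[entity["type"]].append(entity["mention"])
    return dict(grouped)


# Abbreviated copy of the sample response above.
sample = """
{
  "entities": [
    {"type": "PERSON", "mention": "Bill Murray", "count": 1,
     "entityId": "Q29250", "confidence": 0.9990000128746033},
    {"type": "PERSON", "mention": "Peter Venkman", "count": 1,
     "entityId": "Q2483011", "confidence": 0.9990000128746033},
    {"type": "LOCATION", "mention": "Boston", "count": 1, "entityId": "Q100"}
  ]
}
"""

print(mentions_by_type(sample))
```

Entities without a confidence score are kept rather than dropped, which is a design choice for this sketch; a stricter consumer might exclude them.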

On Premise

Customize and scale your entity extraction on premise

For organizations with vast data quantities, unique integration needs, and data security restrictions, we provide on-premise API deployment and SDKs to be hosted on your internal servers. Our field training kits enable you to run unsupervised training on your own data to create personalized entity extraction models for your use case, or create custom entity types beyond the 18 prebuilt entities.

Request a Free Product Evaluation

If your organization requires an on-premise solution, we’re happy to work with you to meet your business’s unique needs. For a free evaluation of our on-premise deployments, please complete the form below and our Customer Engineering team will provide you with an on-premise evaluation package.

Drop Us a Line

EMAIL:
info@basistech.com

PHONE:
+1-617-386-2000

Select Customers Include

No coding required

RapidMiner is the industry’s #1 predictive analytics platform. The client platform, RapidMiner Studio, empowers organizations to easily prep data, create models and operationalize predictive analytics within any business process.

Try RapidMiner