Entity Extraction Enables “Discovery”
If you were searching for information on red cars, for example, even the technology that powers Ask.com would return answers only from sources that explicitly include the word red. It would not return answers about burgundy or maroon unless those sources also included the word red. That poses a problem for, say, a car designer looking for the most popular color combinations mentioned in a year’s worth of customer emails.
Adding the word color to the search string as a workaround might improve results, but not all red-related content includes the word color either. Another issue is that the word red has meanings outside the context of design, notably as a metaphor for danger, as in red zone. A designer would probably want to exclude those.
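The gap between literal keyword matching and matching on the concept of "red" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shade list, the excluded phrase, the sample emails, and the function names are all hypothetical, and a real entity extractor would discover such terms rather than rely on a hand-built list.

```python
import re

# Hypothetical lexicon: shades a designer might count as "red".
RED_SHADES = {"red", "burgundy", "maroon"}
# Hypothetical non-color uses of "red" that should not count as matches.
NON_COLOR_PHRASES = ("red zone",)


def keyword_search(docs, word):
    """First-generation search: keep docs containing the literal word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]


def red_concept_search(docs):
    """Keep docs that mention any red shade used as a color."""
    results = []
    for d in docs:
        text = d.lower()
        # Blank out metaphorical phrases before looking for shades.
        for phrase in NON_COLOR_PHRASES:
            text = text.replace(phrase, " ")
        words = set(re.findall(r"[a-z]+", text))
        if words & RED_SHADES:
            results.append(d)
    return results


# Made-up sample data for illustration only.
emails = [
    "I love the red exterior with tan leather.",
    "The burgundy paint looks great on the sedan.",
    "Maroon with a black interior would be my pick.",
    "The tachometer needle stays in the red zone too long.",
    "Please offer more blue options.",
]

literal_hits = keyword_search(emails, "red")
concept_hits = red_concept_search(emails)
```

Here the literal search finds the first and fourth emails, missing burgundy and maroon while picking up the red zone complaint. The concept search finds the first three and skips the metaphor.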
Bush is an even richer example of a word whose meaning depends on context.
A search on that word might return articles about President Bush as well as articles about landscaping. A journalist looking for documents about President Bush would want to specify the person attribute in the search.
What is missing in these scenarios is the ability to refine searches to target the desired meaning (semantics) of a term. The computer cannot automatically discover all the “entities,” i.e., concepts expressed as words or phrases, that have a given attribute, such as being a shade of red or being a person. With first-generation search, you must already know all the right keywords to type into the search box. You must also anticipate all the semantic misinterpretations (such as other kinds of bushes), and you need the skill to phrase the query so as to exclude them.
This ability to automatically discover semantic matches and exclude semantic non-matches is what entity extraction is all about.
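As a rough illustration of searching by entity type instead of by keyword, the sketch below tags each mention of Bush as a person or as something else, then filters documents on that tag. The title list, the tag names, the sample documents, and both functions are hypothetical. The single rule used here (a capitalized Bush preceded by a title) is a stand-in for the statistical or linguistic models a real extractor would use, and it would miss a sentence that begins with Bush alone.

```python
import re

# Hypothetical cue words that suggest the next word names a person.
PERSON_TITLES = {"president", "governor", "senator", "mr", "mrs"}


def tag_bush(text):
    """Return (mention, entity_type) pairs for each 'bush' in the text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in ("bush", "bushes"):
            continue
        prev = tokens[i - 1].lower() if i > 0 else ""
        # Toy rule: capitalized and preceded by a title means a person.
        if tok[0].isupper() and prev in PERSON_TITLES:
            tags.append((tok, "PERSON"))
        else:
            tags.append((tok, "OTHER"))
    return tags


def search_by_type(docs, entity_type):
    """Keep docs with at least one mention tagged as entity_type."""
    return [
        d for d in docs
        if any(t == entity_type for _, t in tag_bush(d))
    ]


# Made-up sample data for illustration only.
docs = [
    "President Bush signed the bill on Tuesday.",
    "Prune the rose bush in early spring.",
    "Trim each bush before the first frost.",
]

person_docs = search_by_type(docs, "PERSON")
other_docs = search_by_type(docs, "OTHER")
```

A journalist filtering on the person tag gets only the first document, while the two landscaping documents fall under the other tag.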