Text analytics to enhance e-discovery systems: better outcomes with less human labor.
Rosette text analytics offers a variety of AI-powered NLP components that easily integrate into e-discovery platforms to enable deeper search of more data.
With our text analytics, your platform can triage data faster and automate the discovery process. It can learn from documents already in the case file to recognize and tag documents as responsive or unresponsive.
Fuzzy Name Searching
Names of people and organizations may be the single most important category of entities in e-discovery. They are also the weakest link in general search engines, which cannot cope with the many ways a name can vary, such as typos, nicknames, and phonetic misspellings (e.g., Hawkenberry vs. Hockenbury, John vs. Jack). Names can’t be spell-checked because “Cyndi” is just as valid a name as “Cindy.”
Rosette considers 13 types of variation simultaneously in every name search, and it also matches names across languages. For example, the English spelling of a Japanese company name can be matched to the same name written in native Japanese script.
Nippon Telegraph and Telephone Corporation ↔ 日本電信電話株式会社
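To make the idea of fuzzy name matching concrete, here is a minimal sketch in Python. It is not Rosette's algorithm or API: it assumes a tiny hand-made nickname table (`NICKNAMES`), a character-level similarity from the standard library, and an arbitrary threshold of 0.7. It covers only two of the variation types mentioned above (nicknames and spelling differences) and does no cross-lingual matching.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-made nickname table for illustration only.
NICKNAMES = {"jack": "john"}


def normalize(token: str) -> str:
    """Lowercase a name token and map a known nickname to its formal form."""
    token = token.lower().strip(".,")
    return NICKNAMES.get(token, token)


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    norm_a = " ".join(normalize(t) for t in a.split())
    norm_b = " ".join(normalize(t) for t in b.split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Treat two names as the same person if they are similar enough.

    The threshold is an assumption chosen for this example.
    """
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [
        ("Jack Hawkenberry", "John Hawkenberry"),
        ("John Hawkenberry", "John Hockenbury"),
        ("John Hawkenberry", "Mary Smith"),
    ]:
        print(left, "|", right, "|", is_match(left, right))
```

A plain string comparison would miss the first two pairs, which is why exact-match search performs poorly on names.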
Fielded search is indispensable for e-discovery. With the key entities of each document populated into metadata fields, investigators can search and filter results with greater accuracy. For even higher precision, human editors can accept or reject suggested entities and tag any that were missed.
By extracting email addresses from email bodies, investigators may uncover less obvious connections between different parties.
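As a rough illustration of pulling addresses out of a message body, the sketch below uses a deliberately simple regular expression. The pattern and the `extract_emails` helper are assumptions made for this example; they do not cover every valid address form and are not how Rosette performs extraction.

```python
import re
from typing import List

# Simplified pattern: local part, "@", then two or more dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(body: str) -> List[str]:
    """Return unique addresses found in the text, lowercased, in first-seen order."""
    found: List[str] = []
    for match in EMAIL_RE.findall(body):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    sample = (
        "Please cc jane.doe@example.com and Bob@Example.org. "
        "Also jane.doe@example.com again."
    )
    print(extract_emails(sample))
```

Addresses found this way can then be stored in a metadata field alongside the sender and recipients from the headers, so that parties mentioned only in the body also become searchable.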
Cross-lingual semantic search
Keyword search is essential in discovery, but limiting. No one can anticipate every relevant keyword on a topic. Adding semantic search, including across languages, makes search more human-like. Suppose you need documents that mention “flying drones.” With a little research, you might add “UAV (unmanned aerial vehicle)” as a keyword, but semantic search would also return specific models of UAVs and terms that mean “flying drone” in other languages.
Semantic search adds fuzziness but maintains relevancy. It enables the searcher to say “find me more documents like this one” when an entire document is the search query, or “find documents that are relevant, even if they don’t contain the actual keywords.” Fuzzy search is like expanding circles around meaning, so another level out might include “autonomous robots.”
Transcripts of audio or video may be of poor quality. Here too, the phonetic matching in Rosette’s name matching can help link a slightly garbled transcription of a name to the correctly spelled name as it appears in other documents.
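One classic way to compare names by sound is the Soundex code, sketched below. Soundex is used here purely as a well-known stand-in to show what phonetic matching means; the source does not say which phonetic technique Rosette uses, and Soundex only handles names written in ASCII Latin letters.

```python
# Build the standard Soundex letter-to-digit table.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if there are no letters."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def sounds_alike(a: str, b: str) -> bool:
    """True when both names are non-empty and share a Soundex code."""
    code_a, code_b = soundex(a), soundex(b)
    return bool(code_a) and code_a == code_b


if __name__ == "__main__":
    print(soundex("Hawkenberry"), soundex("Hockenbury"))
    print(sounds_alike("Hawkenberry", "Hockenbury"))
```

Because both spellings reduce to the same code, a garbled transcript spelling can still be linked to the correctly spelled name found in other documents.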
Rosette’s categorization functionality takes as training input a set of human-curated “responsive” and “unresponsive” documents and automatically learns the profile of each class. The categorizer can then move swiftly through the pile of unreviewed documents and predict which ones are most likely responsive, for a human to verify.
For documents that the categorizer is unsure about (low confidence), a human expert can review and tag them as responsive or unresponsive. Rosette learns from those answers, then re-classifies the documents not yet reviewed and re-prioritizes them for attorney review. This active learning loop lets the system keep learning from new data and re-evaluate its tagging.
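The review loop described above can be sketched as a simple triage step. The example assumes some classifier has already produced, for each document, a probability of being responsive; the `triage` function, the document IDs, and the 0.3 and 0.7 cutoffs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def triage(
    scores: Dict[str, float], low: float = 0.3, high: float = 0.7
) -> Tuple[List[str], List[str], List[str]]:
    """Split documents into likely responsive, uncertain, and likely unresponsive.

    scores maps a document ID to an assumed probability of being responsive.
    """
    likely: List[str] = []
    unsure: List[str] = []
    unlikely: List[str] = []
    for doc_id, p in scores.items():
        if p >= high:
            likely.append(doc_id)
        elif p <= low:
            unlikely.append(doc_id)
        else:
            unsure.append(doc_id)
    # Highest-scoring documents go to attorney review first.
    likely.sort(key=lambda d: scores[d], reverse=True)
    # Documents closest to 0.5 are the most informative for a human to label.
    unsure.sort(key=lambda d: abs(scores[d] - 0.5))
    return likely, unsure, unlikely


if __name__ == "__main__":
    scores = {"a": 0.95, "b": 0.55, "c": 0.10, "d": 0.40, "e": 0.80}
    likely, unsure, unlikely = triage(scores)
    print("review first:", likely)
    print("ask an expert:", unsure)
    print("deprioritize:", unlikely)
```

In an active learning setup, the expert's labels for the uncertain group would be added to the training set, the classifier retrained, and the remaining documents scored and triaged again.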
Good document metadata is as crucial as good search. Through entity extraction, names of people, places, and organizations (even misspelled ones) can be extracted into metadata fields, which enables clear views of the data. Key information such as email addresses, URLs, or other entities (e.g., chemical compounds, drugs) can be mapped relationally to people, organizations, or locations.
Fw: Situation in China
Fri, 13 Aug 1999 18:12:00 -0700 (PDT)
People’s Republic of China
China National Petroleum Corporation
E & P
< To: Ken Lay and Joe Sutton;
< First let me say that I heartily applaud the move you recently made in
< regard to oil and gas exploration and production. I feel somewhat
< vindicated in the position that I had previously taken with EOG management
< when I suggested selling off the domestic operations and concentrating on
< the foreign opportunities. My real satisfaction, however, is in the
< retention by Enron Corp. of both the China and India operations. I believe…
(Source of example above)
Assessment & Deduplication
Language detection exposes up front what languages are in the document repository, so that project managers can budget for the needed language experts. Even if some documents contain mixed languages (e.g., an email in English with a legal disclaimer footer in French), Rosette will be able to detect how much of each language is in a document.
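To show what "how much of each language" could mean in practice, the toy sketch below guesses a language per sentence from small stopword lists and reports each language's share of the words. The stopword lists, the two-language scope, and the function names are assumptions for this example; production language identification uses far more robust models.

```python
import re
from collections import Counter
from typing import Dict

# Tiny illustrative stopword lists; real detectors do not work from lists this small.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "this", "for", "you", "please"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "ce", "pour", "vous"},
}

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def guess_language(sentence: str) -> str:
    """Pick the language whose stopwords appear most often, or "unknown"."""
    words = WORD_RE.findall(sentence.lower())
    counts = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def language_shares(text: str) -> Dict[str, float]:
    """Return each detected language's share of the words in the text."""
    totals: Counter = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        n_words = len(WORD_RE.findall(sentence))
        if n_words:
            totals[guess_language(sentence)] += n_words
    total = sum(totals.values())
    return {lang: count / total for lang, count in totals.items()} if total else {}


if __name__ == "__main__":
    email = (
        "Please send the report to the client. "
        "Ce message est confidentiel et destiné à vous."
    )
    for lang, share in language_shares(email).items():
        print(f"{lang}: {share:.0%}")
```

Summing these shares across a repository gives a project manager a rough picture of how much review effort each language will need.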
Grouping together documents with duplicate content in English and across other languages reduces the number of documents that need to be reviewed. Rosette uses deep semantic technology that intelligently compares the semantic meaning of documents with one another to find duplicates.
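For contrast with the semantic comparison described above, here is a much simpler lexical approach to near-duplicate detection: word shingles compared with Jaccard similarity. This is a swapped-in baseline technique, not Rosette's method. It only catches documents that share wording, so it cannot find duplicates across languages. The function names and the 0.6 threshold are assumptions.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word sequences in the text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(
    docs: Dict[str, str], threshold: float = 0.6, k: int = 3
) -> List[Tuple[str, str]]:
    """Return pairs of document IDs whose shingle sets overlap at or above the threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


if __name__ == "__main__":
    docs = {
        "a": "The quarterly report is attached for your review today",
        "b": "the quarterly report is attached for your review today.",
        "c": "Lunch is at noon on Friday in the main cafeteria",
    }
    print(near_duplicate_pairs(docs))
```

Comparing every pair is quadratic in the number of documents, so a large repository would need an indexing scheme on top of this; the sketch is meant only to show the grouping idea.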