Tuesday, May 20: Technology Track | Language Track

Wednesday, May 21: Tutorials Track A | Tutorials Track B
Round Table Discussions


Program for Day 1 - May 20, 2008

KEYNOTE
Stephanie O'Sullivan, Director for Science and Technology, Central Intelligence Agency

Stephanie O'Sullivan was named Director for Science and Technology in August 2005. Since June 2003, Ms. O'Sullivan had been the Associate Deputy Director for Science and Technology. In that time, the DS&T focused on expanding technical and field support to HUMINT operations, delivering unique technical collection capabilities, building the CIA research cadre, and expanding the mission and application of open source intelligence (OSINT).

Ms. O'Sullivan will open the conference by drawing upon her 19 years of experience with technology R&D in the Intelligence Community.

PLENARY: New Developments from Basis Technology
Carl Hoffman, CEO
Benson Margulies, CTO, Basis Technology

Basis Technology develops software products which solve difficult problems in text analysis, content extraction, information retrieval, and identity resolution. Our products have been applied to a wide range of missions across the intelligence community in areas requiring human language technology (HLT), including DOMEX, CELLEX, HUMINT, SIGINT, and GEOINT.

This talk will survey the capabilities of Basis Technology's latest product releases and discuss our directions for future development.


AFTERNOON PLENARY: Everything You've Ever Needed To Do With Names
David Murgatroyd, Software Architect, Basis Technology

When searching documents or analyzing text, often the most critical pieces of information are the names of people, places, and organizations. But in a real-world environment, how can you be sure that one name is the same as another, especially if it's written in a different script or language? How can you be sure that you've found all occurrences of names on a watch list or in a database? How can you translate a name into a language you can recognize and process?

This talk will explore challenges of multilingual name resolution, retrieval, and translation. We will also demonstrate Basis Technology products which enable rapid identification of names in multiple languages and automatic, high-accuracy translation of those names into English.


TRACK 1

Machine Translation and Digital Forensics: R&D Initiatives at Basis Technology
Scott Miller, Ph.D., Chief Scientist
Brian Carrier, Ph.D., Director of Digital Forensics,
Basis Technology

As criminal and counter-terror investigations cross national and language boundaries, the challenges include not only finding the right documents and evidence from among terabytes of data spread across thousands of hard drives, but also searching for keywords or names in different languages, and then interpreting search results in languages unfamiliar to the investigator.

R&D initiatives at Basis Technology are focused on these very problems. Our Digital Forensics initiative addresses the first half of the problem, and our Machine Translation initiative addresses the second. This talk will review both initiatives and connect them with Basis Technology's broader text analytic and name matching solutions.


Actionable Intelligence from Semantic Analysis
David Ihrie, Semandex Networks
Steve Cohen, Executive Vice President, Basis Technology

The information structures used to collect, share, and disseminate user-relevant information among government agencies and NGOs fighting the war on terror are woefully outdated. Semantic networking, a capability that guarantees critical information is immediately disseminated to the users who need it, has already provided a solution to this issue for national-level information, and prototypes of this solution for small tactical groups.

This talk will describe an integrated capability known as "Tango" which helps collect, organize, and deliver semantically connected information.


Demystifying Entity Extraction Quality
Charlotte Shabarekh, Senior Computational Linguist, Basis Technology

Countless competitions and numerous organizations have attempted to define metrics which characterize the quality of entity extraction tools, but what do those scores mean when these tools are applied to the real world? Do those extractors which score the best in controlled competitions operating on clean data deliver the best results in a production environment operating on dirty data?

This tutorial surveys the types of measurements used for entity extraction quality, and discusses techniques to better extract the data you're looking for when general language models don't fit your needs.


Multilingual Search in Lucene
Marc Krellenstein, Lucid Imagination
Steve Cohen, Executive Vice President, Basis Technology

The combined strength of an enterprise-scale search engine based on the popular Lucene open-source search index core and Basis Technology's multilingual natural language processing products represents a new opportunity in enterprise search. This talk will discuss the ins and outs of the burgeoning Lucene marketplace, and how Basis Technology's Rosette Linguistics Platform puts multilingual search within easy grasp of Lucene users.


Guided Navigation in Arabic
Brian Frutchey, Federal Solutions Architect, Endeca
David Murgatroyd, Software Architect, Basis Technology

Endeca's unique Information Access Platform helps people find, analyze, and understand information in ways never before possible. This talk will describe how advanced Arabic linguistics—including name matching and name translation—have been incorporated into the Information Access Platform to create a powerful new analytic tool. This technology empowers intelligence analysts to expose connections and discover patterns in data which would otherwise be hidden from legacy search engines.



TRACK 2

Real-World Issues of Name Translation
Robert Muir, Senior Systems Engineer, Abraxas Corporation

In a cross-script search environment, proper nouns written in their native script are not difficult for native speakers or even for computers. But what happens when your user base is unfamiliar with the target language? This talk presents lessons learned from a multi-billion document, cross-script search system in which the majority of users are familiar with only the Latin alphabet. Even with a perfect F-score (a measure of search relevancy), users may skip over relevant documents or misinterpret results if high-quality name translation is unavailable.

Talking points include language-specific challenges; the difficulty of "double-transliterated" names; the inverse relationship between name translation and name matching; and a brief overview of linguistic resources that are required to maximize the user experience.


Processing the Mosaic of Chinese Dialects
Benjamin Swanson, Software Engineer, Basis Technology

Although people commonly speak of "Chinese," the truth is that Mandarin is only one dialect among many mutually unintelligible spoken Chinese dialects, all of which share a common writing system. Yet even in their written form, Chinese dialects may use different words and characters to refer to the same ideas. Localized variants of Mandarin Chinese occupy a gray area between differences of accent and dialect.

These variants and dialects present a problem to statistical Natural Language Processing (NLP) algorithms due to the addition of new words, dissimilar semantics for the same word, and differences in pronunciation and grammar. This talk will explore the taxonomy of modern Chinese and illustrate the aforementioned difficulties through case studies of a dialect, Wu Chinese (spoken in the Shanghai area), and a Mandarin variant, Sichuanese (as spoken in Chengdu, the capital of Sichuan province).


Linguistic Considerations of Identity Resolution
David Murgatroyd, Software Architect, Basis Technology

Identity resolution systems indicate if two individuals really are the same person. Identity retrieval systems help you find the individual you're after. These systems appear anywhere from analysts' desks to border crossings. But how can you tell if a system is any good before it's deployed? You need to understand the problems it should tackle and how to measure how well it's doing.

This talk will consider metrics and data for evaluating identity resolution and retrieval systems. It will also explore the linguistic challenges these systems face.


The Next-Generation of Arabic Search: Linguistically Intelligent Retrieval
Zina Saadi, Computational Linguist & Middle East Languages Specialist, Basis Technology

The rapid growth of Arabic content on the Internet has increased the need for Arabic-savvy search. The latest generation of Arabic search techniques draws on advances in Natural Language Processing (NLP), taking search beyond simple string comparisons to a more intelligent search that can understand that kitaab ("book") is similar to kutub ("books") by analyzing the lemma of each word. This talk will demonstrate how a search engine with knowledge of the linguistic components of Arabic — the roots, lemmas and stems — can greatly boost the relevancy of search results.


A Linguistic Profile of the Persian Language and Dialects
Bushra Zawaydeh, Ph.D., Senior Linguist, Basis Technology

Persian is a complex language with many dialects—including Farsi, Dari, and Tajiki—spoken in many countries—including Iran, Afghanistan, and Tajikistan. Understanding Persian has become increasingly important in the fields of text mining and analysis.

This talk presents a brief history of the language, its speakers, and its dialects. We will compare Persian to other Arabic-script languages such as Arabic, Pashto, and Urdu. We will then delve into linguistic aspects of the language that are important to natural language processing and analysis applications, such as orthography, typography rules, phonology, and spelling variants.


Program for Day 2 - May 21, 2008

TUTORIALS TRACK A

Building Applications with Rosette Name Indexer (RNI) and Rosette Name Translator (RNT)
Benson Margulies, CTO, Basis Technology

Entity extraction has been widely deployed as a powerful technique for document triage and social network analysis. But what do you do if your documents are in a foreign language? Expensive "machine translation" systems frequently fail to produce output of acceptable quality and frequently fail to recognize names of key individuals, places, and organizations.

This tutorial will demonstrate how to rapidly construct an application which extracts names from foreign language documents, indexes those names, and automatically generates a high-quality translation into English according to the applicable agency transliteration standard. Real-world examples will be presented in Arabic, Chinese, Korean, Pashto, Persian, and Russian, for a total of six scripts and nine languages. This tutorial is appropriate for participants with a basic understanding of programming concepts.


Fingerprinting Hard Drives: Automated Document and Media Exploitation
Simson Garfinkel, Ph.D., Consulting Scientist, Basis Technology

Keyword search is a useful tool for identifying one critical document worthy of extra scrutiny from a collection of thousands. But what happens when keywords are unavailable or unknown? We will discuss a large-scale forensics system capable of ingesting a hard drive or flash memory device and answering abstract questions, such as "What makes this data different?" or "What about this drive is similar to other drives in our collection?"

This talk will discuss findings of research into automated Document and Media Exploitation (DOMEX) to develop tools which can automatically detect which hard drives and flash memory devices in a collection were previously used by members of terrorist networks.

Beyond the Search Bar: From Discovery to Active Intelligence
Sid Probstein, CTO, Attivio &
Basis Technology

With discovery, the search bar is the UI of last resort: it works if you know what you are looking for. But discovery is not just about exploring what you know; it's about uncovering what the content can tell you. It means mining content for patterns and allowing you to explore and navigate through them. If you can then use the results to launch a process, notify the right people, or update a system, you have entered the world of active intelligence and are driving the shift from finding information to using information. In this discussion we explore the latest discovery techniques and how you can exploit them for active intelligence. See a live demonstration of active intelligence technology integrating Attivio's AIE (Active Intelligence Engine) and Basis Technology's entity extraction and name translation.

Revealing Critical Links In Complex Data
Dan Haught, EVP, FMS Advanced Systems Group
John Saling, Director, Federal Sales, Basis Technology

While two entities may seem to have 23 degrees of separation to the human eye, FMS Advanced Systems Group's Sentinel Visualizer may reveal a much closer existing relationship through its analysis functions, including identifying central players and hidden patterns, finding the shortest path between two entities, and performing timeline or geospatial analysis. Learn about the advantages of this tool for intelligence analysis and law enforcement when combined with the name translation, name standardization, and entity extraction technology of Basis Technology.


TUTORIALS TRACK B

Arabic Desktop Suite: Hands-On Tutorial
Part I: Transliteration Assistant & Knowledge Center

Youssef Fayed, Software Analyst,
Tina Lieu, Director of Knowledge Management, Basis Technology

This tutorial offers hands-on training with Basis Technology's Arabic Desktop Suite, an integrated collection of productivity-boosting applications designed for analysts, linguists, and translators.

Transliteration Assistant — a plug-in module for Microsoft Word, Excel, and Access which automatically standardizes names of people, places, and organizations into one of six formal transliteration systems, including the Congressionally-mandated IC transliteration standard for Arabic.

Knowledge Center — a single point of access for dictionaries, glossaries, gazetteers, name lists, and other reference materials which can be searched in English, Arabic, or transliterated Arabic.

Tutorial participants will learn to prepare reports using standardized transliterations, to automatically translate lists of names, and to exploit online reference materials.


Arabic Desktop Suite: Hands-On Tutorial
Part II: Arabic Editor & GeoScope

Youssef Fayed, Software Analyst,
Tina Lieu, Director of Knowledge Management, Basis Technology

This tutorial offers hands-on training with Basis Technology's Arabic Desktop Suite, an integrated collection of productivity-boosting applications designed for analysts, linguists, and translators.

GeoScope — Access a library of high-resolution maps and pinpoint locations obtained from search queries in Arabic or via fuzzy matching of transliterated Arabic.

Arabic Editor — Rapidly compose, analyze, and edit Arabic documents using a standard Western keyboard with an input system which can be learned in less than one hour.

Participants in this hands-on tutorial will learn to access maps of the Middle East, to quickly identify locations on maps, and to type fully vocalized Arabic.


ROUND TABLE DISCUSSIONS

See the list of possible round table discussions. Based on feedback from our registration forms, we will select and announce the round table topics a few weeks prior to our conference.