For more information on Basis Technology’s Users Conference:

Miryon Pak +1 (617) 386-2090 conference@basistech.com

Government Users Conference 2010

Presentations

Day One, June 8, 2010 — Plenary Talks

Morning Keynote Address: Making the Case for Human Language Technology

Dan Scott – Office of the Director of National Intelligence (ODNI)

Dan Scott took on the role of Senior National Intelligence Service Director of the ODNI Foreign Language Program Office in September 2008, bringing thirty years of experience in national security, intelligence, and foreign language matters.

Prior to assuming his current position, Mr. Scott was a Colonel with twenty-nine years in the U.S. Air Force, with experience in intelligence operations, targeting, and requirements planning for operations from Iraq to Japan to Latin America and elsewhere. Mr. Scott served as the Assistant Commandant, Defense Language Institute Foreign Language Center at the Presidio of Monterey, California, and led the expansion of the language training program to support language requirements for the war on terrorism.

He graduated with a B.S. in international affairs from the U.S. Air Force Academy (1979) and an M.A. in Russian and East European studies from George Washington University (1985). Mr. Scott also received additional education from programs at the University of Miami, in Spain, and in Venezuela.

Afternoon Keynote Address: Detention in U.S. History: Looking Backwards for Answers Going Forward

Ahmed Qureshi, President and Co-founder – Harbinger Technologies Group

Ahmed Qureshi is the President and co-founder of Harbinger Technologies Group, a homeland security training and consulting firm. At Harbinger, he helped develop entity resolution software solutions and oversaw the training of over 34,000 law enforcement and military personnel.

Prior to Harbinger, he served as Vice President for Global Business Development at INVESTools, Inc. and as head of Middle East Operations for Papa John's International. He is an Adjunct Professor at the United States Air Force Special Operations School (USAFSOS), where he teaches courses dealing with the Islamic World. He is a graduate of the Middle East Studies program at Brigham Young University, the MBA program at the Thunderbird School of Global Management, and the Naval War College Command and Staff Program, and was a Fulbright grantee to the University of Jordan.

He is a drilling reservist and a veteran of Operations Enduring Freedom and Iraqi Freedom. He is currently completing his doctoral dissertation at King's College, University of London, writing on U.S. detention policy in counterterrorism and counterinsurgency operations.

Plenary: Rapid Information Triage: A Practical Approach

Steve Kearns, Rosette Product Manager – Basis Technology

Our intelligence community routinely collects more data than we can effectively analyze. This means that we must use our linguistic and analytic resources as efficiently as possible. This talk surveys common workflows and shows how products from Basis Technology can be used to rapidly identify relevant documents and save valuable analyst time. We’ll take you on a behind-the-scenes walk-through and demonstration of the Odyssey Information Navigator, an information retrieval application that incorporates the full suite of text analytics available in the Rosette 7 platform.

Day One, June 8, 2010 — Session Talks

Enabling the National Harmony Database

Kristin Summers, Deputy CTO – Knowledge and Information Management Division, CACI
Benson Margulies, CTO – Basis Technology

Harmony is a Department of Defense system deployed at the National Ground Intelligence Center (NGIC) that provides a community of users with access to its extensive collection of records. These records include foreign, military, and public documents, electronic media, and translations of these materials. Links and descriptions of the records are stored in the Harmony database. Harmony provides consistent and simple access to this widely heterogeneous data collection, through a combination of keyword search and specialized metadata searches. Basis Technology components enable key capabilities in this application, including multilingual full-text search on the text of the foreign language collected material in Arabic and Farsi, interactive keyword translation for cross-lingual searches from English into Arabic content, and named entity extraction and word-by-word lexicographic analysis on demand for Arabic text. This talk will describe these National Harmony capabilities and their use of the Basis Technology components that support them, with specific emphasis on examples and test cases in Arabic.

It’s All in the Palm of Your Hand: An Overview of Cell Phone Forensics

Heather Mahalik, Senior Forensic Specialist – Basis Technology

Smartphones, BlackBerrys, and iPhones let us exchange pictures and messages, check email, surf the Web, and watch videos, all in the palm of our hands. As our reliance on handheld technology increases, these devices are advancing and are increasingly used like second computers. Digital forensics analysts must know how to acquire, preserve, and effectively examine data seized from a smart handheld device.

A Gentle Introduction to Entity Extraction

Brandon Mensing, Software Engineer – Basis Technology

Wave a magic wand and instantly know what people, places, and organizations are mentioned in a huge stack of documents. That is the “magic” of entity extraction. While quite elegant on the surface, this technology is a truly complex topic. In this introductory talk, we’ll explain all the basics and our multi-faceted approach to entity extraction. Topics will include definitions, use cases, evaluation metrics, and benefits.

Coral Reef: Cellular Network Analysis and Exploitation in the Field

Andrew Walker, Vice President of Engineering – Berico Technologies
Benson Margulies, CTO – Basis Technology

Especially in the thick of a conflict, the warfighter needs access to the appropriate capabilities and tools to take advantage of data from cellular network analysis. Until now, these tools have only been available to the analyst in the lab. Coral Reef will be the first tool of its kind to visualize cellular forensics data at the SECRET level, enabling battalion elements and below to conduct network analysis of SECRET data in the field. Coral Reef provides a single point of ingestion for data, lets the user enrich that data, and supports querying the uploaded data by multiple methods (free text, named entities, selector, geospatial, temporal, nodal). Integrated with Basis Technology’s Rosette Name Translator, Coral Reef can display translations of names, enhance search results through disambiguation of names, and automatically lift out names of people, places, and organizations from text messages. See a live demonstration of Coral Reef with simulated data.

Multilingual Problem Resolution with Transliteration Assistant

Brian Roberson, Program Manager, Desktop Tools – Basis Technology
Youssef Fayed, Software Analyst – Basis Technology

Basis Technology’s Transliteration Assistant enables HUMINT collectors, intelligence analysts, and report writers to quickly and accurately translate names from Arabic, Dari, or Pashto into English while complying with the Intelligence Community’s applicable transliteration standards. This talk will introduce the use of Transliteration Assistant through problem-solving scenarios with the instructors, such as (1) forensic processing of digital media to extract human language, personal names, and name translations; and (2) extracting personal names from a visitor request form and checking those names (which may be in multiple languages) against a “do not enter” list maintained by a visitor control system. Time will be reserved during the session to discuss solutions to specific challenges raised by the tutorial participants.

Bringing Human Language Technology to the Warfighter

Carl Hoffman, CEO – Basis Technology

Human Language Technology (HLT) has been intensively pursued by government and academia for more than fifty years. Advanced capabilities have been developed which enable the analysis of large numbers of documents spanning dozens of human languages. Yet, until recently, relatively little of this technology has reached the warfighter.

Today, HLT is breaking out of the lab and moving into the field, performing missions which previously required expert linguists and boosting the productivity of translators, interpreters, and intelligence analysts.

This presentation will include several real-world case studies of HLT in theater, focusing on such disciplines as document and media exploitation; analysis of human intelligence; multilingual document repositories; and identity resolution, in such languages as Arabic, Dari, Farsi, Pashto, and Urdu.

The Warfighter’s Intelligence Dashboard: USAIC Tube

Jim Nolan, VP, Products and Innovation – Decisive Analytics
Bill Ray, VP Federal Sales – Basis Technology

Often the key piece of intelligence that triggers action is not a single document or email, but rather an emerging trend or pattern found only through constant monitoring of vast amounts of raw intelligence and open source data. Decisive Analytics’ USAIC Tube is the enterprise web-based portal that keeps its fingers on a million pulses of data for the warfighter. Integrated with Basis Technology’s Rosette Entity Extractor (REX) and Name Indexer (RNI), it seamlessly combines data in different languages as it characterizes relationships between people, places, and organizations; looks for activities of interest within and across individual reports; and identifies activities building or declining over time. Learn about the advantages of this tool for intelligence analysis when combined with the multilingual name matching and entity extraction technology of Basis Technology.

Tracking Criminals Turn-By-Turn: GPS Forensics

Ben LeMere, Digital Forensics Analyst – Basis Technology

Tracking down criminal elements today can literally mean following their footsteps, or at least their turn-by-turn directions, via portable navigation devices associated with criminal acts. The U.S. alone accounts for 50% of portable GPS devices sold, or around 20 million in 2009. Not surprisingly, the law enforcement community suddenly has to learn how to examine GPS devices in a manner consistent with the best practices for handling digital evidence.

This presentation will provide an overview of GPS forensics and discuss acquisition, examination and analysis techniques as well as available commercial and freeware tools. It will focus mainly on the major manufacturers – Garmin, TomTom, and Magellan – and how operators in the field, forensics examiners in the lab and intelligence analysts can leverage this type of data to support investigations.

More than a Name: Evidence for Identity Resolution

David Murgatroyd, Director of Engineering – Basis Technology

In the worlds of intelligence analysis and law enforcement, the allowable margin of error between fingering a suspected terrorist and a case of mistaken identity is unforgivingly small. While a name is a starting point for identification, it is rarely sufficient to remove all ambiguity. Other personal attributes, from biographic (e.g., nationality) to biometric (e.g., DNA) and from permanent (e.g., date of birth) to temporary (e.g., employer), fill in the complete picture of an “identity.” This talk explores such evidence for identity and the means by which it can be combined to narrow a large list of candidates to a handful.

Sit! Down! Extract! Teaching New Tricks to Your Entity Extractor

Brandon Mensing, Software Engineer – Basis Technology

No matter how well-trained an entity extraction system may be, it will always perform best on the type of text it was trained on, which frequently is not your text. This tutorial will detail, step by step, how to customize the Rosette Entity Extractor (REX) to achieve the extraction results you need on the text you must process, and will describe the features of REX for handling text from tables and databases.

We will cover writing new regular expressions to extract entities with regular patterns, creating gazetteer databases of entities, and configuring the redactor to return the desired entity when there is a possibility of more than one. This tutorial is targeted at engineers and developers.
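The three customization steps named above (regular expressions for patterned entities, gazetteers of known entities, and a rule for choosing among overlapping candidates) can be illustrated with a minimal, generic sketch in plain Python. This is not the REX API: the entity types, patterns, gazetteer entries, and the "longest candidate wins" rule are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns and gazetteer entries, chosen only for illustration.
PATTERNS = {
    "PHONE": re.compile(r"\+?\d{1,3} \(\d{3}\) \d{3}-\d{4}"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
GAZETTEER = {
    "ORGANIZATION": ["Basis Technology"],
    "FACILITY": ["Presidio of Monterey"],
    "LOCATION": ["Monterey"],
}


def resolve_overlaps(hits):
    """Keep the longest hit when candidates overlap (an assumed policy)."""
    chosen = []
    # Visit longer hits first so they claim their span before shorter ones.
    for hit in sorted(hits, key=lambda h: (-(h[1] - h[0]), h[0])):
        if all(hit[1] <= c[0] or hit[0] >= c[1] for c in chosen):
            chosen.append(hit)
    return sorted(chosen)


def extract(text):
    """Return (start, end, type, surface) tuples from regexes and gazetteers."""
    hits = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), etype, m.group()))
    for etype, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                hits.append((m.start(), m.end(), etype, m.group()))
    return resolve_overlaps(hits)


if __name__ == "__main__":
    sample = "Contact Basis Technology at +1 (617) 386-2090 or conference@basistech.com."
    for start, end, etype, surface in extract(sample):
        print(etype, surface)
```

In this sketch, "Presidio of Monterey" produces two candidates (the facility and the shorter location "Monterey"), and only the longer one is returned. A production system would likely make that choice configurable rather than fixed.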

You say “Jamāl”; he writes “Djamel”: Influences on Western Transliteration of Arabic Names

Zina Saadi, Computational Linguist – Basis Technology

The proliferation of transliteration styles for Arabic names into Western languages is well known, but what are the factors that shape how names are represented across the Arab world? This talk will look at examples of names influenced by the formal and spoken languages of the region, as well as how these languages influence the orthography of the names in the Latin alphabet.

The Names of Afghanistan: Understanding Pashto and Dari Names

Bushra Zawaydeh, Ph.D., Senior Computational Linguist – Basis Technology

This talk introduces naming practices in Afghanistan, following a primer on Pashto and Dari, the two major languages spoken in Afghanistan. We will explore the linguistic attributes of Pashto and Dari names, such as the influence of Arabic names, spelling variations, and morphology.

The Modern-Day Chinese Puzzle: Automated Chinese Text Analysis

Joe Ho, Principal Software Engineer – Basis Technology

In Chinese, every ideograph can represent a word or concept. In automated text analysis, breaking Chinese ideographs into words is only the first step to mining text, extracting entities, and resolving names for the most widely spoken language in the world. This talk introduces the features and characteristics of the Chinese language as a prelude to automated analysis of Chinese texts. We will look at solutions to meet the difficulties of coping with the different character sets used in China, Taiwan and Hong Kong. We will also look at extracting named entities, searching Chinese text, and recognizing Chinese names expressed in other languages.

Language Identification: The First Step in Processing Intelligence

Nobuo Otsuka, Senior Software Engineer – Basis Technology

A prerequisite to processing multilingual documents for search, intelligence analysis, e-discovery, or digital forensics is knowing the language and encoding system of each file. The Rosette Language Identifier (RLI) is one of the most widely used language identifiers in the commercial or government space today. Based on a statistical model, RLI has performance advantages over dictionary-based language identifiers and does not require updating as new vocabulary enters a language. Unlike language identification that relies on character code range, RLI easily distinguishes languages that share the same writing system, such as Russian vs. Ukrainian, or Arabic vs. Persian.

This talk will present an overview of RLI and discuss the techniques it uses to automatically identify the language and encoding of documents. We will also discuss how to use all the data points from RLI to measure confidence levels of an identification result or to achieve more fine-grained results.
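To illustrate why a statistical approach can separate languages that a code-range check cannot, here is a deliberately tiny sketch in Python. It is not RLI's model: the character-trigram profiles, the two one-sentence training samples, and the overlap score are all assumptions made for illustration, and a real identifier would be trained on far more text.

```python
import unicodedata
from collections import Counter

# Tiny made-up training samples; real profiles need much larger corpora.
SAMPLES = {
    "Russian": "это простой пример текста на русском языке",
    "Ukrainian": "це простий приклад тексту українською мовою",
}


def trigrams(text):
    """Count overlapping character trigrams, padding with a space at each edge."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def scripts(text):
    """Code-range view: the script of each letter, taken from its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def identify(text):
    """Score each language by the share of the text's trigrams seen in training."""
    grams = trigrams(text)
    total = sum(grams.values())
    scores = {
        lang: sum(n for g, n in grams.items() if g in profile) / total
        for lang, profile in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for phrase in ("простой пример", "простий приклад"):
        print(phrase, scripts(phrase), identify(phrase)[0])
```

Both phrases consist only of Cyrillic letters, so the script check reports the same answer for each, while the trigram scores differ. The per-language scores returned alongside the best guess are the kind of data point that could feed a confidence measure.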

Day Two, June 9, 2010 — Plenary Talk

Plenary: Basis Technology Update: Rosette 7

Steve Kearns, Rosette Product Manager – Basis Technology

Basis Technology develops and supports software products which address the foreign language needs of the defense, intelligence, and law enforcement communities. Our software has been applied to such missions as document and media exploitation; document triage; watch list management; and geospatial fusion.

This presentation will describe recent developments and additions to Basis Technology’s product line, and examples of how our technology is being used to deliver more powerful analytic capabilities to the warfighter and intelligence analyst.

This talk will be followed by a Q&A panel session with the Basis Technology management team.

Day Two, June 9, 2010 — Session Talks

Multilingual Problem Resolution with Transliteration Assistant

Brian Roberson, Program Manager, Desktop Tools – Basis Technology
Youssef Fayed, Software Analyst – Basis Technology

Basis Technology’s Transliteration Assistant enables HUMINT collectors, intelligence analysts, and report writers to quickly and accurately translate names from Arabic, Dari, or Pashto into English while complying with the Intelligence Community’s applicable transliteration standards. This talk will introduce the use of Transliteration Assistant through problem-solving scenarios with the instructors, such as (1) forensic processing of digital media to extract human language, personal names, and name translations; and (2) extracting personal names from a visitor request form and checking those names (which may be in multiple languages) against a “do not enter” list maintained by a visitor control system. Time will be reserved during the session to discuss solutions to specific challenges raised by the tutorial participants.

Cost-Effective Multilingual Search Using Rosette and Apache Solr

Frank Calderon, VP of Business Development – Lucid Imagination & Steve Cohen, EVP & COO – Basis Technology

Lucid Imagination is the first commercial entity exclusively dedicated to Apache Lucene/Solr open source technology for search. As an active participant in the enormous community using Lucene/Solr, Lucid Imagination offers certified distributions of Lucene and Solr, commercial-grade support, training, high-level consulting, and value-added software extensions. The company’s web site serves as a knowledge portal for the Lucene community, with information and resources to help developers build and deploy Lucene-based solutions in a more efficient and cost-effective manner.

Afghanistan’s Language and Culture: A Challenge for Security (in 2 parts)

Steve Kearns, Rosette Product Manager – Basis Technology
Zina Saadi, Computational Linguist – Basis Technology

The situation in Afghanistan is at the forefront of our national security initiatives. The increased turmoil has led many Afghans to migrate to neighboring countries while others are joining forces to help stabilize the country. These changes have had significant impact on the languages used within Afghanistan and the security implications of those languages. This talk for intelligence analysts explores the regional influences of Farsi and Urdu as well as the orthographic influences of Arabic and the importance of these languages for text mining and analysis. We will delve into linguistic details of these languages and explain how analyzing this data presents new challenges to intelligence gathering and show you the latest technology for text analysis of Afghan languages.

Thai, the Tiger of Text Analysis: An Introduction to Thai Text Processing

Rattima Nisitroj, Linguist – Basis Technology

In natural language processing (NLP), Arabic is known as a complex language, but the less-studied Thai poses even more intriguing challenges. Syllable boundaries are ambiguous because some vowels precede a consonant, some are written above a consonant, and some are combinations of the two. This variation makes it difficult to decide where the syllable boundary is and, consequently, what sound a character represents, since the pronunciation of a character varies with its position in the syllable. Moreover, Thai has no explicit word boundary marker and makes productive use of compounds: the Thai for conference “proceedings” is literally “book collect article about academic in meeting seminar.” As a result, many character strings cannot be segmented into words in a straightforward manner. We will discuss some previous NLP approaches to Thai word segmentation and also look at related issues in Romanization, transliteration, and search technologies.
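One classic family of approaches to segmenting text without word boundary markers is dictionary-based longest matching. The sketch below is a minimal Python illustration of that general technique, not of any Basis Technology product; the five-entry lexicon is hypothetical, and unknown characters simply fall back to single-character tokens.

```python
# Hypothetical mini-lexicon; real systems use dictionaries with many thousands of entries.
LEXICON = {"ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"}
MAX_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Greedy left-to-right longest matching against LEXICON."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if candidate in LEXICON or size == 1:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    print(segment("ฉันรักภาษาไทย"))
```

Because the lexicon contains both "ภาษา" and "ไทย" as well as the compound "ภาษาไทย", the greedy strategy returns the compound as one token. That behavior hints at why productive compounding makes a purely dictionary-driven segmentation ambiguous.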

How Can Transliteration Assistant Assist You More?

David Murgatroyd, Director of Engineering – Basis Technology

Basis Technology’s Transliteration Assistant enables language analysts of diverse skills to quickly and accurately standardize and translate names. It does what it can to help the analyst focus on what only she can do: make important translation decisions. How can it do even more to help? What inspiration can it take from other Computer Assisted Translation tools? Join the conversation and shape the future of Transliteration Assistant.

Decoding Arabic Chat

Bushra Zawaydeh, Ph.D., Senior Computational Linguist – Basis Technology

KiLLeH Mn O5OoYuH e93’eeR!! :-)

Cat walking on a keyboard, or Romanized Arabic chat?

While transliterated Arabic poses its own issues of multiple standards and inconsistent use, asking linguistic software to make sense of Arabic chat is another matter entirely. How are words, parts of words, and sentence boundaries detected? What about non-linguistic expressions using mixed-case letters, dialectal differences, and emoticons?

This talk decodes the representation of Arabic sounds in the Romanized shorthand commonly used in chatrooms and blogs by presenting findings from field analyses of Egyptian, Gulf, Iraqi, and Levantine online dialects.
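As a rough illustration of one piece of this shorthand, Romanized Arabic chat commonly uses digits (sometimes followed by an apostrophe) for Arabic sounds that have no Latin letter. The Python sketch below looks up such codes in a token. The mapping table is an assumption based on widely seen conventions, not the talk's findings, and actual usage varies by dialect and writer.

```python
# Assumed digit-to-letter conventions; usage differs across dialects and writers.
DIGIT_CODES = {
    "3'": "غ", "6'": "ظ", "7'": "خ", "9'": "ض",
    "2": "ء", "3": "ع", "5": "خ", "6": "ط", "7": "ح", "9": "ص",
}


def find_digit_codes(token):
    """Return (code, Arabic letter) pairs for the digit codes found in a token."""
    token = token.replace("\u2019", "'")  # treat a typographic apostrophe as plain
    found = []
    i = 0
    while i < len(token):
        pair = token[i:i + 2]
        if pair in DIGIT_CODES:
            # Two-character codes (digit plus apostrophe) take priority.
            found.append((pair, DIGIT_CODES[pair]))
            i += 2
        elif token[i] in DIGIT_CODES:
            found.append((token[i], DIGIT_CODES[token[i]]))
            i += 1
        else:
            i += 1
    return found


if __name__ == "__main__":
    for token in ("KiLLeH", "O5OoYuH", "e93'eeR"):
        print(token, find_digit_codes(token))
```

The sketch only locates the digit codes; handling mixed case, repeated letters, and vowel choices would need much more than a lookup table.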

Linguistics 101: The Conceptual Base of Natural Language Processing

Zina Saadi, Computational Linguist – Basis Technology

If you are new to natural language processing (NLP) and text analytics, a good understanding of the characteristics of human languages and linguistic concepts is invaluable. The talk will include examples from the Germanic, Indo-European, and Semitic languages to illustrate the important elements of textual analysis, including a general introduction to the philosophy and types of languages, the structure of words (morphology), the meaning of words (semantics), noun and verb phrases (constituents), and the structure of sentences (syntax). We will wrap up with examples from natural language processing to show the role linguistics plays in text analytics. This talk is targeted to audiences new to natural language processing and text analytics or seeking to “fill in the blanks” of their linguistic understanding.

Tell Tale: What a Hill Can Reveal About Writing

Thomas Milo, Arabic Specialist and President – DecoType; Consultant to Basis Technology

Just as layers of sediment in the Rocky Mountains record the evolution of life, a tell (a hill formed by layers of former habitation, one built over another) tells tales of its former inhabitants. A slice through a tell reveals black layers where civilizations were invaded and burned to the ground by barbarians, who, while destroying cultures, unintentionally baked into permanence their clay tablet libraries. As a result, a tell becomes a record of the evolution of writing systems in the Middle East.

This talk will trace the history of writing systems and how it accidentally produced today’s alphabet, the very one that we take for granted as the starting point of all script technology. Our alphabet is in fact the outcome of thousands of years of cultural erosion, not to say utter devastation. Ironically, the future story of writing systems and their evolution is at greater risk of extinction today than ever before, as clay tablets are replaced by elusive and vulnerable data storage media, subject to physical degradation and the quick obsolescence of software and hardware readers.

“Ask the Experts” Roundtable Discussion

Basis Technology Staff

Do you have topics or questions that you wanted to have answered by this conference, but which are not being covered? Would you like to fill the gaps of your understanding in any areas?

Fill out an index card in your registration kit with the topics and questions you would like to discuss or have answered. Topics attracting the most interest will be assigned to a roundtable session on Day Two.