-
Afghanistan’s Language and Culture: A Challenge for Security
The situation in Afghanistan is at the forefront of our national security
initiatives. The increased turmoil has led many Afghans to migrate to
neighboring countries while others are joining forces to help stabilize the
country. These changes have had a significant impact on the languages used
within Afghanistan and on the security implications of those languages. This
talk for intelligence analysts explores the regional influences of Farsi and
Urdu, the orthographic influences of Arabic, and the importance of these
languages for text mining and analysis. It delves into the linguistic details
of these languages, explains why analyzing this data presents new challenges
for intelligence gathering, and shows the latest technology for text analysis
of Afghan languages.
Presentation by Steve Kearns and Zina Saadi at Basis
Technology’s Government Users Conference in Chantilly, VA on June 8-9,
2010
-
You say “Jamāl”; he writes “Djamel”: Influences on Western Transliteration of Arabic Names
The proliferation of transliteration styles for Arabic names into Western
languages is well known, but what factors shape how names are represented
across the Arab world? This talk looks at examples of names influenced by the
formal and spoken languages of the region, and at how these languages
influence the orthography of the names in the Latin alphabet.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference in Chantilly, VA on June 8-9, 2010
-
The Names of Afghanistan: Understanding Pashto and Dari Names
Following a primer on Pashto and Dari, the two major languages spoken in
Afghanistan, this talk introduces Afghan naming practices. It explores the
linguistic attributes of Pashto and Dari names, such as the influence of
Arabic names, spelling variations, and morphology.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in Chantilly, VA on June 8-9, 2010
-
The World of Arabic Nicknames
In Arab culture, the number of nicknames for a person may seem endless. They
often appear in chat, email, and oral communication. Dealing with multiple
nicknames is a tricky problem for fields such as compliance, intelligence
gathering, and name resolution, since nicknames can be used as aliases. This
presentation describes different types of Arabic nicknames and how they are
used.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference on June 9, 2009.
-
Decoding Arabic Chat
KiLLeH Mn O5OoYuH e93’eeR!! :-)
Cat walking on a keyboard, or Romanized Arabic chat?
While transliterated Arabic poses its own issues of multiple standards and
inconsistent use, asking linguistic software to make sense of Arabic chat is
another matter entirely. How are words, parts of words, and sentence boundaries
detected? What about mixed-case spellings, dialectal differences, emoticons,
and other non-linguistic expressions?
This talk decodes the representation of Arabic sounds in the Romanized
shorthand commonly used in chatrooms and blogs, presenting findings from field
analyses of Egyptian, Gulf, Iraqi, and Levantine online dialects.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in Chantilly, VA on June 8-9, 2010
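The sample chat line above uses digits for Arabic letters that have no obvious Latin equivalent. As a minimal sketch, assuming a small hand-picked table of common conventions (usage varies by region and writer, and this is not necessarily the mapping presented in the talk), a decoder might first replace digit sequences with Arabic letters, trying longer keys such as `3'` before `3`. The names `ARABIZI_DIGITS` and `decode_arabizi_digits` are hypothetical.

```python
# Illustrative mapping of common Arabizi digit conventions to Arabic letters.
# This table is an assumption for the sketch: real usage varies by region and writer.
ARABIZI_DIGITS = {
    "3'": "\u063A",  # ghayn: the apostrophe marks the dotted form of 3
    "3’": "\u063A",  # same, typed with a curly apostrophe
    "7'": "\u062E",  # khah: dotted form of 7
    "7’": "\u062E",  # same, typed with a curly apostrophe
    "2": "\u0621",   # hamza
    "3": "\u0639",   # ayn
    "5": "\u062E",   # khah (common in Gulf writing)
    "6": "\u0637",   # tah
    "7": "\u062D",   # hah
    "9": "\u0635",   # sad
}

# Try longer keys first so that "3'" wins over "3".
_KEYS = sorted(ARABIZI_DIGITS, key=len, reverse=True)


def decode_arabizi_digits(token: str) -> str:
    """Replace Arabizi digit sequences in a token with Arabic letters.

    Latin letters are left untouched, because mapping them to Arabic letters
    depends on dialect and context. Genuine numbers are not protected here,
    so a real decoder would need to detect them first.
    """
    out = []
    i = 0
    while i < len(token):
        for key in _KEYS:
            if token.startswith(key, i):
                out.append(ARABIZI_DIGITS[key])
                i += len(key)
                break
        else:
            # No digit convention matches: keep the character as-is.
            out.append(token[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for token in "KiLLeH Mn O5OoYuH e93’eeR".split():
        print(token, "->", decode_arabizi_digits(token))
```

Handling only the digits keeps the sketch deterministic; the harder, dialect-dependent step of mapping Latin vowels and consonants is deliberately left out.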
-
One Language, Many Dialects: An Analysis of Arabic Dialects
This presentation discusses the shared linguistic structures that define Arabic
dialects, as well as the differences that draw non-geographical boundaries
between them, and then shows how these affect Arabic search.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference on June 9, 2009.
-
The Names of Afghanistan – Understanding Pashto and Dari Names
Following a primer on Pashto and Dari, the two major languages spoken in
Afghanistan, this presentation introduces Afghan naming practices. It explores
the linguistic attributes of Pashto and Dari names, such as the influence of
Arabic names, spelling variations, and morphology.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference on June 9, 2009.
-
You say “Jamāl”; he writes “Djamel”: Influences on Western Transliteration of Arabic Names
This presentation reviews examples of names influenced by the formal and spoken
languages of the region, and how these languages influence the orthography of
the names in the Latin alphabet.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference on June 8, 2009.
-
Next Generation of Arabic Search: Linguistically Intelligent Retrieval
This presentation demonstrates how a search engine with knowledge of the
linguistic components of Arabic – the roots, lemmas and stems – can greatly
boost the relevancy of search results.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference in College Park, MD on May 20, 2008.
-
The Next Generation of Arabic Search Technologies
The rapid growth of Arabic content on the Internet has created the need for a
new generation of text search with advanced technologies to handle the
complexities of the Arabic language. This presentation shows how a search
engine can use the linguistic components of Arabic (roots, stems, and lemmas)
to greatly improve the relevance of search results.
-
A Linguistic Profile of the Persian Language and Dialects
This presentation gives a brief history of the Persian language, its speakers,
and its dialects. It compares Persian to other Arabic-script languages such as
Arabic, Pashto, and Urdu. It also delves into linguistic aspects of the
language that are important to natural language processing and analysis
applications, such as orthography, typographic rules, phonology, and spelling
variants.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in College Park, MD on May 20, 2008.
-
A Profile of Arabic Script Languages
This presentation explores the history of the Arabic script in the various
languages written with it, the structure and characteristics of the Arabic
alphabet, the alphabets these languages use, their phonological structure,
their borrowings, and the differences between Arabic and these languages.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in Washington D.C. on June 7, 2007.
-
Arabic, Farsi and Urdu Text Normalization for Natural Language Processing
This presentation proposes multi-level normalization for handling the various
orthographic variations of Arabic script that appear in current news corpora.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference in Washington D.C. on June 7, 2007.
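Multi-level normalization can be sketched as successive folding passes, each more aggressive than the last. The three levels and the specific character foldings below are assumptions chosen for illustration: they are common in Arabic-script text processing, but they are not necessarily the scheme presented in the talk. `normalize` and its `level` parameter are hypothetical names.

```python
import re

# Level 1: remove tatweel (U+0640) and tanwin, short vowels, shadda and sukun (U+064B..U+0652).
_MARKS = re.compile("[\u0640\u064B-\u0652]")

# Level 2: fold letters that Arabic, Farsi and Urdu encode with different code points.
# Choosing the Arabic forms as canonical is an arbitrary choice for this sketch.
_SCRIPT_VARIANTS = str.maketrans({
    "\u06CC": "\u064A",  # FARSI YEH -> ARABIC LETTER YEH
    "\u06A9": "\u0643",  # KEHEH -> ARABIC LETTER KAF
})

# Level 3: aggressive folding that raises recall but can merge distinct words.
_AGGRESSIVE = str.maketrans({
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
})


def normalize(text: str, level: int = 2) -> str:
    """Apply normalization passes 1..level (level 0 returns the text unchanged)."""
    if not 0 <= level <= 3:
        raise ValueError("level must be between 0 and 3")
    if level >= 1:
        text = _MARKS.sub("", text)
    if level >= 2:
        text = text.translate(_SCRIPT_VARIANTS)
    if level >= 3:
        text = text.translate(_AGGRESSIVE)
    return text
```

Keeping the levels separate lets a system choose its own trade-off. For example, folding teh marbuta into heh can merge words that are distinct in Arabic, so a retrieval system might apply level 3 only to index keys while displaying the original text.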
-
Decoding Arabic Chat
This presentation decodes the representation of Arabic sounds in the Romanized
shorthand commonly used in chatrooms and blogs by presenting findings from
field analyses of Egyptian, Gulf, Iraqi, and Levantine online dialects.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in Washington D.C. on June 7, 2007.
-
What’s in a Persian Name?
This presentation begins with the basics of Persian phonology and name
morphology, and then delves into the rich influences of other languages;
cultural naming preferences (such as the decline of Arabic-based names after
the fall of the Shah in Iran); historical roots; and regional customs.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference in Washington D.C. on June 7, 2007.
-
Orthographic Variations in Arabic Corpora
This presentation discusses the different kinds of Arabic orthographic issues
that Basis Technology’s Arabic linguists have encountered and handled while
building various software solutions for Arabic text analysis.
Presentation by Bushra Zawaydeh at Basis Technology’s
Government Users Conference in Washington, D.C. on June 14, 2006.
-
Behind the Name: Etymology of Arabic Names
This presentation gives examples of the linguistic rules that contributed to
the evolution of certain famous Arabic names. It samples different types of
names as well as the influence of various foreign languages, regional and
social factors, and language evolution.
Presentation by Zina Saadi at Basis Technology’s Government
Users Conference in Washington, D.C. on June 14, 2006.
-
Tailoring UAX #29 Word Breaking for Arabic Text
Presentation by Thomas Emerson at the 28th
Internationalization & Unicode Conference in Orlando, FL on Sept. 8,
2005.