Data Quality and Unicode

Data Quality

  • Exploiting GeoNames in Practical Applications  This presentation explains how the Arabic Desktop Suite currently uses NGA’s data and outlines future directions.

  • Designing Large-Scale Multilingual Systems  Foreign-language documents pose challenges for the entire document-management pipeline: identifying the format, extracting text, indexing, search, retrieval, and display. Although commonly used technologies work much better than they did a few years ago, there are still many ways to build systems that fail to handle foreign text. This presentation provides an overview of the problem and points out some of the more important issues and traps.

Unicode

  • Unicode 5.0 Essentials  This presentation begins with a look at how Unicode, established in 1991, has changed the way computers process text, with particular emphasis on Arabic, Chinese, Japanese, and Korean. For the non-programmer, it briefly introduces the foundational concepts of encodings, characters, glyphs, and code points, along with the design principles behind Unicode.

  • Hewlett Packard Breaks the Printer Barrier of Global Operations  Hewlett-Packard introduced technology to help companies overcome a key barrier to global operations: how to print documents correctly everywhere despite differences in language and script. Basis Technology reviewed HP’s International Print Solution. Read our review.

  • Understanding Unicode 5.0  This presentation provides a gentle introduction to the basic concepts of the Unicode 5.0 standard, including characters, encodings, transcoding, byte ordering, and the common UTF-8 and UTF-16 transformation formats. It also covers practical information about Unicode support in popular operating systems, programming languages, and protocols.

  • Big Dots, Little Dots, and Circled Dots: How Unicode can help (and hurt) the process of converting documents to information