-
Extracting Text from Arabic PDF
This presentation describes a solution to the problem of extracting Arabic
text from PDFs through modifications to the open source software PDFBox (www.pdfbox.org). It starts with the
basics of PDF structure, then looks at how Arabic is stored in PDF and how to
extract it using a custom-modified PDFBox.
Presentation by Brian Carrier at Basis Technology’s Government Users
Conference on June 9, 2009.
-
Drive Analysis in a Flash
This presentation introduces a new media analysis and exploitation technique based on the
statistical sampling of drive sectors. Using this approach it is possible to
make highly accurate statements about the contents of a 1TB disk with less than
10 seconds of analysis, and with a margin of error of less than 1%. Making these
statements requires a number of new advances in recognition technology and fast
database lookups, which are also reviewed.
Presentation by Simson Garfinkel at Basis Technology’s Government Users
Conference on June 9, 2009.
-
Digital Forensics R&D Initiatives at Basis Technology
As criminal and counter-terror investigations cross national and language
boundaries, the challenges include not only finding the right documents and
evidence among terabytes of data spread across thousands of hard drives, but
also searching for keywords or names in different languages, and then
interpreting search results in languages unfamiliar to the investigator. This
presentation reviews Basis Technology’s digital forensics initiatives and how they
connect to its broader text analytics and name matching solutions.
Presentation by Brian Carrier at Basis Technology’s Government Users
Conference in College Park, MD on May 20, 2008.
-
Multilingual Keyword Search Comes to Digital Forensics
Searching hard drives that contain foreign-language text presents
technical complexities that most investigators are unaware of: multiple
encoding schemes, orthographic variations, spelling variations, and online
“chat” dialects. This presentation introduces the Odyssey Digital Forensics
system, which has been specifically designed to address these linguistic
issues.
Presentation by Brian Carrier at Basis Technology’s Government Users
Conference in Washington, D.C. on June 7, 2007.
-
Cross Drive Analysis: A New Approach to Media Exploitation
This presentation describes correlation techniques for the analysis of
large volumes of digital data, and presents results from ten years of research
on real-world drives.
Presentation by Simson Garfinkel at Basis Technology’s Government Users
Conference in Washington, D.C. on June 7, 2007.
-
Crash Course in Digital Forensics
This presentation provides an overview of key topics in digital
forensics, including the investigation process, analysis techniques and tools,
and some examples. It also covers new forensics products being
developed at Basis Technology and how linguistic analysis techniques will be
incorporated into these products.
Presentation by Brian Carrier at Basis Technology’s Government Users
Conference in Washington, D.C. on June 14, 2006.