One keyword matches all variations of a word.
Increasingly, digital investigators encounter hard disks containing foreign-language text. Many analysts take it for granted that their search tools can locate important keywords in any language. Because popular forensics tools do not include linguistic processing modules, this assumption is incorrect and potentially dangerous.
These tools may find only a small percentage of the documents that contain a given keyword.
Basis Technology’s Odyssey Digital Forensics™ Keyword Search ensures that one query can locate different linguistic forms of search terms in 16 different languages, including Middle Eastern (Arabic, Persian), East Asian (Chinese, Korean, and Japanese) and 12 European languages.
Thus, a Chinese document can be discovered whether it is written in the Simplified script used in Mainland China or the Traditional script used in Taiwan. Arabic documents can be found even when keywords carry prefixes such as “al-”, and European verbs can be matched across their conjugated forms.
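To make the idea concrete, here is a minimal sketch of the kind of normalization described above. It is not Odyssey's or Rosette's implementation: the `normalize` function, the four-character `TRAD_TO_SIMP` table, and the simple article-stripping rule are all assumptions chosen for illustration, and real linguistic processing (including verb conjugation, which is not covered here) is far more involved.

```python
import unicodedata

# Hypothetical, deliberately tiny Traditional-to-Simplified table.
# A real system would use a complete mapping.
TRAD_TO_SIMP = {"國": "国", "學": "学", "電": "电", "腦": "脑"}

# The Arabic definite article (alef + lam), often transliterated "al-".
ARABIC_ARTICLE = "\u0627\u0644"


def normalize(token: str) -> str:
    """Reduce a token to one illustrative canonical form."""
    # Unify compatibility characters and letter case.
    token = unicodedata.normalize("NFKC", token).casefold()
    # Drop a leading definite article, but only if a word of at least
    # two characters remains (a crude guard against over-stripping).
    if token.startswith(ARABIC_ARTICLE) and len(token) > len(ARABIC_ARTICLE) + 1:
        token = token[len(ARABIC_ARTICLE):]
    # Map Traditional Chinese characters to Simplified where the table knows them.
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in token)
```

With this sketch, the Traditional and Simplified spellings of "computer" (電腦 and 电脑) normalize to the same string, as do an Arabic noun with and without its article, so a single keyword can match both variants.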
How it works
Starting from a captured disk image, Odyssey analyzes the file system to recover files and extract their text. Since good search comes from good data, Odyssey uses the Rosette® Linguistics Platform to preprocess the multilingual text with its text normalization functions (see sidebar).
Odyssey uses the normalized text to build a search index. Analysts then enter search terms through a graphical interface to query this linguistically enhanced index.
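The normalize-then-index flow can be sketched as a small inverted index. This is an illustration under stated assumptions, not the product's design: `build_index`, `search`, and the placeholder `normalize` (which only folds case) are hypothetical names, and whitespace splitting stands in for real tokenization, which would not work for languages such as Chinese or Japanese.

```python
from collections import defaultdict


def normalize(token: str) -> str:
    """Placeholder normalizer (assumption): case folding only."""
    return token.casefold()


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized token to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():  # naive whitespace tokenization
            index[normalize(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], term: str) -> list[str]:
    """Normalize the query the same way as the indexed text, then look it up."""
    return sorted(index.get(normalize(term), set()))


docs = {"a.txt": "Wire Transfer", "b.txt": "wire fraud"}
index = build_index(docs)
hits = search(index, "WIRE")
```

The key design point is that the query passes through the same normalizer as the indexed text, so any variant of a term that normalizes to the same form retrieves the same documents.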
