Abstracts - 2007
Automated Computer Forensics
Simson L. Garfinkel
Today's computer forensic tools provide advanced visualization and search capabilities for trained forensic investigators. These tools do not scale to the massive amount of data generated by ongoing intelligence operations, nor are they usable by ``mere mortal'' computer users.
Developing automated tools is difficult because of the massive amounts of data necessary to develop and validate algorithms, and the difficulty of keeping up with the ever-expanding variety of file types and formats.
To make progress, we are engaged in a two-pronged research project. The first prong is designed to ``prime the pump'' by creating forensic corpora that can be used by current researchers with few if any restrictions. The second prong will pursue targeted developments in forensic file formats, knowledge representation, inference techniques, and the presentation of forensic results.
Computer forensics research has been severely hobbled by the lack of realistic data on which to develop and validate new techniques. Real data that is used by forensic tools is highly sensitive by its very nature. Researchers need large quantities of email messages, word processing files, disk images, and network traffic to build and test their tools. But in many cases this data simply cannot be collected due to privacy concerns.
In the absence of such corpora, researchers have made do with data primarily generated by self-experimentation. Disk forensic tools are developed using a few file systems from the developer's own computer system. Network forensic systems are based on packets monitored from the developer's own Internet connection. Documents are based on the range of Microsoft Word and Adobe Acrobat files that can be found with Google and freely downloaded over the Internet. Because these collections are not made available due to privacy concerns, researchers at different organizations must waste time and money amassing their own low-quality corpora. And because those corpora are not standardized, it is very difficult to compare published research.
We are investigating four approaches for creating data sets that are sound, exploitable, interesting, and deep:
Targeted Forensic Research Development
Using the hard drive corpus, we are developing a series of technologies and techniques to enable automated forensic processes. These include:
 Garfinkel, S., "Forensic Feature Extraction and Cross-Drive Analysis," The 6th Annual Digital Forensic Research Workshop, Lafayette, Indiana, August 14-16, 2006.
 Garfinkel, S., Malan, D., Dubec, K., Stevens, C., Pham, C., "Disk Imaging with the Advanced Forensics Format, Library and Tools," The Second Annual IFIP WG 11.9 International Conference on Digital Forensics, National Center for Forensic Science, Orlando, Florida, USA, January 29 - February 1, 2006.
 Garfinkel, S. and Shelat, A., "Remembrance of Data Passed: A Study of Disk Sanitization Practices," IEEE Security and Privacy, January/February 2003.