Towards Constant Bandwidth Overhead Integrity Checking of Untrusted Data
Dwaine Clarke, G. Edward Suh, Blaise Gassend, Ajay Sudan, Marten van Dijk & Srinivas Devadas
We present an adaptive tree-trace scheme to improve the performance of checking the integrity of arbitrarily-large untrusted data, when using only a small fixed-sized trusted state. Currently, hash trees are used to check the data. In many systems that use hash trees, programs perform many data operations before performing a critical operation that exports a result outside of the program's execution environment. The adaptive tree-trace scheme we present uses this observation to harness the constant runtime bandwidth overhead of a trace-based scheme. For all programs, the adaptive tree-trace scheme's bandwidth overhead is guaranteed never to be worse than a parameterizable worst-case bound. Furthermore, for all programs, as the average number of times the program accesses data between critical operations increases, the adaptive tree-trace scheme's bandwidth overhead moves from logarithmic to constant.
This work studies the problem of checking the integrity of operations performed on an arbitrarily-large amount of untrusted data, when using only a small fixed-sized trusted state. Commonly, hash trees [1, 5] are used to check the integrity of the operations. The hash tree checks data each time it is accessed and incurs a logarithmic bandwidth overhead, as an extra logarithmic number of hashes must be read on each data access.
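To make the logarithmic overhead concrete, the following is a minimal Python sketch of a binary Merkle hash tree over a power-of-two number of data blocks. It is an illustration of the general technique, not the secure-processor implementation; the only value assumed to live in trusted state is the root hash, and every read of a block must fetch log2(n) sibling hashes so the verifier can recompute the path to the root.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class MerkleTree:
    """Binary hash tree over a fixed, power-of-two list of data blocks."""
    def __init__(self, blocks):
        n = len(blocks)
        assert n > 0 and (n & (n - 1)) == 0, "power-of-two block count"
        self.blocks = list(blocks)
        # nodes[1] is the root; nodes[n .. 2n-1] are the leaf hashes
        self.nodes = [b""] * (2 * n)
        for i, blk in enumerate(blocks):
            self.nodes[n + i] = h(blk)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])

    def root(self):
        return self.nodes[1]  # the only hash kept in trusted state

    def read(self, i):
        """Return block i plus the sibling hashes needed to check it.

        The number of extra hashes fetched is log2(n): the
        logarithmic bandwidth overhead discussed above."""
        n = len(self.blocks)
        path, node = [], n + i
        while node > 1:
            path.append(self.nodes[node ^ 1])  # sibling at this level
            node //= 2
        return self.blocks[i], path

def verify(block, index, path, root, n):
    """Recompute the path from a fetched block up to the trusted root."""
    digest, node = h(block), n + index
    for sib in path:
        # odd node index means right child, so the sibling goes first
        digest = h(sib + digest) if node % 2 else h(digest + sib)
        node //= 2
    return digest == root
```

A read of one block in an 8-block tree thus carries 3 extra hashes, and tampering with the block (or any hash on the path) makes verification fail.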
One proposed use of a hash tree is in a single-chip secure processor [7, 8, 10], where it is used to check the integrity of external memory. A secure processor can be used to help license software programs, where it seeks to provide the programs with private, tamper-evident execution environments. In such an application, an adversary's job is to get the processor to unintentionally sign incorrect results or unintentionally reveal private instructions or private data in plaintext. Thus, assuming covert channels are protected by techniques such as memory obfuscation [6, 8], with regards to security, the critical instructions are the instructions that export plaintext outside of the program's execution environment, such as the instructions that sign certificates certifying program results and the instructions that export plaintext data to the user's display. It is common for programs to execute millions of instructions, and perform millions of memory accesses, before performing a critical instruction. As long as the sequence of memory operations is checked when the critical instruction is performed, it is not necessary to check each memory operation as it is performed, and using a hash tree to check the memory may incur unnecessary overhead.
In [2, 9], a new scheme, referred to as a trace-hash scheme, was introduced to securely check memory. Intuitively, the processor maintains a ``write trace'' and a ``read trace'' of its write and read operations to the external memory. At runtime, the processor updates the traces with a minimal constant-sized bandwidth overhead so that it can verify the integrity of a sequence of operations at a later time. To maintain the traces in a small fixed-sized trusted space in the processor, the processor uses incremental multiset hash functions to update the traces. When the processor needs to check a sequence of its operations, it performs a separate integrity-check operation using the traces. The integrity-check operation is performed when the program performs a critical instruction: a critical instruction acts as a signal indicating when it is necessary to perform the integrity-check operation. (Theoretically, the hash tree checks each memory operation as it is performed. However, in a secure processor implementation, because the latency of verifying values from memory can be large, the processor ``speculatively'' uses instructions and data that have not yet been verified, performing the integrity verification in the background. Whenever a critical instruction occurs, the processor waits for all of the integrity verification to be completed before performing the critical instruction. Thus, the notion of a critical instruction that acts as a signal indicating that a sequence of operations must be verified is already present in secure processor hash tree implementations.)
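The trace-based idea can be sketched as an offline memory checker in the style described above. This is a toy model under stated assumptions: the ``incremental multiset hash'' here is a plain XOR of unkeyed SHA-256 digests, which is *not* secure (the paper's constructions are keyed) but does illustrate the constant-time, order-independent trace update; the trusted state is just the two trace hashes and a timer, and the integrity check reads back all of memory.

```python
import hashlib

def mset_add(acc: int, a: int, v: int, t: int) -> int:
    """Toy incremental multiset hash: XOR in a per-triple digest.
    Stand-in only -- an unkeyed XOR hash is NOT cryptographically
    secure; it merely shows the constant-sized trace update."""
    d = hashlib.sha256(f"{a},{v},{t}".encode()).digest()
    return acc ^ int.from_bytes(d, "big")

class TraceHashChecker:
    """Trusted state is only (write trace, read trace, timer);
    memory itself (value, timestamp pairs) is untrusted."""
    def __init__(self, n):
        self.mem = {a: (0, 0) for a in range(n)}  # untrusted storage
        self.W = self.R = self.timer = 0
        for a in range(n):                 # log the initial contents
            self.W = mset_add(self.W, a, 0, 0)

    def _touch(self, a, new_value=None):
        v, t = self.mem[a]                          # untrusted read
        self.R = mset_add(self.R, a, v, t)          # log what was read
        self.timer = max(self.timer, t) + 1
        if new_value is not None:
            v = new_value
        self.mem[a] = (v, self.timer)               # write back, fresh time
        self.W = mset_add(self.W, a, v, self.timer) # log what was written
        return v

    def read(self, a):
        return self._touch(a)

    def write(self, a, v):
        self._touch(a, v)

    def check(self):
        """Integrity check: read every address once more; the read and
        write traces must then match as multisets."""
        for a in list(self.mem):
            v, t = self.mem[a]
            assert t <= self.timer
            self.R = mset_add(self.R, a, v, t)
        return self.R == self.W
```

In an honest execution every triple written is later read back exactly once, so the traces cancel; any tampering with the untrusted memory leaves the traces mismatched at check time.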
While the trace-hash scheme does not incur the logarithmic bandwidth overhead of the hash tree, its integrity-check operation needs to read all of the memory that was used since the beginning of the program's execution. When integrity checks are infrequent, the number of memory operations performed by the program between checks is large and the amortized cost of the integrity-check operation is very small. The bandwidth overhead of the trace-hash scheme is then mainly its constant-sized runtime bandwidth overhead, which is small. The trace-hash scheme therefore performs very well and significantly outperforms the hash tree when integrity checks are infrequent. However, when integrity checks are frequent, the program uses only a small subset of the addresses protected by the trace-hash scheme between checks, and the amortized cost of the integrity-check operation is large. As a result, the trace-hash scheme performs poorly, much worse than the hash tree. Thus, though the trace-hash scheme performs very well when checks are infrequent, it cannot be widely used because its performance is poor when checks are frequent.
In this work, we introduce secure tree-trace integrity checking. This hybrid of the hash tree and trace-hash schemes captures the best features of both. The untrusted data is originally protected by the tree, and subsets of it can be optionally and dynamically moved from the tree to the trace-hash scheme. When the trace-hash scheme is used, only the addresses of the data that have been moved to the trace-hash scheme since the last trace-hash integrity check need to be read to perform the next trace-hash integrity check, instead of all of the addresses that the program used since the beginning of its execution. This optimizes the trace-hash scheme, facilitating much more frequent trace-hash integrity checks and making the trace-hash approach more widely applicable.
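A structural sketch of the hybrid is given below. It is our illustration, not the paper's implementation: the two protected regions are stand-in dicts with a bandwidth counter (assumed unit cost per block or hash transferred), where in the real scheme the ``tree'' side is backed by a Merkle tree and the ``trace'' side by incremental multiset hashes. The point it demonstrates is the bookkeeping: an integrity check re-reads only the addresses moved since the last check.

```python
class TreeTrace:
    """Hybrid sketch: addresses live under the hash tree by default
    and can be moved to the trace-hash scheme; 'logn' models the
    per-access hash-read cost of the tree."""
    def __init__(self, data, logn):
        self.tree = dict(data)   # hash-tree-protected addresses
        self.trace = {}          # trace-hash-protected addresses
        self.moved = set()       # moved since the last trace-hash check
        self.logn = logn
        self.bandwidth = 0       # blocks/hashes transferred so far

    def read(self, a):
        if a in self.moved:
            self.bandwidth += 1          # constant trace-hash overhead
            return self.trace[a]
        self.bandwidth += self.logn      # log-many hashes via the tree
        return self.tree[a]

    def move_to_trace(self, a):
        self.trace[a] = self.tree.pop(a)
        self.moved.add(a)

    def integrity_check(self):
        # re-read only the addresses moved since the last check,
        # then hand them back to the tree
        for a in sorted(self.moved):
            self.bandwidth += 1
            self.tree[a] = self.trace.pop(a)
        self.moved.clear()
```

After a check, the moved set is empty again, so the cost of the next check depends only on what is moved from then on, not on the program's whole history.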
The tree-trace scheme we present has three features. Firstly, the scheme adaptively chooses a tree-trace strategy for the program that indicates how the program should use the tree-trace scheme when the program is run. This allows programs to be run unmodified and still benefit from the tree-trace scheme's features. Secondly, even though the scheme is adaptive, it provides a guarantee on its worst-case performance: for all programs, the performance of the scheme is guaranteed never to be worse than a parameterizable worst-case bound. Thirdly, for all programs, as the average number of program operations per datum (the total number of program data operations divided by the total number of distinct data items accessed) between critical operations increases, the performance of tree-trace integrity checking moves from a logarithmic to a constant bandwidth overhead.
With regards to the second feature, the worst-case bound is a parameter to the adaptive tree-trace scheme. The bound is expressed relative to the bandwidth overhead of the hash tree, if the hash tree had been used to check the integrity of the data during the program's execution. For instance, if the bound is set at 10%, then, for all programs, the tree-trace bandwidth overhead is guaranteed to be less than 1.1 times the hash tree bandwidth overhead. This feature is important because it allows the adaptive tree-trace scheme to be turned on by default in applications. To provide the bound, we use the notion of a potential [4, Chapter 18] to determine when data should just be kept in the tree and to regulate the rate at which data is added to the trace-hash scheme. The adaptive tree-trace scheme is able to provide the bound even when no assumptions are made about the program's access patterns and even when the processor uses a cache, about which minimal assumptions are made (the cache only needs to have a deterministic cache replacement policy, such as the least recently used (LRU) policy).
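The flavor of the potential-based gate can be shown with a toy cost model. This is our own simplified accounting, not the paper's algorithm: it assumes unit cost per hash or data block transferred, charges the hash-tree baseline log n per access, and only moves an address to the trace-hash scheme when the allowance accrued against that baseline can already pay for re-reading the moved set at the next integrity check. Under those assumptions, the hybrid's total bandwidth never exceeds (1 + epsilon) times what the hash tree alone would have spent, regardless of the access pattern or check frequency.

```python
import math

class AdaptiveAccountant:
    """Potential-style gate: maintain the invariant
    spent + pending_check_cost <= (1 + eps) * tree_cost
    so the worst-case bound holds even if a check happens now."""
    def __init__(self, n_blocks, epsilon):
        self.logn = max(1, math.ceil(math.log2(n_blocks)))
        self.eps = epsilon
        self.tree_cost = 0     # baseline: blocks the hash tree would read
        self.spent = 0         # blocks the hybrid scheme actually reads
        self.in_trace = set()  # addresses moved since the last check

    def _pending(self):
        # cost of the next check: re-read each moved address and
        # re-insert it into the tree (~1 + log n blocks each)
        return len(self.in_trace) * (1 + self.logn)

    def access(self, addr):
        self.tree_cost += self.logn            # the tree baseline pays log n
        if addr in self.in_trace:
            self.spent += 1                    # constant trace-hash update
            return
        self.spent += self.logn                # still checked via the tree
        allowance = (1 + self.eps) * self.tree_cost - self.spent
        if allowance >= self._pending() + (1 + self.logn):
            self.in_trace.add(addr)            # potential covers its check cost

    def critical_op(self):
        self.spent += self._pending()          # perform the trace-hash check
        self.in_trace.clear()
```

With epsilon = 0.1, frequently-accessed addresses are gradually admitted to the trace-hash scheme and each subsequent access to them costs 1 instead of log n, while the gate keeps the accumulated overhead within the 1.1x bound at every step.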
With regards to the third feature, the adaptive tree-trace scheme is able to approach a constant bandwidth overhead for data integrity checking because it can use the optimized trace-hash scheme to check sequences of data operations before a critical operation is performed. The longer the sequence, the more data the tree-trace scheme moves from the tree to the trace-hash scheme, and the closer the overhead comes to the constant runtime overhead of the trace-hash scheme. As programs typically perform many data operations before performing a critical operation, large classes of programs will be able to take advantage of this feature to improve their data integrity checking performance.
We have proven the security of the tree-trace scheme and have a software implementation of the scheme. Experiments have shown that the bandwidth overhead of integrity checking can be significantly reduced when the tree-trace scheme is used, compared to when a hash tree is used.
This research was supported by Acer Inc., Delta Electronics Inc., HP Corp., NTT Inc., Nokia Research Center, and Philips Research under the MIT Project Oxygen partnership.
 Dwaine Clarke, Srinivas Devadas, Marten van Dijk, Blaise Gassend & G. Edward Suh. Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking. In Advances in Cryptology - Asiacrypt 2003 Proceedings, volume 2894 of LNCS, pages 188-207. Springer-Verlag, November 2003.
 Dwaine Clarke, G. Edward Suh, Blaise Gassend, Ajay Sudan, Marten van Dijk & Srinivas Devadas. Towards Constant Bandwidth Overhead Integrity Checking of Untrusted Data. In Proceedings of the 2005 IEEE Symposium on Security and Privacy. May 2005.
 Blaise Gassend, G. Edward Suh, Dwaine Clarke, Marten van Dijk & Srinivas Devadas. Caches and Merkle Trees for Efficient Memory Integrity Verification. In Proceedings of 9th International Symposium on High Performance Computer Architecture, pages 295-306, February 2003.
 G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk & Srinivas Devadas. AEGIS: Architecture for Tamper-Evident and Tamper-Resistant Processing. In Proceedings of the 17th Int'l Conference on Supercomputing, pages 160-171, June 2003.
 G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk & Srinivas Devadas. Efficient Memory Integrity Verification and Encryption for Secure Processors. In Proceedings of the 36th Int'l Symposium on Microarchitecture, pages 339-350, December 2003.
 Jun Yang, Youtao Zhang & Lan Gao. Fast Secure Processor for Inhibiting Software Piracy and Tampering. In Proceedings of the 36th Int'l Symposium on Microarchitecture, pages 351-360, December 2003.