CSAIL Publications and Digital Archive

Research Abstracts - 2007

Ubiquitous Memory Introspection

Qin Zhao, Rodric Rabbah, Saman Amarasinghe, Larry Rudolph & Weng-Fai Wong

Abstract

Modern memory systems play a critical role in application performance, but a detailed understanding of how an application behaves in the memory system is not trivial to attain. It typically requires time-consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. Hardware performance counters are increasingly accessible and can count relevant memory-system events, but their measurements are coarse-grained and better suited to performance summaries than to instruction-level feedback. A low-cost, online, and accurate methodology for deriving fine-grained memory behavior profiles would therefore be extremely useful for runtime analysis and optimization of programs.

This work introduces Ubiquitous Memory Introspection (UMI), an online and lightweight methodology that uses fast mini-simulations to analyze short memory access traces recorded from frequently executed code regions. The simulations provide profiling results at varying granularities, down to that of a single instruction or address. UMI naturally complements runtime optimizations and enables new opportunities for online memory-specific optimizations.

We have developed a prototype runtime system implementing UMI that is readily deployed on commodity processors, requires no user intervention, and can operate with stripped binaries and legacy software. The prototype has an average runtime overhead of 14 percent, only 1 percent more than that of a state-of-the-art binary instrumentation tool. We evaluated it on 32 benchmarks, including the full suite of SPEC CPU2000 benchmarks. We show that the mini-simulations accurately reflect the cache performance of two existing memory systems, an Intel Pentium 4 and an AMD Athlon MP (K7).

We also demonstrate that UMI predicts delinquent load instructions with 88 percent accuracy for applications with a relatively high number of cache misses, and 61 percent overall. The online profiling results are used at runtime to implement a simple software prefetching strategy that achieves an overall speedup of 64 percent in the best case. In many cases, this online software prefetcher matches the performance of the hardware prefetcher available in the Pentium 4, and in some cases outperforms it. On the AMD K7, UMI and software prefetching deliver an 11 percent overall performance gain. This kind of fine-grained profiling information is otherwise available only through exhaustive simulation.
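
To make the idea of a mini-simulation concrete, the sketch below replays a short trace of (instruction address, data address) pairs through a small LRU set-associative cache model and reports per-instruction miss counts, from which candidate delinquent loads can be flagged. It is an illustration only, not the UMI prototype: the trace format, the cache geometry (64 sets, 4 ways, 64-byte lines), the 0.5 miss-ratio threshold, the minimum access count, and the function names mini_simulate and delinquent_loads are all assumptions made for this example.

```python
from collections import OrderedDict, defaultdict


def mini_simulate(trace, num_sets=64, ways=4, line_size=64):
    """Replay (pc, address) pairs through a small LRU set-associative cache.

    Returns {pc: (accesses, misses)}. The geometry is illustrative and does
    not model any particular processor.
    """
    # One OrderedDict per set; keys are tags, ordered from least to most
    # recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    stats = defaultdict(lambda: [0, 0])  # pc -> [accesses, misses]

    for pc, addr in trace:
        line = addr // line_size
        index = line % num_sets
        tag = line // num_sets
        cache_set = sets[index]

        stats[pc][0] += 1
        if tag in cache_set:
            cache_set.move_to_end(tag)  # hit: mark as most recently used
        else:
            stats[pc][1] += 1  # miss
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = None

    return {pc: (accesses, misses) for pc, (accesses, misses) in stats.items()}


def delinquent_loads(stats, miss_ratio_threshold=0.5, min_accesses=8):
    """Return the pcs whose simulated miss ratio meets the (assumed) threshold."""
    return sorted(
        pc
        for pc, (accesses, misses) in stats.items()
        if accesses >= min_accesses and misses / accesses >= miss_ratio_threshold
    )


# Hypothetical trace: the load at pc 0x400 touches a new cache line on every
# access, while the load at pc 0x410 rereads a single address.
trace = []
for i in range(32):
    trace.append((0x400, 0x10000 + i * 64))
    trace.append((0x410, 0x80000))

stats = mini_simulate(trace)
print([hex(pc) for pc in delinquent_loads(stats)])  # prints ['0x400']
```

In this toy trace the streaming load misses on every access and is flagged, while the load that rereads one address misses only once. Keeping the trace short and the cache model small is what keeps such a simulation cheap enough to run alongside the program.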

References:

[1] Qin Zhao, Rodric Rabbah, Saman Amarasinghe, Larry Rudolph, and Weng-Fai Wong. Ubiquitous Memory Introspection. In Proceedings of the 2007 International Symposium on Code Generation and Optimization (CGO), San Jose, CA, March 2007.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu