CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Cache-Oblivious Algorithms for Finite Element Matrix and N-body Simulation

Michael A. Bender, Bradley C. Kuszmaul, Shanghua Teng & Kebin Wang

Results

In this research, we have developed optimal cache-aware and cache-oblivious algorithms for matrix-vector multiplication of sparse matrices derived from well-shaped meshes. The key problem is to lay out the mesh to minimize cache misses. If n is the graph size, B is the cache block size, and M is the size of cache, and d is the dimensionality of the geometric domain of the mesh then an optimal layout allows the matrix-vector multiplication to run in O(n/B) memory transfers in the cache-oblivious model. This result requires the "tall cache assumption" M=Ω(B^d).

Our cache-aware results are as follows: A simple divide-and-conquer layout is only within a factor of B^1/d of optimal in the DAM model, but becomes optimal with the "taller cache" assumption M=Ω(B^d+1). Using the multi-way graph partitioning of Kiwi et al [2], an optimal cache-aware layout can be constructed with O((n/B) log (n/M)) memory transfers in the cache-oblivious model. To our knowledge, this is the first optimal cache-aware solution to the Graph-Neighborhood problem.

Our cache-oblivious results are as follows: Using fully-balanced decomposition trees [3], originally developed for VLSI theory, an optimal layout can be constructed for the cache-oblivious model, O((n/B) log (n/M)) memory transfers in the cache-oblivious model.

These results provide optimal cache-aware and cache-oblivious algorithms for the Conjugate Gradient method for finite-element computation, and provide good cache-oblivious algorithms for N-body simulation.

References

[1] M. A. Bender and B. C. Kuszmaul and S.-H. Teng and K. Wang. Cache-Oblivious Algorithms for Finite Element Matrix and N-Body Simulation. In preparation.

[2] M. Kiwi, D. A. Spielman, and S.-H. Teng. Min-max-boundary domain decomposition. In Theoretical Computer Science for COCOON'98, volume 261, pages 253--266, 2001.

[3] F. T. Leighton. A layout strategy for vlsi which is provably good (extended abstract). In Proceedings of the fourteenth annual ACM symposium on Theory of computing, pages 85--98, 1982.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)