Cache-Oblivious Algorithms for Finite Element Matrix and N-body SimulationMichael A. Bender, Bradley C. Kuszmaul, Shanghua Teng & Kebin WangResultsIn this research, we have developed optimal cache-aware and cache-oblivious algorithms for matrix-vector multiplication of sparse matrices derived from well-shaped meshes. The key problem is to lay out the mesh to minimize cache misses. If n is the graph size, B is the cache block size, and M is the size of cache, and d is the dimensionality of the geometric domain of the mesh then an optimal layout allows the matrix-vector multiplication to run in O(n/B) memory transfers in the cache-oblivious model. This result requires the "tall cache assumption" M=Ω(Bd). Our cache-aware results are as follows: A simple divide-and-conquer layout is only within a factor of B1/d of optimal in the DAM model, but becomes optimal with the "taller cache" assumption M=Ω(Bd+1). Using the multi-way graph partitioning of Kiwi et al [2], an optimal cache-aware layout can be constructed with O((n/B) log (n/M)) memory transfers in the cache-oblivious model. To our knowledge, this is the first optimal cache-aware solution to the Graph-Neighborhood problem. Our cache-oblivious results are as follows: Using fully-balanced decomposition trees [3], originally developed for VLSI theory, an optimal layout can be constructed for the cache-oblivious model, O((n/B) log (n/M)) memory transfers in the cache-oblivious model. These results provide optimal cache-aware and cache-oblivious algorithms for the Conjugate Gradient method for finite-element computation, and provide good cache-oblivious algorithms for N-body simulation. References[1] M. A. Bender and B. C. Kuszmaul and S.-H. Teng and K. Wang. Cache-Oblivious Algorithms for Finite Element Matrix and N-Body Simulation. In preparation. [2] M. Kiwi, D. A. Spielman, and S.-H. Teng. Min-max-boundary domain decomposition. In Theoretical Computer Science for COCOON'98, volume 261, pages 253--266, 2001. [3] F. T. Leighton. A layout strategy for vlsi which is provably good (extended abstract). In Proceedings of the fourteenth annual ACM symposium on Theory of computing, pages 85--98, 1982. |
||
|