
Research
Abstracts  2006 
CacheOblivious Algorithms for Finite Element Matrix and Nbody SimulationMichael A. Bender (SUNY Stony Brook), Bradley C. Kuszmaul, Shanghua Teng (BU) & Kebin Wang (BU)ResultsIn this research, we have developed optimal cacheaware and cacheoblivious algorithms for matrixvector multiplication of sparse matrices derived from wellshaped meshes. The key problem is to lay out the mesh to minimize cache misses. If n is the graph size, B is the cache block size, and M is the size of cache, and d is the dimensionality of the geometric domain of the mesh then an optimal layout allows the matrixvector multiplication to run in O(n/B) memory transfers in the cacheoblivious model. This result requires the "tall cache assumption" M=Ω(B^{d}). Our cacheaware results are as follows: A simple divideandconquer layout is only within a factor of B^{1/d} of optimal in the DAM model, but becomes optimal with the "taller cache" assumption M=Ω(B^{d+1}). Using the multiway graph partitioning of Kiwi et al [2], an optimal cacheaware layout can be constructed with O((n/B) log (n/M)) memory transfers in the cacheoblivious model. To our knowledge, this is the first optimal cacheaware solution to the GraphNeighborhood problem. Our cacheoblivious results are as follows: Using fullybalanced decomposition trees [3], originally developed for VLSI theory, an optimal layout can be constructed for the cacheoblivious model, O((n/B) log (n/M)) memory transfers in the cacheoblivious model. These results provide optimal cacheaware and cacheoblivious algorithms for the Conjugate Gradient method for finiteelement computation, and provide good cacheoblivious algorithms for Nbody simulation. References:[1] M. A. Bender and B. C. Kuszmaul and S.H. Teng and K. Wang. CacheOblivious Algorithms for Finite Element Matrix and NBody Simulation. In preparation. [2] M. Kiwi, D. A. Spielman, and S.H. Teng. Minmaxboundary domain decomposition. In Theoretical Computer Science for COCOON'98, volume 261, pages 253266, 2001. [3] F. T. Leighton. A layout strategy for vlsi which is provably good (extended abstract). In Proceedings of the fourteenth annual ACM symposium on Theory of computing, pages 8598, 1982. 

