|
Cache-Oblivious String B-Trees
Michael A. Bender, Martin Farach-Colton & Bradley
C. Kuszmaul
Cache-Oblivious String B-Trees
B-trees are the data structure of choice for maintaining searchable
data on disk. However, B-trees perform suboptimally
- when keys are long or of variable length,
- when keys are compressed, even when using front compression,
the standard B-tree compression scheme,
- for range queries, and
- with respect to memory effects such as disk prefetching.
This work develops a cache-oblivious string B-tree (COSBTREE)
data structure that is efficient in all these ways:
- The COSBTREE searches asymptotically optimally and inserts and deletes
nearly optimally.
- It maintains an index whose size is proportional to the front-compressed
size of the dictionary. Furthermore, unlike standard front-compressed
strings, keys can be decompressed in a memory-efficient manner.
- It performs range queries with no extra disk seeks; in contrast, B-trees
incur disk seeks when skipping from leaf block to leaf block.
- It utilizes all levels of a memory hierarchy efficiently and makes
good use of disk locality by using cache-oblivious layout strategies.
References:
[1] Michael A. Bender, Martin Farach-Colton, and Bradley C. Kuszmaul.
Cache-Oblivious String-Btrees. In The 25th ACM SIGMOD-SIGACT-SIGART
Symposium on Prinpiples of Database Systems (PODS 2006), pp. 233-242,
Chicago, IL, June 2006.
|
|