CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Research Abstracts bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

 

Research Abstracts - 2006
horizontal line

horizontal line

vertical line
vertical line

Cache-Oblivious String B-trees

Michael A. Bender (SUNY Stony Brook), Martin Farach-Colton (Rutgers) & Bradley C. Kuszmaul

Summary

Traditional B-trees are designed for uniform key size, and most B-trees implementations employ heuristics to deal with heterogeneous key size. The String B-tree of Ferragina and Grossi [2] is designed to handle variable length keys in the standard cache-aware model where the block transfer size in the memory hierarchy is known. We developed static and dynamic Cache-Oblivious String B-trees (COSB-trees). Our data structures also employ a provably good version of front compression, a heuristic version of which is commonly used in practical B-trees.

The static COSB-tree represents a set D of N strings, and supports member, predecessor, successor, prefix, and range queries. Determining whether k is in the data structure requires O(1+logB N+s(k)/B) block transfers, where s(k) is the length of key k. Finding the predecessor or successor of k requires O(1+logB N+s(k)/B+s(k')/B) transfers, where k' is the predecessor (resp. successor) of k. The prefix operation on k requires O(1+logB N+(s(k)+c(Q))/B) transfers, where c(Q) is the size, using front compression, of the result set Q. Performing a range query from k to k' requires O(1+logB N+ (s(k)+s(k')+c(Q))/B) transfers. The compressed strings can be decoded for an additional c(Q)/B transfers.

The dynamic COSB-tree also supports insertions and deletions. The member, predecessor, successor, prefix, and range prefix operations match the performance of the static tree. To insert or delete k requires O(1+logB N+log2N s(k)/B) memory transfers.

The dynamic COSB-tree augmented with a scanning structure achieves the same bounds, except that range queries are amortized, and insertion and deletion are accelerated to O(1+logB N+log2+εlog N s(k)/B) amortized memory transfers, for any ε>0. If range queries need to return only pointers to keys (rather than the bits of the keys), then the updates become O(1+logB N+s(k)/B) amortized memory transfers. All bounds are with high probability.

This paper appears in [1].

Research Support

This research was supported in part by the Singapore-MIT Alliance and by NSF Grants ACI-0324974, EIA-0112849, CNS-0305606 and CCR-0208670.

References:

[1] Michael A. Bender, Martin Farach-Colton, and Bradley C. Kuszmaul. Cache-Oblivious String B-trees. To appear in the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS06), Chicago, Illinois, Country, June 2006.

[2] P. Ferragina and R. Grossi. The String B-Tree: A new data structure for string search in external memory and its applications. Journal of the ACM, 46(2):236--280, Mar 1999.

 

vertical line
vertical line
 
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu