![]()
|
Research
Abstracts - 2007 |
|
Streaming Cache-Oblivious B-TreesMichael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan Fogel, Bradley C. Kuszmaul, Jelani Nelson & Chris WrightIntroduction The B-tree [3,12] is the classic external-memory
dictionary data structure, and it has held its preeminent position for
over three decades. Most B-tree implementations are, in fact, B An In fact, there is a tradeoff between the cost of searching and inserting
in external-memory dictionaries [8], and B-trees represent only one
point on this trade-off space. Another point in the trade-off space is
represented by the buffered-repository tree (BRT) [10].
The BRT supports the same operations as a B-tree, but searches use More generally, Brodal and Fagerberg's data structure [8], which
we call the B This work explores this insert/search tradeoff in the cache-oblivious
(CO) model [13]. The cache-oblivious model is similar
to the DAM model, except that unlike the DAM model, the block size There already exist several cache-oblivious dictionaries. The most well-studied
is the cache-oblivious B-tree [4,9,5]. The cache-oblivious
B-tree supports searches in
ResultsThe work introduces several cache-oblivious dictionaries that illustrate different points in the insertion/search tradeoff. Specifically, we present the following cache-oblivious data structures and results: Shuttle Tree The shuttle tree, our main result, retains the same asymptotic
search cost of the cache-oblivious B-tree while improving the insert cost.
Specifically, searches still take
This relatively complex expression for the cost of inserts can be understood
as follows: When the dominant term in the insertion cost is
Lookahead ArrayWe give another data structure that we call a lookahead array.
The lookahead array is reminiscent of static-to-dynamic transformations [7]
and fractional cascading [11]. This data structure is parameterized
by a growth factor. If the growth factor is chosen to be
For disk-based storage systems, range queries are likely to be faster for a lookahead array than for a BRT because the data is kept contiguously in arrays, taking advantage of inter-block locality, rather than scattered on blocks across disk. This is the same reason why the cache-oblivious B-tree can support range queries nearly an order of magnitude faster than a traditional B-tree; see e.g.,[6]. Deamortized Lookahead ArrayWe show how to deamortize the lookahead array and the cache-oblivious lookahead array. Thus, we obtain the first cache-oblivious alternative to the BRT . There is no amortization on searches and the worst-case cost for an insert is no more than the cost of a search. ExperimentsWe next show how efficiently the COLA performs. We implemented a COLA and compared it with a B-tree. For databases in external memory, the COLA was 90 times faster than the B-tree for random inserts, 2.5 times slower for sorted inserts, and 1.7 times slower for searches. We also found that design decisions as simple as merging elements largest first or smallest first had a greater impact on performance than, for example, the difference in cost of inserting in random or sorted order. We believe factors such as disk scheduling and prefetching are responsible. We examined several other interesting factors that had an impact on performance. Research SupportThis research was supported in part by NSF Grants CCF-0621511 and CCF-0541209, and a Google research award. References:[1] Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. In Communications of the ACM, 31(9):1116--1127, September 1988. [2] Lars Arge, Michael A. Bender, Erik D. Demaine, Bryan Holland-Minkley, and J. Ian Munro. Cache-oblivious priority queue and graph algorithm applications. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 268-276, 2002. [3] Rudolf Bayer and Edward M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173-189, February 1972. [4] M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proc. 41st Annual Symp. on Foundations of Computer Science (FOCS), pages 399-409, Redondo Beach, California, 2000. [5] M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. J. Algorithms, 3(2):115-136, 2004. [6] Michael A. Bender, Martin Farach-Colton, and Bradley Kuszmaul. Cache-oblivious string B-trees. In Proc. 25th Symposium on Principles of Database Systems (PODS), 2006. [7] Jon Louis Bentley and James B. Saxe. Decomposable searching problems i: Static-to-dynamic transformation. J. Algorithms, 1(4):301-358, 1980. [8] Gerth Stølting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546-554, 2003. [9] Gerth Stølting Brodal, Rolf Fagerberg, and Riko Jacob. Cache oblivious search trees via binary trees of small height. In Proc. 13th Annual Symposium on Discrete Algorithms (SODA), pages 39-48, San Francisco, California, January 2002. [10] Adam L. Buchsbaum, Michael Goldwasser, Suresh Venkatasubramanian, and Jeffery R. Westbrook. On external memory graph traversal. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 859-860, 2000. [11] Bernard Chazelle and Leonidas J. Guibas. Fractional cascading: I. a data structuring technique. Algorithmica, 1(2):133-162, 1986. [12] Douglas Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2):121-137, June 1979. [13] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pages 285-297, New York, New York, October 1999. [14] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, Reading, Massachusetts, 1973. |
![]() ![]() |
||
|