Abstracts - 2007
Lower Bounds for Dynamic Data Structures
Erik D. Demaine & Mihai Pǎtraşcu
We have developed a new technique for proving lower bounds for dynamic data-structure problems in the powerful cell-probe model of computation. Our technique is the first which can prove a lower bound of Ω(lg n) in this general model, breaking a long-standing barrier of Ω(lg n / lg lg n). Using this approach, we proved lower bounds for the partial-sums problem and the dynamic connectivity problem. Our lower bounds are either known or conjectured to be tight.
When proving lower bounds for specific problems, a crucial point is what model of computation is assumed. Many lower bounds are proved in weak, structured models, and thus are not always relevant. For example, the progress made in the past decades in designing efficient data structures has broken most previous lower bounds for the comparison model or the pointer machine.
A very general model for analyzing data structures is the cell probe model. In this model, we assume that memory consists of b-bit words or cells, and only count the cell probes made by the algorithm (cell reads or writes). It is also assumed that b = Ω(lg n), where n is the size of the problem, so that natural quantities fit in a word. Since no restriction is made on the way data can be manipulated, this model is general enough, that lower bounds are relevant in any model of computation that is physically implementable today. In particular, the model is at least as strong as the pointer machine, the comparison model, or the more interesting Random Access Machine (RAM) model. In addition, by equating a ``cell'' with a ``page'' (and possibly having b >>lg n), the cell-probe model also captures the external memory and cache-oblivious models.
However, this great power leads to great challenges. Prior to our work, the only technique for proving lower bounds for dynamic data structures in the cell-probe model was the chronogram technique, introduced by Fredman and Saks in 1989 [FS89]. This technique is rather cumbersome to use, and only about three problems have been successfully analyzed using this approach (a few other bounds follow by reductions). In addition, the technique is inherently limited, and cannot prove a bound exceeding ω(lg n / lg lg n). Breaking this barrier has been considered one of the most important problems in data structures [Mil99].
Our technique begins by considering a random sequence of updates and queries (drawn according to a problem-specific distribution). Then, we consider a balanced tree (of appropriate branching factor) with the sequence of operations in the leaves—this can be visualized as a tree built over the time axis. A read instruction is determined by the time when it occurs, and the time when the location was last written. We associate every read instruction with a node in the lower-bound tree, corresponding to the longest common prefix between the update and read times. We then show that every node in the tree must have a large number of read instructions associated with it. Proofs of this fact involve elementary results from Kolmogorov complexity and clever encoding schemes. These encoding schemes are able to reconstruct all information from updates in a given subtree (which have high Kolmogorov complexity), based on the set of read instructions associated with a node. Thus we conclude that there must be many such instructions corresponding to that node.
Our approach is significantly more global that the chronogram method, and thus can prove stronger bounds. In addition, relying on clever encodings instead of complicated combinatorial arguments makes the proofs simpler.
Our main results in [PD06] are Ω(lg n) lower bounds for dynamic connectivity (answering connectivity queries in undirected graphs subject to updates) and for the dynamic partial-sums problem (sometimes known as the prefix-sums problem). We also obtained a number of additional results, such as lower bounds on the tradeoffs between update and query times, and Ω(lgB n) lower bounds for external-memory models. In nearly all cases, our bounds are either tight or conjectured to be tight.
By far the most appealing open problem is to break the Ω(lg n) barrier established by our results. Natural candidate problems are dynamic range queries in higher dimensions, and dynamic k-connectivity. Applying our technique to other natural problems would also be quite interesting, even if the bounds do not exceed Ω(lg n).
It would be nice to obtain an O(lg n) upper bound for dynamic connectivity, closing this problem; the current bounds are O(lg n ⋅ poly(lg lg n)). It would also be interesting to prove that our lower bound holds in some restricted cases, such as grid graphs or decremental connectivity. Finally, our technique cannot obtain satisfactory tradeoff lower bounds when the query time exceeds the update time. We believe that fixing this restriction would be very interesting, but would probably require significant additional insight.