Research Abstracts - 2006

Recovery and High Availability in Updatable, Distributed Data Warehouses

Edmond Lau & Samuel Madden

Motivation

A traditional data warehouse consists of a large distributed database system that processes read-only analytical queries over historical data loaded from an operational database. Such warehouses often do not need traditional database recovery or concurrency control features because they are updated via bulk-load utilities rather than standard SQL INSERT/UPDATE commands. They do, however, require high availability and disaster recovery mechanisms so that they can provide always-on access.

Recent years have seen increasing interest in warehouse-like systems that support fine-granularity insertions of new data and even occasional updates of incorrect or missing historical data; these modifications need to be supported concurrently with traditional read-only queries. Such systems are useful for providing flexible load support in traditional warehouse settings, for reducing the delay before newly arrived data becomes visible, and for supporting specialized domains such as customer relationship management (CRM) and data mining, where large quantities of data are frequently added to the database alongside a substantial number of read-only analytical queries that generate reports and mine relationships.

These "updatable warehouses" have the same requirements of high availability and disaster recovery as traditional warehouses but also require some form of concurrency control and recovery to ensure transactional semantics. One commonly used approach for concurrency is to implement snapshot isolation [1, 2], which allows read-only queries to avoid setting any locks by having them read a historical snapshot of the database starting from some point in recent history. The standard solution to crash recovery in database management systems (DBMSs) involves maintaining an on-disk log data structure to record transactional updates and using a log-based recovery algorithm such as ARIES [3] to restore the database to a consistent state after failures. Whereas snapshot isolation is well-suited for warehouse settings, log-based crash recovery for large datasets, on the other hand, can take days or weeks to complete and prove prohibitive. Log-based recovery is ultimately a local site recovery solution that fails to take advantage of the replication inherent in high availability distributed frameworks.

We are therefore investigating the performance of a new integrated approach to recoverability, high availability, and disaster recovery for updatable distributed warehouses. Our approach is loosely inspired by the C-Store system [4], a distributed, column-oriented DBMS architecture geared toward providing high ad-hoc query performance on terabytes of data and reasonable performance on updates. To separate the performance effects of our recovery approach from those of a column-oriented design, we have conducted our evaluation on a row-oriented database system. Though our evaluation thus far has considered distributed databases in a local area network, our approach also applies to wide area networks.

Approach Overview

The gist of our approach is that any highly available database system will use some form of replication to provide availability; we have developed a framework showing that it is possible to exploit those replicas to provide efficient crash recovery. Our framework uses a timestamped version-history representation of data combined with support for historical queries that enable users to time travel and inspect the database as of some time in the past. We can then accomplish our recovery goal by periodically ensuring that replicas are consistent up to some point in history via a simple checkpoint operation and then using standard database queries to copy any changes from that point forward into a replica during recovery.
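
As a rough illustration, the queries below sketch the recovery-by-query idea under an assumed schema. The names involved (a versioned orders table, its insertion_time and deletion_time columns, a live_replica qualifier naming a surviving replica, and the checkpoint timestamp) are hypothetical stand-ins, not our actual implementation:

    -- Hypothetical versioned table: every row records when it was inserted
    -- and, once deleted, when it was deleted; rows are never overwritten.
    CREATE TABLE orders (
        order_id        INTEGER,
        amount          DECIMAL(10, 2),
        insertion_time  TIMESTAMP NOT NULL,
        deletion_time   TIMESTAMP           -- NULL while the row is live
    );

    -- Recovery of a crashed replica known to be consistent up to the last
    -- checkpoint: copy from a live replica only the rows inserted since then.
    INSERT INTO orders
        SELECT *
          FROM live_replica.orders
         WHERE insertion_time > TIMESTAMP '2006-01-01 00:00:00';

    -- Propagate deletions that occurred after the checkpoint.
    UPDATE orders
       SET deletion_time = (SELECT src.deletion_time
                              FROM live_replica.orders AS src
                             WHERE src.order_id = orders.order_id)
     WHERE order_id IN (SELECT order_id
                          FROM live_replica.orders
                         WHERE deletion_time > TIMESTAMP '2006-01-01 00:00:00');

Because each statement is an ordinary query over timestamped versions, the work performed during recovery is proportional to the changes made since the last checkpoint rather than to the size of the database.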

Though conceptually simple, there are challenges to implementing this approach:

  • If done naively, recovery queries can be very slow. Our approach tries to ensure that relatively little work needs to be done during recovery and that relatively little state needs to be copied over the network.
  • To provide high availability, we require that the remaining replicas be able to process updates while recovery is occurring. The details of bringing a recovering node online during active updates while still preserving ACID (atomicity, consistency, isolation, and durability) semantics are quite subtle.

Despite these challenges, we believe the approach that we have developed to be a viable solution. The core benefits of our approach include:

  • no need for a complex log-based recovery protocol like ARIES. Our approach is much simpler to implement correctly, and most of it can be expressed as a series of standard SQL statements.
  • more efficient commit processing and hence higher transaction throughput. Traditional log-based systems require sites to force-write log records to disk at various stages of commit processing in order to ensure atomicity. We have devised an optimized version of the three-phase commit protocol [5] that, when integrated with our recovery approach, eliminates these force-writes while still ensuring atomicity, thereby significantly reducing runtime overhead.
  • support for bulk loads and symmetric "bulk drops" of data without significant engineering work. Any data warehouse requires functionality to efficiently load new data from an operational database. Recent years have also seen the rise of massive click-through warehouses, such as those at Priceline, Yahoo, and Google, that must store upwards of a terabyte of information about user clicks on websites. These warehouses are designed to store only the most recent N days' worth of click-through data. The time-partitioned segment architecture that we propose to support our recovery approach allows us, for instance, to create segments with time ranges of a day and to schedule a daily drop of the oldest segment (see the sketch after this list).
  • support for a time travel feature that enables users to query the database as of some time in the recent past. Users may, for example, wish to save a version of the database before inserting a number of records so that they can compare the outcome of some report before and after a particular set of changes was made, as illustrated below.
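
The sketch below makes the last two benefits concrete using the same hypothetical versioned schema as above. The daily segment name (clicks_2006_03_14) and the timestamps are illustrative assumptions about how time-partitioned segments might be exposed, not a literal interface:

    -- Bulk drop: with one segment per day, retiring the oldest day of
    -- click-through data is a single metadata operation rather than a
    -- scan-and-delete over one monolithic table.
    DROP TABLE clicks_2006_03_14;

    -- Time travel: query the database as of a past instant t by selecting
    -- exactly the row versions that were live at t.
    SELECT COUNT(*)
      FROM orders
     WHERE insertion_time <= TIMESTAMP '2006-03-01 12:00:00'
       AND (deletion_time IS NULL
            OR deletion_time > TIMESTAMP '2006-03-01 12:00:00');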

Initial Evaluation

In order to evaluate the runtime overhead and recovery performance of our approach, we have implemented a distributed database system in Java and instrumented it with our recovery algorithms as well as with the traditional two-phase commit [6] and ARIES protocols used in standard databases. Using our approach, the system is able to tolerate the failure of a worker node and efficiently bring it back online without significantly impacting transaction throughput. Our experiments show that our optimized three-phase commit protocol can increase transaction throughput by a factor of 2 to 15 over traditional two-phase commit with ARIES on simple update-intensive workloads. Moreover, our recovery performance is comparable to that of ARIES and even surpasses it when the workload consists primarily of inserts and updates to recent data, as is the case in data warehouse environments. These results are encouraging and suggest that our approach to updatable data warehouses is quite tenable.

References:

[1] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of SIGMOD, pp. 1--10, ACM Press, 1995.

[2] Shuqing Wu and Bettina Kemme. Postgres-R(SI): Combining replica control with concurrency control based on snapshot isolation. In Proceedings of ICDE, pp. 422--433, San Jose, California, USA, IEEE Computer Society, 2005.

[3] C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and Peter Schwarz. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, vol. 17, pp. 94--162, March 1992.

[4] Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A column-oriented DBMS. In Proceedings of VLDB, Trondheim, Norway, 2005.

[5] Dale Skeen. Nonblocking commit protocols. In Proceedings of SIGMOD, pp. 133--142, Ann Arbor, Michigan, USA, 1981.

[6] C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system. ACM Transactions on Database Systems, vol. 11, pp. 378--396, 1986.
