MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Research Abstracts Home

CSAIL Digital Archive

Research Activities

CSAIL Home

horizontal line

Research Abstracts - 2007
horizontal line

horizontal line

Hybrid Quorum Byzantine Fault Tolerance

James A. Cowling, Daniel S. Myers & Barbara H. Liskov

Motivation

The increasing importance of highly available online services mandates protocols to ensure correctness and permanence of service data. Availability and correctness in the face of node failures and malicious attacks, so called Byzantine faults, introduces additional challenges to reliability. These challenges are addressed by Byzantine fault-tolerant replicated state machines, notably the Castro and Liskov BFT protocol [3].

Two approaches have traditionally been taken to providing Byzantine fault tolerant state machine replication: replica-based agreement approaches, such as BFT, and quorum based approaches such as the Query/Update (Q/U) protocol [4]. In an agreement protocol, a primary replica nominates an ordering for each client operation, with replicas communicating in three-phases of all-to-all communication to agree on the ordering. In the newer Byzantine quorum approaches, this work is pushed to the clients to optimistically order operations, proceeding with one or two phases of communication while avoiding inter-replica communication

Both agreement-based and quorum-based schemes have shortcomings. The quadratic message cost of BFT may saturate a network, with the three phases of communication introducing potentially excessive delay. Q/U was designed to address claimed scalability issues in BFT, and executes writes in one-to-two phases, with low communication overhead. On the other hand it requires a large number of replicas - 5f+1 to tolerate f faults, rather than 3f+1 in BFT. Q/U also suffers significant performance degradation when writes contend (concurrent writes on the same object), due to the inefficient use of randomized backoff.

Overview

In this work we leverage the advantages of both agreement and quorum schemes, in a hybrid protocol for Byzantine fault tolerance, HQ Replication. Under normal operation we run a lightweight two-phase quorum scheme, while employing the BFT protocol to resolve instances of write contention. Through this combination we offer the following properties:

3f+1 server replicas (the theoretical minimum for Byzantine fault tolerance)
No quadratic replica communication under non-contention operation
Low latency reads and writes
Support for general operations
Low per-message overhead
No public key cryptography in the normal case
Degradation to near-BFT levels of performance under highly contending write loads

Our quorum protocol allows clients to commit write operations in two phases. In the first phase a client asks for a grant from each replica, authorizing it to perform the operation at the given sequence number. If the client receives a quorum (2f+1) matching grants it can use the combination as a certificate to prove this agreement to the replicas in the second phase, committing its operation after a quorum of replies. To ensure liveness we provide writeback operations to enable clients to advance the state of slow replicas, or commit an operation on behalf of another client.

If different replicas have granted the same sequence number to different clients, there has been write contention, and the granted sequence numbers are in an inconsistent state. We then employ the services of BFT to resolve this contention via communication between replicas, committing all contending operations. Further details on the protocol are available in our paper [1] and technical report [2].

Performance

Our implementation of HQ Replication was found to exhibit good performance. Figure 1 shows an analytical comparison of the HQ and Q/U protocols, along with three different flavors of BFT - the original BFT protocol, along with those we modified to use MACs instead of authenticators (BFT-MACs), and incorporate the MAC optimizations along with the use of preferred quorums [4] (BFT-opt). Not only does HQ exhibit low communication load, but our modified versions of BFT significantly reduce communication cost over the original BFT implementaion.

Plot of Total Traffic vs F (tolerable failures) at replicas
Figure 1 : Total replica network load with 1000 requests/s and a message payload of 256 bytes.
Simulated load here is composed entirely of non-contending write requests.

Figure 2 illustrates the performance of HQ vs BFT (with BFT batch size of 1) under a real system deployment, in the absence of contention. HQ is found to outperform BFT in throughput, however significant gains were realized in BFT due to our optimizations. Figure 3 shows how HQ reacts to increasing levels of write contention, up to 100% of operations contending at contention factor 1.0. Performance degrades quite gradually in all cases, especially in comparison to existing quorum schemes.

Plot of System Throughput vs F (tolerable failures)
Figure 2 : Maximum system throughput of HQ compared to BFT.
Measured experimentally.

Plot of System Throughput vs Contention level
Figure 3 : Throughput degradation in HQ with increasing levels of write contention.
Measured experimentally.

An interesting side-note of this work is that with a few optimizations, BFT was found to perform much better than previously believed. This is further emphasized when request batching is employed, where the system buffers a number client requests and runs the protocol once per batch. This amortizes the cost of agreement over a number of client operations, reducing the cost of the agreement protocol. Batching is only possible in agreement-based protocols where there is a primary replica to buffer requests, and not in quorum protocols where replicas order operations independently. We hence recommend the HQ protocol for use where latency is critical or contention is relatively low, and BFT when contention is high or maximum system throughput is a primary concern.

Progress

Our base protocol for Hybrid Quorum Byzantine Fault Tolerance is fully developed as the HQ Replication protocol [1] [2]. We have implemented experimental versions of both HQ Replication and the BFT protocol on a new codebase for the purposes of performance comparison. We aim to release a final implementation of both these protocols in the near future, to facilitate use in building distributed systems, and provide consistent performance benchmarks for the research community.

We are also commencing the design of a large-scale distributed transactional storage system, that will employ Byzantine Fault Tolerance across the replicated servers at its core. This project will incorporate the new fault tolerance codebase.

Research Support

This research is supported by NSF ITR grant CNS-0428107 and by T-Party, a joint program between MIT and Quanta Computer Inc., Taiwan.

References:

[1] J. Cowling, D. Myers, B. Liskov, R. Rodrigues and L. Shrira. HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance. In Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI '06), pp. 177-190, Seattle, WA, USA. November 2006.

[2] J. Cowling, D. Myers, B. Liskov, R. Rodrigues and L. Shrira. HQ Replication: Properties and Optimizations. MIT Technical Report MIT-CSAIL-TR-2007-009. Cambridge, MA, USA. February 2007.

[3] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance. In The Proceedings of the 3rd Conference on Operating Systems Design and Implementation (OSDI '99), pp. 173--186, New Orleans, Louisiana, February 1999.

[4] M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter and J. J. Wylie. Fault-Scalable Byzantine Fault-Tolerant Services. In The Proceedings of the 20th ACM Symposium on Operating System Principles (SOSP '05), pp. 59--74, Brighton, United Kingdom, October 2005.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu