Abstracts - 2006
Performance Analysis of the Castro and Liskov Byzantine Fault Tolerance Algorithm
Daniel Myers, James Cowling & Barbara Liskov
As the Internet becomes an increasingly-important part of everyday life, it is becoming critical that we be able to depend on Internet services. Constructing such dependable Internet services can be difficult, however, for at least two reasons. First, as services become larger and more complicated, the software engineering challenges increase to the point where it may be impossible to produce error-free code. Second, as services become increasingly important, they become more and more attractive targets for attackers. These two challenges suggest that future Internet services will require an architecture that is inherently resistant to failures caused by software errors or malicious attackers.
These types of failures, in which components of a system exhibit arbitrary behavior, are called Byzantine failures. In their 1999 paper , Miguel Castro and Barbara Liskov presented a Byzantine Fault Tolerance algorithm (BFT) that uses replication of a service over a group of 3f+1 servers in order to tolerance the Byzantine failure of f of them.
The performance of the Castro and Liskov BFT algorithm under high request rates and with large values of f (f>3) or on realistic Internet topologies has not been well characterized, however. Our present research aims to understand and improve the performance of the algorithm in this domain.
We are conducting theoretical and experimental analyses of the performance of the BFT algorithm. Using a local computing cluster, the Modelnet  network emulator, and a novel C++ implementation of the algorithm, we are examining the performance of BFT on both a local, high-speed network and on a realistic AS-level Internet topology.
Our theoretical analysis has produced a model of the network bandwidth consumed by the algorithm under a range of f values and request rates. Additionally, we have also identified opportunities to reduce bandwidth requirements by centralizing communication and by reducing the authentication information included in each message. Figure 1 shows an analytical model of the bandwidth consumed by BFT under five communication models and with a request rate of 1,000 256-byte messages per second. "Traditional BFT (MACs)" and "Traditional BFT (signatures)" show the bandwidth consumed using non-centralized communication and message authentication codes or digital signatures, respectively. The "Linearized BFT" lines show bandwidth consumed using the centralized communication strategy and using either multiple MACs; a single MAC, which is the authentication optimization alluded to above; or digital signatures. Note that the "Linearized BFT" lines represent a substantial improvement over the traditional strategies at all values of f.
On the experimental front, in addition to validating our analytical model of network bandwidth consumption, we are also currently examining the network protocols used for message exchange. We believe that it may be possible to increase the performance of the algorithm at high request rates by moving from UDP to TCP-based transport.
 Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance. In Proceedings of the 3rd Annual Conference on Operating Systems Design and Implementation (OSDI),, pp. 173--186, New Orleans, Louisiana, USA, February 1999.
 Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostic, Jeff Chase, and David Becker. Scalability and Accuracy in a Large-Scale Network Emulator. In Proceedings of 5th Symposium on Operating Systems Design and Implementation (OSDI) December 2002.