CSAIL Research Abstracts - 2005 link to http://publications.csail.mit.edu/abstracts/abstracts05/index.html link to http://www.csail.mit.edu
bullet Introduction bullet Architecture, Systems
& Networks
bullet Language, Learning,
Vision & Graphics
bullet Physical, Biological
& Social Systems
bullet Theory bullet

horizontal line

Byzantine-Fault-Tolerant Replication in Dynamic Systems

Rodrigo Rodrigues & Barbara Liskov

Introduction

Today we are more and more dependent on Internet services, which provide important functionality and store critical state. These services are increasingly targets for attack; e.g., the number of reported incidents has doubled every year since 1997 [1]. In addition, the systems are becoming very large, e.g., the Google search engine uses a cluster of ``thousands of low cost PCs'' [2]. Furthermore they are long-lived and need to allow for reconfiguration to deal with machines that break or are decommissioned and must be evicted from the system, to replace failed nodes with new machines, or to add machines to the system for increased storage or throughput. Thus the systems need to continue to function even though system membership is changing dynamically.

Implementing systems that are large-scale, long-lived, and tolerant of malicious attack is a hard problem. Of particular concern is tracking and responding to membership changes in an appropriate way. Tracking system membership correctly requires resolving two conflicting goals. On the one hand, membership changes are a threat to security since they may alter the proportion of correct and faulty servers; this implies we might like manual control. On the other hand, operator errors have been shown to be a major cause of disruption in computer systems [45], and therefore we would like reconfiguration to be done with minimal human intervention.

Our research is aimed at developing a complete solution for tracking and responding to membership changes in large-scale replicated services. We focus in particular on Byzantine-fault-tolerant (BFT) replication since it enhances the security, availability, and reliability of fault-tolerant services when servers may fail arbitrarily.

Approach

Our approach consists of a membership service (MS) and a technique for transforming replication algorithms designed to work in a static system into algorithms that work in a dynamic setting.

Our approach is designed to work in a very general setting. We allow large numbers of servers (e.g., hundreds of thousands) that together are able to store vast quantities of data on behalf of an even larger number of clients. Responsibility for handling different data items is partitioned among the servers: Each item is stored at a group of servers, and different items can be stored at different groups. The number of servers in the group depends on the kind of replication being used; for example, 3f + 1 servers are needed to survive f Byzantine failures in the BFT replication protocol [3]

The MS accepts requests to add and remove servers and it also probes servers and automatically removes those that do not respond properly. Periodically it publishes a new system membership; in this way it provides a globally consistent view of the set of available servers, thus allowing clients and servers to make consistent local decisions about which servers are currently responsible for which parts of the service.

The membership service must itself be tolerant of Byzantine failures. We have developed a way to implement it as a BFT group, where the group itself can be reconfigured. This allows continued correct behavior without needing to assume a bound on the number of replica failures over the lifetime of the system. Reconfiguration is accomplished by superimposing the MS service on the system nodes, and moving it to a new set of nodes periodically.

Our approach also includes a methodology for how to transform static replication algorithms into algorithms that handle membership changes. The new algorithms use the input from the membership service to determine when to reconfigure. We illustrate the methodology by describing a dynamic Byzantine quorum system that supports read and write operations.

Progress

We have completed a design and implementation of the MS and also of two example replicated services. Our performance studies show that the membership service is scalable, and the replicated services perform well, even during reconfigurations.

Moving a replica group (either a group running the MS, or a group carrying out the work of the Internet service) raises the question: what is the likelihood that a selected group will contain more than f faulty nodes? We have completed an analysis that shows that the probability of choosing a bad group is low for reasonable values of f provided the system as a whole contains only a small fraction (e.g., 1%) of faulty nodes.

Research Support

This research is supported by the NSF under grants ANI-0082503 (http://project-iris.net) and 6896772.

References

[1] CERT/CC Statistics 1988-2004. http://www.cert.org/stats/cert_stats.html, 2004.

[2] Google Technology. http://www.google.com/technology, 2004.

[3] Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance. In The Proceedings of the 3rd Annual Conference on Operating Systems Design and Implementation (OSDI), pp. 173--186, New Orleans, Louisiana, February 1999.

[4] Jaeyeon Jung, Emil Sit, Hari Balakrishnan and Robert Morris. DNS Performance and the Effectiveness of Caching. In The Proceedings of the ACM SIGCOMM Internet Measurement Workshop, San Francisco, California, November 2001.

[5] D. Oppenheimer and Archana Ganapathi and David A. Patterson. Why do Internet services fail, and what can be done about it? In The Proceedings of the 4th USENIX Symposium of Internet Technologies and Systems (USITS), Seattle, Washington, March 2003.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)