CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Announcing What You Know in a Heterogeneous P2P Network is a Good Idea!

Ben Leong & Barbara Liskov

Introduction

Empirical studies on existing unstructured p2p systems have shown that p2p networks are heterogeneous in many aspects. Nodes not only have different resources in terms of bandwidth, storage and computational power, they also impose different workloads on the system --- some are primarily producers, some are primarily consumers, and some straddle the region in between. Most existing p2p routing algorithms however tend to impose a uniform routing state maintenance algorithm on all nodes and the scalability of such algorithms in a heterogeneous environment is thus limited by the nodes with the lowest available bandwidth.

We propose a p2p routing algorithm that removes this limitation by adding flexibility in the amount of routing state that is stored at each node. The nodes with large bandwidths ("big nodes") will store a lot of routing state and are able to route to destinations that they know about in one hop; the nodes with smaller bandwidths ("small nodes") will store less state and may perhaps know very little. Nevertheless, they can take advantage of the routing knowledge of the big nodes to achieve better routing performance, often achieving two-hop routing.

Unlike previously proposed "superpeer" schemes [1, 2], our p2p routing algorithm offers significantly more flexibility by allowing nodes to decide on the exact amount of routing state that they want to store instead of putting them into one class or another. Also, the nodes in our system can dynamically change the size of their routing table in response to changing network conditions and according to their changing needs. Our algorithm offers significantly more flexibility than most existing O(1)-hop DHTs and is an attractive alternative to SmartBoa [3] for heterogeneous networks.

Approach

Like Chord [4], our DHT is organized as a one-dimensional circular address space where each node is assigned a node identifier (id), and the node responsible for a key is the successor. We also use the cryptographic hash function SHA-1 to determine a node's id from its IP address. SHA-1 ensures that with high probability, the node ids do not collide and are uniformly distributed over the entire address space.

Our routing algorithm is based on a mechanism called the Region of Knowledge (ROK). The ROK for a node is simply a contiguous region of address space that contains its id. A node declares its ROK when it first joins the network but it is allowed to change it later. A node knows all the nodes in its ROK and is able to route to them in one hop. In addition, we requre that every node in a node's ROK knows the node. The latter is not likely to be onerous for small nodes because (i) small nodes are not likely to be overwhelmed by a need to maintain a lot of routing state for big nodes, since empirical studies of existing p2p system suggests that real systems have relatively few big nodes; and (ii) small nodes have much to gain in terms of better routing performance by learning about big nodes. Thus, we have the following Network Invariant:

Network Invariant: A node knows about all the nodes in its Region of Knowledge and is conversely known to all of them.

Each entry in a node's routing table consists of an id, an associated IP address, an associated ROK, and sometimes a time-to-live (TTL). There are two kinds of entries: stable entries and transient entries. Stable entries do not have TTLs because changes to them are propagated to a node via multicast as a result of join and leave events. They remain valid until superceded by an incoming join or leave message. We will refer to the set of stable entries for a node as the routing set. Transient entries have associated TTLs and they are created when a node learns about other nodes while performing lookups.

A node that attempts to join the DHT will have to know about some entry-point node and through this node, the new node is able to find its successor/predecessor in the ring. Join messages are forwarded through the successor and predecessor using symmetric ad hoc multicast trees rooted at the source of an event. The messages are sent symmetrically both clockwise and anti-clockwise along the ring.

Our lookup algorithm can be summarized as follows:

Case 1: If id is contained within ROK, send query to the known successor of id.
Case 2: If id is contained within ROK of some known node, send query to the node that has the smallest ROK containing id. (This is to avoid overloading nodes with large ROKs).
Case 3 (rare): If id is not contained within ROK of any known node, send query to the node whose ROK boundary is nearest to id.
When a reply is received, new routing entries are added and the above process is repeated. Since each query is guaranteed to make at least some progress (as long as it does not time out), this process will eventually terminate at the correct node.

When a node receives a query, it will return an answer immediately if the queried id falls within its ROK. If not, it will return the best l entries according to the following order of priority (l a system parameter):

Nodes with ROKs that contain id sorted in increasing size of ROK.
The rest of the nodes sorted according to the distance of their ROK boundaries to id.

Our algorithm is adaptive in two ways: (i) nodes can change the size of their routing tables in response to changing network conditions; and (ii) nodes can adaptively improve the goodness of their routing entries by exploiting information contained in the ROKs.

Routing entries are refreshed periodically when their TTLs expire. Our nodes exploit these refreshments to systematically improve their stored routing state. When an entry is pinged, it returns a list of b nodes with the largest ROKs that contain it (where b is a system parameter). In the worst case, a node will return a list consisting of some of its successors and predecessors.

After each refreshment, a node decides on the set of routing entries that it wants to maintain (usually the nodes with the largest ROKs) and flushes the remaining entries when their TTLs time out. Another reasonable refreshment policy is to probe several prospective candidates and select entries based on proximity to reduce lookup latency [5]. There is no strict policy on what a node can do, as long as it ensures that each address space slice contains at least one entry.

If there are sufficient nodes in the network that have reasonably large ROKs, it is relatively easy for a node to find sufficient routing entries to achieve 2-hop lookup performance: A node simply has to find enough nodes so that the union of their ROKs covers the entire address space. A node can either attempt to achieve full coverage in a laissez faire manner through periodic refreshes, or it can actively probe the network to do so.

To actively find enough entries with ROKs that cover the entire address space, a node determines the segments of the address space that are not covered by the ROK of any entry in its routing table. For each segment, the node sends a query to the successor of the midpoint to ask for the node(s) with ROKs that cover the required segment. This process is repeated until the entire address space is covered by the ROK of some known node. Sometimes it may be too costly to find sufficient nodes to cover the entire address space when ROKs are small, so a node has to stop trying at some point. One possibility is for a node to rate limit its attempts to find nodes to cover empty segments. For example, a node can limit itself to a maximum of 5 such lookups every 5 minutes.

Progress

Our preliminary calculations suggest that our algorithm is scalable and that the overall maintenance costs are relatively low; we are above to achieve one- and two-hop lookup performance even when only a small number of high-bandwidth nodes is available. We plan to run experiments on Planetlab to quantify the effectiveness of our algorithm in practice.

Research Support

This research was supported by the National Science Foundation under Grant No. ANI-0082503 (http://project-iris.net).

References:

[1] Alper Mizrak, Yuchung Cheng, Vineet Kumar and Stefan Savage. Structured Superpeers: Leveraging Heterogeneity to Provide Constant-Time Lookup. In Proceedings of the 4th IEEE Workshop on Internet Applications, San Jose, CA, June 2000.

[2] Anjali Gupta, Barbara Liskov and Rodrigo Rodrigues. Efficient Routing for Peer-to-Peer Overlays. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), San Francisco, CA, March 2004.

[3] Jingfeng Hu, Ming Li, Ning Ning and Weimin Zheng. SmartBoa: Constructing p2p Overlay Network in the Heterogeneous Internet Using Irregular Routing Tables. In Proceedings of the 3rd International Workshop on Peer-to-Peer Systems (IPTPS '04), February 2004.

[4] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek and Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. In Proceedings of the 2001 ACM SIGCOMM Conference, August 2001.

[5] K. Gummadi, G. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker and I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. In Proceedings of the 2003 ACM SIGCOMM Conference, August 2003.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)