CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Exploiting Autonomous System Information In Structured Peer-to-Peer Networks

Ji Li & Karen Sollins

Introduction

In recent years, overlay networks have attracted considerable attention from both the academia and the industry. Many applications and network services on overlay networks rely on active probing using ping or traceroute to measure path quality and to detect anomalies. As overlay networks grow in popularity and sizes, this may lead to a significant increase in network traffic. Furthermore, it is hard to obtain the desired information, such as network topology, since end-to-end approaches can only measure paths among those participating nodes. Therefore, we believe that there should be a more systematic approach to obtain the underlying routing and topology information.

The Internet consists of thousands of distinct regions of administrative control called Autonomous Systems (ASes). ASes contain abundant routing and topology information. We believe that it is of great value to expose AS-level routing and topology information to network services and applications, because overly networks can exploit this information to improve performance and reduce traffic. Such information may also enable new functionality. In the following discussion, we assume that AS information is already available to overlay networks. Then the question is: given AS information, how can we improve overlay networks?

Methodology

In this work, we demonstrate the use of the AS-level routing and topology information to improve structured P2P networks with two schemes for lookup and replication, respectively.

A hybrid proximity neighbor selection

We propose using the AS-path length as a proxy for network latency to filter out unlikely candidates without probing them during proximity neighbor selection (PNS), and only those who pass the filtering will be probed. Our hybrid scheme consists of two steps. First, when a node receives several candidates to fill in a routing table entry, it does not probe all of them directly. Instead, it first retrieves the AS-path length between itself and each candidate. Then it uses the AS-path lengths to filter out those that are unlikely to be close, leaving a much reduced set of candidates. The way of filtering candidates is defined as a filtering function. Second, the present node probes the remaining candidates to find the nearest one from them. The closest node found is used for the corresponding routing table entry.

There is a tradeoff between the lookup performance and the amount of probing traffic generated by filtering functions. This provides the flexibility for each node to decide the effort it wants to take to improve the lookup performance, considering its bandwidth restriction and desired accuracy. In addition, there are several complications that make the AS-path length not a completely accurate proxy for latency. We discuss further improvements in the filtering function considering the heterogeneity of ASes and network dynamics in [2].

A flexible replication scheme.

Our approach improves replication in P2P systems in two ways. First, we separate replica placement from replica lookup using indirection. In our scheme, replicas are not stored in the successors; instead, they are stored in nodes chosen by the owner node. The successors only store a list of IP addresses of nodes that store the replicas. By separating the placement of replicas from the successors, the actual replica-storing nodes can be chosen flexibly by the owner.

Second, we use AS information for a P2P network to make informed decisions on replica placement. The goal is to reduce data access latency and network overhead, and to adjust the number of replicas and replica placement according to data popularity. We design a heuristic replication scheme as follows. Initially, there are a fixed number of replicas for each piece of data. That provides a basic level of reliability. Our strategy is to place replicas on nodes in different ASes that have high AS out-degrees. The intuition behind this is that a high-degree AS has good connectivity, and are usually closer to many other ASes; therefore they are a better choice for replica placement. By placing replicas in different ASes, we try to evenly distribute the replicas in the Internet.

When the request rate reaches a certain level, additional replicas need to be created dynamically. The problem is where we should place the additional replicas. Since ASes provide good scoping information, our clustering technique is to map clients to their AS numbers. If many clients are located in the same AS, we can place the additional replica on a peer in that AS, especially on one who has issued a lookup for the data recently, since it already has a copy of the data, and is likely to be willing to keep the copy. Also, it is fair to put replicas on those who consume the data. In this way, we can reduce access latency in the future. The details are in [2].

Performance Evaluation

We simulate a Pastry-like structured P2P network. Our simulation is configured in the same way as Pastry, with keep-alive messages for leaf sets and lazy routing table repairs. We evaluated our schemes on both synthetic Internet topologies generated by the BRITE Topology Generator and a topology generated from the RouteViews data and Skitter data. In the simulation, by using our hybrid proximity neighbor selection scheme, we achieved nearly the same lookup performance as the standard proximity neighbor selection with only 8% of probing messages on the synthetic topology, and 20% longer average lookup latency with only 12% of probing messages on the real topology. Our simulations also show that the static replication based on AS degrees outperforms the traditional approach by about 10%-20% in terms of data access latency of the nearest replica, and the gains increase with network size. The details of the evaluation is in [2].

Conclusion

In this work we have examined and evaluated algorithms for improving both lookup and replication in structured P2P networks using AS-related information. We demonstrate that there are significant advantages to our approach. The first is the significantly reduced probing traffic and performance improvement based on simple and static AS information. Perhaps even more importantly, the AS infrastructure reflects administrative boundaries in the Internet. Algorithms such as those presented in this paper allow for restricting activities to either individual or sets of ASes that reflect such boundaries. In the simulation, our approach naturally shifts P2P traffic with respect to AS boundaries. We conclude that with more study, one can generalize these ideas more broadly to the Internet layered architecture, with significant performance and policy benefit. This idea is discussed further in the Regions Project [1].

Acknowledgments

We would like to thank the ANA members for valuable suggestions. This work is funded in part under NSF Grant ANIR-0137403, "Regions: A new architectural capability in networking", and NSF Grant CCR-0122419, "The Center for Bits and Atoms". It is also funded in part by a gift from the Cisco University Research Program.

References

[1] K. Sollins. Designing for Scale and Heterogeneity. In ACM SIGCOMM FDNA 2003 Workshop, Karlsruhe, August 2003.

[2] Ji Li, Karen Sollins. Exploiting Autonomous System Information in Structured Peer-to-Peer Networks. In The 13th IEEE International Conference on Computer Communications and Networks (ICCCN2004), Chicago, IL, October 11-13, 2004.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)