CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Regions: A New Capability for Networks

Karen R. Sollins

Introduction

Naive pictures of the Internet frequently portray a small collection of hosts or LAN's connected by a "cloud" of connectivity. The truth is more complex. The IP-level structure of the Internet is composed from a large number of constituent networks, each of which differs in some or all of transmission technologies, routing protocols, administrative models, security policies, QoS capabilities, pricing mechanisms, and similar attributes. On top of this, a whole new structure of application-layer overlays and content distribution networks, equally diverse in the sorts of ways mentioned above, is rapidly evolving. Virtually any horizontal slice through the current Internet structure reveals a loosely coupled federation of separately defined, operated, and managed entities, interconnected to varying degrees, and often differing drastically in internal requirements and implementation. Intuitively, it is natural to think of each of these entities as existing in a region of the network, with each region having coherent internal technology and policies, and each region managing its interactions with other regions of the net according to some defined set of rules and policies.

The thesis of this project is that this basic, general notion of a region is a powerful tool for managing the increasingly complex demands on the Internet and its successors, and thus should made into an explicit, first-class component of the network architecture. We suggest that the many uses of the region concept share a well-defined set of core ideas and operations, and that it is therefore useful to separate the concept of "region" from any specific use of the concept and provide it as an independent, reusable abstraction. We further suggest that the region abstraction can be implemented as a concrete software entity, and that doing so will provide protocol designers and implementers at all levels of the stack with logically structured, scalable, high-performance access to an important class of functionality.

The Approach

The region is a generalization of a group and a bound in some space. The two defining concepts of a region are a set of contained members, which share some common characteristics, and a boundary, which allows us to capture the notion of actions taken when entering and leaving the region. Each characteristic defines a set of values defined over an attribute. As a result, we can view a region as a subspace of the multidimensional space defined by the set of attribute types used for this region. This allows for a formal definition of the boundary of a region. In addition to our formalism, the implementation of the region class permits us to provide self-adaptation to changes in size or usage patterns, allowing the same abstraction to apply under widely differing demands on scalability, performance, and efficiency. We find these ideas useful in many places in networking to address problems such scaling, distribution, efficiency, heterogeneity, cost recovery, and many others. We have proposed in this research to define and develop the region abstraction and explore its utility as a general mechanism in networking. To do this each student has studied or is studying an aspect of the region concept.

Interlayer information control: In this research, Ji Li studied regions as the basis for communication among regions of differing sorts. In this case, we considered two types of regions, Autonomous Systems or ASs, used as the basis for routing at the IP layer in the Internet, and overlay networks that do their own routing at the application layer. The question we addressed in this research was whether we could use information from the lower level regions, ASs, to inform and improve routing in the higher level regions, or overlay networks. The performance problem that generally has been an issue for structured or DHT-based peer-to-peer systems is latency. Such a P2P system initially places its content even distributed around the network. If we could use the DHT system to look up information that had been placed more efficiently, latency could be reduced significantly. The common approach that has been taken to achieve this is for each P2P node to learn the latency to each of its peers, initially by use of "ping", and in some cases now using beacons conveniently placed in the network. By approximating latency using AS-path (the path or length of the path in terms of ASs) we find that we have significantly reduced two factors, network traffic for learning about latency and amount of storage required in the nodes. The tradeoff is that although we improved performance significantly, it was not quite as good as with the mechanisms that reside only in the application layer. We are still convinced by the engineering tradeoff, that there is significant value in utilizing lower layer information effectively, and that regions allow us to provide a valuable abstraction for this.
Garbage Collection: Since there will be many regions and an entity may participate in more than one, and additionally the implementation of a region may be distributed, Clyde Law took on the question of whether and if so where garbage collection would be necessary. The problem for which garbage collection seemed most important is within a single region, specifally, if the implementation of the region is distributed. For the semantics of some of the region functions to be correct, it will be important that the distributed components that comprise the infrastructure of a region be as consistent with each other as possible. In part, that means that it will be important for each such component to know when a member of a region has departed from that region, either simply because it is no longer part of the region or because the element itself no longer exists. In this thesis, Law explored appropriate algorithms and built a proof-of-concept implementation.
Security at the boundaries: One question that we considered to be particularly important was whether the concept of the boundary could provide a security perimeter as well as something more abstract. One of the problems that network and security managers face is that distributed firewalls are extremely fragile and difficult to manage. Joshua Baratz took on the question of whether region boundaries might not provide a more manageable abstraction than distributed firewalls. A key component of this problem space is how to guarantee that traffic destined to travel between source and destination can only reach the destination by passing through an appropriate component of the boundary. To this end, he built a proof-of-concept implementation and found that the region abstraction because it can be viewed at the application layer is significantly easier and therefore more manageable. With the appropriate certificate requirements and management one can guarantee that traffic will handled correctly.
Adaptability of Region Membership in the Face of Anarchy: Rob Beverly took on the question of adaptation of membership in regions. The issue is how members might cluster themselves effectively into regions in order to achieve their tasks with most utility to themselves. The approach taken concentrates on the fact that in an economically viable environment it is hard to believe that altruistic superpeers, as proposed for Gnutella will survive. Instead, we can ask whether an unstructured system such as Gnutella can self-organize into useful clusters based exclusively on anarchic evaluations of utility to each peer. The answer demonstrated in this project was that one could do that, as measured by widely accepted criteria. As a demonstration and validation of his work, Beverly collected Gnutella data, considered the utility of a peering relationship between two nodes based on their mutual interests as identified by which files they already contained (through analysis of files names, where available) and their queries.
Region Boundaries: As suggested above, we have explored particular uses of region boundaries, but Neha Singh set out to study them more generally. The questions one needs to ask here are 1) how does one figure out which region to visit, 2) how does it find the regions, and 3) what happens at region boundaries that is distinctive and important. For purposes of this study Singh worked in a mobile agent system in which agents develop an itinerary based on their objectives, in order to determine which regions may provide the services they need to reach their objectives. We chose an agent system as a generalization of other more limited forms of mobile entities that might be moving through the region system, such as packets, other data, or pure code such as applets. In this case, each member of a region advertises to the region its functions constraints. In turn the region summarizes this information as well as its own constraints, and advertises them to a region directory. Thus, the information made widely available is only approximate for each region. The agent then is given its objectives, to be compared with the contents of the region directory to determine candidate regions to visit and an ordering on them. By this means an itinerary is defined for the agent in terms of URNs or globally unique identifiers for the regions, each of which can be translated into an IP address when needed. At any time during the travels of the agent this itinerary can be modified or updated. With the itinerary formed the agent moves to the entry point of the first region on its itinerary. There it determines whether an exact match can be made within the region, as well as any determination about authentication, authorization and payment policies. The agent may make one or a number of stops within the region and may recompute its itinerary at any time. It will then move on to another region, possibly stopping again at the region boundary for final reconciliation wiht the region on its way out. This study has explored how to support scaling (through nested advertising and hence nested decision-making about suitability), as well as the functions of policy and pricing models at and within the boundaries of regions. The thesis will be completed in May 2005.
Supporting Efficient Functions in Ad Hoc Networks: Apostolos Fertis is exploring the use of regions to improve the performance of functions in ad hoc wireless networks. One of the key features of an ad hoc wireless network is that it can reorganize itself dynamically. Typically these networks are organized into clusters in order to reduce either network transmissions or power consumption, which are closely related. In this research an additional criterion is used, clustering in order to improve performance of functions over subsets or regions of the ad hoc network. The first step is to determine an efficient partitioning of the function, in order that more isolated subsets of nodes can be accessed. Thus, for example, an averaging function can be partitioned into a series of paired functions for the count and sum of a subset. These can be merged, by summing the sums and summing the counts, up through the nested regions until the root of the nesting is reached, at which point the sum of the sums can be divided by the sum of the counts. A function such as this can be partitioned to any desired depth, down to pairs of nodes. In order to create the structure of subregions, Fertis has devised an algorithm which we believe has significantly improved performance characteristics to those in existence and can be performaed in a distributed fashion. In addition, he intends to apply several addition performance improvements. This thesis will be completed by June, 2005.

Dissemination of Results

All these projects have resulted in or will result in theses submitted to the Department of Electrical Engineering and Computer Science at MIT. In addition, there have been a number of published papers on these and related topics. See http://krs.lcs.mit.edu/regions for the theses and papers. For a more complete introduction to regions, see [1]

References:

[1] Karen R. Sollins. Designing for Scale and Differentiation. In Proc. ACM SIGCOMM 2003 Workshop on Future Directions in Network, Karlsruhe, Germany, August 2003.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)