CSAIL Research Abstract


A Knowledge Plane for the Internet: Can a Network Know What it is Doing?

David D. Clark, Karen R. Sollins, Peyman Faratin, John T. Wroclawski & George Lee

Introduction

One of the Internet's greatest strengths is that it does not know or care what its applications are or what they are doing: it simply forwards data. Yet network users experience the network through the functioning and performance of applications. This divergence of perspective leads to a number of problems. For example, a user whose local DNS service has failed may perceive the network as broken, even though from a network perspective, data continues to flow correctly. If an email server or a Web server fails, the user will say the network is broken; the network operator will say the network is fine.

We need a way to make the network more aware of itself and its applications, without destroying the open and transparent data plane. To meet this need we propose the creation of an Internet knowledge plane. The knowledge plane is a distributed and decentralized construct within the network that gathers, aggregates, and manages information about network behavior and operation, and provides an integrated view to all parties (operators, users, and the network itself). The goal is to enlarge our view of what constitutes the network to match the intuition of a user, and to enhance our ability to manage the network intelligently, without disturbing the open and unknowing forwarding plane.

The knowledge plane is intelligent: it can reason about the network’s behavior and act upon the results of its reasoning. It can remember and learn from past behavior. To achieve that goal, we propose to adapt and employ recent work in cognition such as the separation of algorithm, policy and goals, and new models for knowledge representation.

What might the knowledge plane be good for? Here are two examples, diagnosis and prediction, to illustrate.

The Problem of Diagnosis

When some part of the Internet fails, it is almost impossible for the end user to tell what has happened, to figure out who should be notified, or what to do to correct the fault. Imagine a program (we call it the why program) that a user can run when something about the network or a networked application seems to be broken. The why program starts with a component that runs on an end node, and performs diagnosis when there is a failure. The diagnostics can check out functions at all levels, from packet forwarding to application function.

Once the end node has performed what diagnosis it can, the next stage is for the tool to add assertions to the shared knowledge plane about what it has discovered, and ask the knowledge plane for relevant information. This contribution to the knowledge plane allows all the users on the network collectively to build a global view of network and service status. Using this information, the tool would give the user an explanation of what had gone wrong in terms that are meaningful to the user, and would also give the network operator information in the operator's terms. Network operators have the option of adding facts to the knowledge plane about known failures; ideally, a user who trips over a problem might get back not just diagnostic information, but also information from the provider about when the problem will be resolved.
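
The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's design: the names Assertion, KnowledgePlane, and why, along with the fields and the sample hosts, are hypothetical, and the shared knowledge plane is modeled as a single in-memory store rather than a distributed construct.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """One observation contributed to the shared store (hypothetical schema)."""
    source: str     # who reported it, e.g. an end node or an operator
    subject: str    # what it is about, e.g. "dns.example.net"
    status: str     # "ok" or "failed"
    note: str = ""  # optional operator text, e.g. an expected repair time


@dataclass
class KnowledgePlane:
    """In-memory stand-in for the distributed knowledge plane (assumption)."""
    assertions: list = field(default_factory=list)

    def add(self, assertion):
        self.assertions.append(assertion)

    def about(self, subject):
        return [a for a in self.assertions if a.subject == subject]


def why(node, local_checks, plane):
    """Run local checks, publish the results, and explain the first failure.

    local_checks is an ordered list of (subject, passed) pairs, lowest
    layer first (for example packet forwarding before application function).
    """
    # Step 1: contribute every local finding to the shared store.
    for subject, passed in local_checks:
        plane.add(Assertion(node, subject, "ok" if passed else "failed"))

    # Step 2: for the first local failure, ask what others have reported.
    for subject, passed in local_checks:
        if passed:
            continue
        failures = [a for a in plane.about(subject) if a.status == "failed"]
        reporters = {a.source for a in failures}
        notes = [a.note for a in failures if a.note]
        explanation = f"{subject} failed; reported by {len(reporters)} source(s)"
        if notes:
            explanation += f"; operator note: {notes[0]}"
        return explanation
    return "no failure found locally"


if __name__ == "__main__":
    plane = KnowledgePlane()
    plane.add(Assertion("operator", "dns.example.net", "failed",
                        "repair expected soon"))
    checks = [("forwarding", True), ("dns.example.net", False)]
    print(why("node-a", checks, plane))
```

The sketch keeps the two roles from the text separate: end nodes contribute observations, while an operator may attach a note that is returned to any user who later trips over the same failure.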

The Approach to Diagnosis

Communication networks today do a good job of allowing a wide variety of devices to communicate with one another, but when something fails it can be difficult to determine the cause. There has been a great deal of research into specific diagnostic methods for detecting network problems such as denial of service (DoS) attacks, Internet worms, BGP misconfiguration, and so on, but no common infrastructure exists on the Internet today to automate such diagnosis. Therefore the goal of our research is to develop a scalable architecture for network fault diagnosis that supports the use of diverse diagnostic and data collection systems for the automated diagnosis of a wide range of network failures. We hope that this architecture will also encourage the development of new data collection and diagnostic systems and create a competitive market for data and diagnoses, enabling choice and promoting accuracy and efficiency. This system will allow end users to request diagnoses and receive prompt, accurate, and useful information about their problems.

Some of the challenges of such a system are developing scalable protocols for communicating data and diagnoses, defining an ontology for representing data and making queries in this architecture, and integrating with existing data collection and diagnostic systems. We plan to implement this architecture on a testbed such as PlanetLab to evaluate its ability to support different diagnostic and data collection systems.

The Problem of Prediction

Imagine a world in which we move with a small suite of wearable devices through a rich and varying cyberspace. The set of resources available to us and our wearable devices may depend on what is nearby, what we have permission and access capabilities to reach, who else may be using resources, and so forth. As we move through space, executing a web of sophisticated "applications", one or more of them may need to reorganize itself either to continue operation at all or to improve operation.

In that sort of environment, perhaps the most challenging problem we face is not only understanding the current demand on resources, but also predicting the availability of resources in the future. In this environment, the network resources are probably the most broadly shared, and hence their predictability may be the most challenging. The question that faces us at any of those reorganization points is what one can expect from the network connecting the set of devices and services one wants to utilize, including some prediction of future behavior, at least probabilistically. This question of prediction challenges the Knowledge Plane to learn, presumably from previous behaviors and any underlying models and constraints, in order to provide reasonable predictability.

The Approach to Prediction

To this end we have proposed a specific project that both takes advantage of and supports the knowledge plane. The high level problem is how to develop, maintain, and evolve a base of knowledge about network behavior at the local level in order to support the decision-making required by adaptive applications. To achieve this, we identify a system composed of five parts:

Requirements modeling includes identification of what is necessary and desired for network resources to support a user running an application or set of applications. Such a model may be based on description, where an application writer or user specifies requirements or constraints, and/or observation, where a user’s preferences are learned dynamically during application execution.
Discovery of candidate resources involves learning not only about available resources, but also their usable capacities, including not only current availability of network resources, but also a prediction into the future. To support an effective choice, one wants to make configuration decisions that have at least a high likelihood of stability over time. There is a spectrum of predictability, from guaranteed (as by a reservation) to some probabilistic model of the relationship between current metrics and their future behavior. We envision various components here coming from existing work, such as network monitoring and characterization tools. There is likely to be significant further work, because of our need for at least short-range future prediction.
Selection is a reflection of reasoning over the requirements and possible candidate resources to make the most desirable choice.
Evaluation of the selection is important because once a configuration has been chosen, with a prediction of a certain behavior of the network, it will be important to monitor that behavior and learn from such an evaluation. Such input may be valuable in future decisions either on behalf of this application or others and will certainly provide further input into the Knowledge Plane itself.
Evaluation of overall stability of the network in light of the selection is key to a coordination feedback loop. If there are resources available in the network, their use by components of an application will change them. Both networks and applications adapt to resource limits: networks through routing, congestion control, etc., and applications through recoding, restructuring, and so on. To these two, we expect to add a third possible adaptation vector, allowing the user to dynamically change performance and behavior preferences. The (long-standing) challenge is to ensure overall stability of the composed system of loosely coupled adaptation schemes, while compromising the independence of network and application adaptation as little as possible.

A key feature of this system will be its ability to be tuned to the needs of particular situations by individualizing the tradeoff between cost and accuracy. More broadly, the system will manage network usage in a significantly more coordinated way than at present, by facilitating effective sharing and coordination among collected information, and integration across different sorts of measurements, management, and prediction techniques.
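
A small Python sketch may make the first four parts concrete. It is an illustration under stated assumptions only: the class names, the fields, the choice of stability as the selection criterion, and the exponentially weighted update are all hypothetical and are not taken from the project. The fifth part, evaluation of overall network stability, is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """Requirements modeling: what the application needs (hypothetical fields)."""
    min_bandwidth_mbps: float
    min_stability: float  # lowest acceptable probability that capacity holds


@dataclass
class Candidate:
    """Discovery: a resource with its current capacity and a stability estimate."""
    name: str
    bandwidth_mbps: float
    stability: float  # estimated probability that the capacity persists


def select(requirement, candidates):
    """Selection: the most stable candidate meeting the requirement, or None."""
    feasible = [
        c for c in candidates
        if c.bandwidth_mbps >= requirement.min_bandwidth_mbps
        and c.stability >= requirement.min_stability
    ]
    return max(feasible, key=lambda c: c.stability, default=None)


def evaluate(candidate, held, weight=0.2):
    """Evaluation: fold one observed outcome into the stability estimate.

    Uses an exponentially weighted moving average; `weight` is an assumed
    tuning knob, not a value from the abstract.
    """
    observed = 1.0 if held else 0.0
    candidate.stability = (1 - weight) * candidate.stability + weight * observed
    return candidate.stability


if __name__ == "__main__":
    need = Requirement(min_bandwidth_mbps=5.0, min_stability=0.6)
    options = [
        Candidate("wifi", 20.0, 0.7),
        Candidate("cell", 8.0, 0.9),
        Candidate("mesh", 2.0, 0.95),
    ]
    choice = select(need, options)
    print("selected:", choice.name)
    # The prediction did not hold, so the estimate for this resource drops.
    print("updated stability:", evaluate(choice, held=False))
```

In this sketch the tradeoff between cost and accuracy would show up in how often evaluate is called and how large weight is: more frequent observation costs more but tracks change faster.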

The Architecture of the Knowledge Plane

In parallel with examining these two specific uses of the Knowledge Plane, we are also considering the more general architecture for it. At present, we take the position that we can separate the Knowledge Plane itself (the body of knowledge and understanding about the network) from the particular uses of it. That said, it will be necessary for the clients or users of the Knowledge Plane to help drive or constrain reasoning to those directions that are productive. Above we have described two specific projects, but the overall project is also examining the architecture of the Knowledge Plane itself, and to that end we identify four key components: an ontology for the knowledge, a reasoning framework, a transport or diffusion model, and an incentive model.

In determining the nature of an ontology, one must consider how the ontology is specified. At one extreme, an ontology is pre-specified formally and only allows for extensibility within the bounds of the ontology itself. At another extreme, an ontology is discovered in real time through analysis based on a simpler, lower-level capability for describing entities (either self-describing or learned from behaviors). This latter approach is significantly less constraining, which has both advantages and disadvantages. In addition to our current survey of existing work in this area, we expect the two example case studies to further inform us on a reasonable approach and on design and engineering tradeoffs.

The second issue is reasoning, the development of new beliefs from pieces of information or knowledge as they become available. To date, our efforts in this direction have been to explore the existing reasoning technologies. There are two challenges that the Knowledge Plane brings to reasoning technologies: scale and complexity. The scale problem is that such a knowledge base may grow to hold significant amounts of knowledge about every element within and attached to the network. Each of these elements may provide a distinct set of services at each layer of abstraction in the network. The complexity of networking takes the form of a web of elements that interact in ways so complex that we cannot always predict how this “system” will function under various conditions. In other words, we do not have, nor do we know how to create, a complete model of the network. In this context of scale and complexity, it will be necessary to reason about facts received into the knowledge plane, either to discover contradictions and impossibilities or to derive new beliefs and facts.
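
As a deliberately simple illustration of discovering contradictions among received facts, the Python sketch below flags any pair of subject and attribute for which different sources assert different values. The tuple layout, the function name find_contradictions, and the sample facts are assumptions made for this example; a real reasoning framework would need far richer representations and would have to cope with the scale and complexity described above.

```python
from collections import defaultdict


def find_contradictions(facts):
    """Return {(subject, attribute): {value: sources}} for conflicting facts.

    facts is an iterable of (source, subject, attribute, value) tuples,
    a hypothetical flat encoding chosen only for this sketch.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for source, subject, attribute, value in facts:
        seen[(subject, attribute)][value].add(source)

    # A contradiction here means: more than one distinct value was asserted.
    return {
        key: {value: sorted(sources) for value, sources in by_value.items()}
        for key, by_value in seen.items()
        if len(by_value) > 1
    }


if __name__ == "__main__":
    facts = [
        ("node-a", "mail.example.net", "reachable", False),
        ("node-b", "mail.example.net", "reachable", False),
        ("operator", "mail.example.net", "reachable", True),
        ("node-a", "dns.example.net", "reachable", True),
    ]
    for key, values in find_contradictions(facts).items():
        print(key, values)
```

A flagged conflict is not necessarily an error: as the introduction notes, a user and an operator may both be reporting accurately from different perspectives, which is exactly the kind of case a reasoner would need to resolve.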

An ontology and reasoning capability are useless in this environment without a clear model of how elements of knowledge are brought together in order to reason about them. We call the points of reasoning think points and ask how to transport elements of knowledge to the appropriate think points. For transport, we propose an attractor model, including an advertising model for think points and a matching transport mechanism that moves elements of information or knowledge toward think points that have advertised an interest in relevant topics. This is a combined push-pull model, rather than either extreme of push only or pull only, and is intended to allow for flexibility in both design and engineering tradeoffs to achieve improved functionality as well as performance.
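
One way to picture the attractor model is as topic-based matching between advertisements and knowledge elements. The Python sketch below is a hedged, single-process illustration: the classes ThinkPoint and Transport, the use of plain strings as topics, and the in-memory delivery are all assumptions, and nothing here reflects how the project actually routes knowledge across a network.

```python
from collections import defaultdict


class ThinkPoint:
    """A point of reasoning that collects knowledge on topics it cares about."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class Transport:
    """Moves knowledge elements toward think points that advertised interest."""

    def __init__(self):
        self.interests = defaultdict(list)  # topic -> interested think points
        self.held = defaultdict(list)       # topic -> elements seen so far

    def advertise(self, think_point, topic):
        """Pull side: register interest and deliver elements already held."""
        self.interests[topic].append(think_point)
        think_point.inbox.extend(self.held[topic])

    def publish(self, topic, element):
        """Push side: forward a new element to every interested think point.

        Returns the number of think points the element was delivered to.
        """
        self.held[topic].append(element)
        for think_point in self.interests[topic]:
            think_point.inbox.append(element)
        return len(self.interests[topic])


if __name__ == "__main__":
    transport = Transport()
    dns_reasoner = ThinkPoint("dns-reasoner")

    transport.publish("dns", "resolver at site A timed out")  # nobody listening yet
    transport.advertise(dns_reasoner, "dns")                  # pulls the earlier element
    transport.publish("dns", "resolver at site B timed out")  # pushed on arrival
    transport.publish("bgp", "route flap observed")           # no interest, stays held
    print(dns_reasoner.inbox)
```

The advertisement acts as the attractor: elements published before it are pulled when interest is registered, and elements published afterwards are pushed as they arrive, which is the combined push-pull behavior the text describes.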

Last, there is a question about incentives, which is really a question of value and who receives what value. It therefore is a question of motivation. For example, is the value of the Knowledge Plane to the individual worth the tradeoff against privacy, more work, imperfect accuracy in responses, load on the network, and so forth? There is also a question of the incentives for supporting the core of the Knowledge Plane. Are there parties to whom it will be worth enough to support its existence? We believe that there can be significant value to many parties: ISPs, service providers, and application users. As the two examples above indicate, we believe there will be value in improving management, usage, and efficiency of network utilization, as well as more tunable and predictable usage from the perspective of end nodes, whether servers or applications. More importantly, the project will consider the influence of economic choices on the design of the Knowledge Plane itself.

In summary, this is an effort to explore the design and feasibility of a plane of knowledge for the network that allows it to be introspective, self-managing, self-healing and increasingly supportive of the needs of applications. For a longer, although less recent, description of the Knowledge Plane, see Clark et al. [1].

References

[1] David Clark, Craig Partridge, J. Christopher Ramming, John Wroclawski. A Knowledge Plane for the Internet. In Proceedings of ACM SIGCOMM, pp. 3-10, Karlsruhe, Germany, August 2003.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)