Abstracts - 2006
Model-based Diagnosis in the Knowledge Plane
Karen R. Sollins & John T. Wroclawski
Consider the following scenario: A user walks into a meeting with a laptop and tries to read email. The user fully expects a set of messages to arrive, but the mail reader reports that mail cannot be read. The user then tries to look at her company's corporate web site to learn whether any services are down. When this also produces a blank page, the user pushes the "Why" button to find out what is wrong and what might be done about it. In response, the system explains that power in the machine room has failed, and suggests that services should be restored within an hour. The research we address in this proposal aims to understand the nature of the problem faced here, to devise an extensible and scalable approach to this and related problems of diagnosis, and to define a new architectural subsystem intended to underpin solutions to such problems.
The diagnosis problem is challenging even if only single failures and small networks are considered. If the mail reader does not find any new mail, we can identify many possible issues including:
The problem becomes even more interesting if we have two trigger queries, and can reason whether the mail problem and browser problem are the same or different. As a research agenda, we can go further by considering the problem of thousands, or hundreds of thousands of simultaneous "why" queries, indicating some distinct and some shared problems identified throughout the net. A challenging question is the extent to which the current Internet architecture is capable of helping us answer these sorts of questions.
The broad objective of our research is to advance the Knowledge Plane (KP), an architectural construct that may serve as a key component of a Future Internet that is introspective, self-managing, and self-diagnosing. The KP incorporates as key elements a framework to collect, locate, and manage knowledge in a decentralized environment, and a framework to support queries, reasoning agents, and inferences about that knowledge. This abstract reports on a new project that is getting under way in 2006.
The near-term objective of this project is to design, build, and demonstrate an architecture for distributed, adaptable, extensible network diagnosis. Currently, diagnosis of failures is labor-intensive, does not keep pace with the growth of the Internet, and cannot adapt gracefully to the evolution occurring in the net. At present, failure diagnosis depends heavily on humans collecting, examining, and analyzing a limited and somewhat ad hoc collection of data. Each person is responsible for only a local part of the overall network, leaving broad-scale problems (such as DDoS attacks or widely distributed routing loops) without a single point of diagnosis.
Our proposed research is intended both to provide results of direct value and to inform, guide, and motivate our larger objective of developing the KP. The research proposed here is both revolutionary, in its potential impact on network management and maintainability, and architectural, in that we propose a new general framework that will over time support a broad range of functionality not currently expected of a network architecture. We acknowledge a larger context, with a research approach that considers both technical design principles and the likely central importance of incentive-driven decisions about participation and information sharing in any Future Internet.
Our agenda divides the work into four components: (1) the basic agent-based diagnosis architecture, with experimentation and a basic demonstration of its capabilities in a limited, experimental environment; (2) specific work on the key problems in the knowledge aspects of the architecture (algorithmic and protocol); (3) the application of the tools of economics and incentives to provide a design intended for commercial or competitive environments; and (4) extension to and deployment in a large operational testbed, DETER. Beyond evaluating our work in a more realistic setting, this last environment allows us to extend the system's diagnosis capabilities to new root causes based on DoS attacks and similar security-related matters.
The approach we plan to take in this project is to concentrate initially on a particular example and then to learn from it and generalize to larger architectural requirements. We begin by concentrating on the problem of model-based root cause analysis in the extremely simple situation where errors are limited to complete failures of core components of the network. The reason for this choice is twofold. First, even in this simple situation the problem is difficult, the results are useful, and the situation allows us to explore a novel approach that we believe will generalize. Second, because it is a problem domain in which there is prior work, we are able to evaluate our approach by comparison. See the research abstract by George Lee et al. for more details on this first part.
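To make the restricted setting concrete, the following is a minimal sketch of single-fault, model-based root cause analysis over a dependency model. It is an illustration only, not the project's design: the `DEPENDS_ON` table, the component names, and the `candidate_root_causes` function are all hypothetical, loosely modeled on the mail and web scenario described earlier. It assumes exactly one component has failed completely and that a service works only if every component it depends on works.

```python
from typing import Dict, List, Set

# Hypothetical dependency model (an assumption for illustration):
# each service maps to the set of components it needs in order to work.
DEPENDS_ON: Dict[str, Set[str]] = {
    "mail": {"laptop_link", "dns", "mail_server", "machine_room_power"},
    "web": {"laptop_link", "dns", "web_server", "machine_room_power"},
    "external_web": {"laptop_link", "dns"},
}


def candidate_root_causes(failed: List[str], working: List[str]) -> Set[str]:
    """Return components that could explain the observations under a
    single-complete-failure assumption.

    A candidate must be needed by every failed service (so one fault
    explains all symptoms) and by no working service (a working service
    exonerates everything it depends on).
    """
    if not failed:
        return set()
    # Components shared by every failed service.
    suspects = set.intersection(*(DEPENDS_ON[s] for s in failed))
    # Remove components that a working service proves are healthy.
    for service in working:
        suspects -= DEPENDS_ON[service]
    return suspects


if __name__ == "__main__":
    # Two "why" queries (mail and web) plus one working observation.
    print(candidate_root_causes(["mail", "web"], ["external_web"]))
```

In this toy model, treating the mail and web failures as one problem and using a working external site as evidence leaves machine room power as the only candidate. With the mail failure alone, the mail server would remain a suspect as well, which is one way the second trigger query adds information.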
In parallel, we also intend to explore key aspects of a KP architecture in more depth: the problem of rendezvous between agents and knowledge, how best to support a wide spectrum of approaches to reasoning over the knowledge, and the necessity of supporting economic and incentive-based participation. In the first case, we are exploring approaches to both distribution of knowledge and distribution of discovery of knowledge, both to support moving knowledge to agents where they reside and to allow agents to find the knowledge where it resides. We believe that a well-designed combination of such approaches will be critically important.
Our second architectural question is how to enable a broad spectrum of approaches to reasoning and inference. This will require understanding which features of reasoning are important to support and how best to reduce the limitations on reasoning imposed by restrictive architectural decisions. In other words, we require that the support for reasoning be as general purpose as possible, while understanding the limitations that it will impose.
Because we envision in the long-term that the KP will be a key component of a future Internet architecture and therefore will need to be consistent with appropriate economic and policy goals, we will also investigate the implications of the KP from several economic and policy perspectives. These include:
The final part of the effort is to extend our approach to demonstrate its effectiveness in addressing a qualitatively different problem that has been less amenable to current techniques. In particular, we will use the unique capabilities of the DETER testbed to explore our architecture's ability to diagnose DDoS attacks and similar events as the root causes of apparent transmission failures in the net. Finally, using lessons learned from the diagnostic problems, we will be in a strong position to make progress on some of the core problems raised by the proposal of the KP.
This work will make a significant contribution both to the broad networking problem of automated, adaptable, and scalable failure diagnosis and to the larger problem of how security, privacy, and other policy constraints interact with the effective operation of an Internet-scale network. We expect specific contributions to the subfields of network measurement; network management, including work on ontologies of failures and, more broadly, models of correct and incorrect behaviors of the net; the economics of cooperation in a competitive network universe; and the relationship between security and apparent network failures.
More broadly, the impact of this work falls into three categories: networking, broader computer science, and education. Beyond the core work of a diagnosis architecture, this work will lead the way to a deeper understanding of the challenges of including a Knowledge Plane in a future network architecture. This work is based in part on approaches from the AI community: agent systems, reasoning, ontologies, and so forth. Each of those areas will benefit significantly from our exploration and analysis of how best to take advantage of those bodies of work. Finally, in addition to the traditional participation of graduate students funded through this project, the proposers will be contributing significantly to a three-tiered educational and industrial outreach program, based on development and teaching of course material from our work and the broader clean-slate network architecture effort in both classical and distance education settings, and including a "DETER-in-a-box" laboratory component.
This project also presents several opportunities for collaboration. In the most direct sense, it is a collaboration between MIT's Computer Science and Artificial Intelligence Laboratory and USC's Information Sciences Institute. In addition, there is significant interest and opportunity for industrial collaboration, at present especially through industrial membership in MIT's Communications Futures Program, where specific interest in the Knowledge Plane is under discussion.
David Clark, Craig Partridge, J. Christopher Ramming, John Wroclawski, A Knowledge Plane for the Internet. In Proc. ACM SIGCOMM'03, Karlsruhe, Germany, August 2003.
Terry Benzel, Robert Braden, D. Kim, B. Clifford Neuman, Anthony Joseph, K. Sklower, R. Ostrenga, S. Schwab, Experience with DETER: A Testbed for Security Research. In 2nd IEEE Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom2006).