Abstracts - 2007
The Role of Learning in Network Architecture
Robert Beverly & Karen Sollins
Learning and Internet Architecture
A central question system architects routinely face is how to abstract, modularize, and best place functionality. While many traditional communication networks place the bulk of the system functionality in the center of the network, the Internet is unique in pushing protocol functionality as far to the edge as possible. First articulated by Saltzer, Reed and Clark as the "end-to-end argument", these design principles are among the architectural enablers identified in retrospective analyses of the Internet's success.
The modern Internet faces problems of scale, complexity and security while accommodating increasingly critical services. As the network matures, faces new challenges and offers new functionality, the end-to-end arguments continue to provide guidance. A crucial consideration is how the Internet architecture should evolve to accommodate future demands.
The network's ability to configure itself, recover from and defend against attacks, and provide new services may require fundamental changes to the underlying architecture. For instance, the Knowledge Plane proposal argues for a distributed cognitive system architecture. Such an intelligent network could perform diagnostics, recover from failures, and stave off malicious nodes.
This research considers where, if at all, cognitive intelligence should be placed in the network. We present several real-world problems that are best solved via learning. A hard problem of our work is finding means to limit and decompose network problems such that a statistical approach is tractable. In motivating alternate designs, we appeal to the end-to-end arguments to guide our decision of where to place intelligent functionality. The problems we examine span the design space continuum between placing learning in end nodes [3, 4] to pushing increased intelligence into the core of the network.
Intelligence in the Routing Plane
A particularly challenging environment for learning is within the core of the routing plane. A commonly cited failing of the current Internet routing architecture is the lack of separation between reachability and policy. As a result, operational networks typically attempt to drive behavior with the blunt mechanisms available. For example, autonomous systems are competing yet cooperating business entities that must implement policy based on business relationships. Significant low-level configuration is required to effect policy in an attempt to accommodate the incentive structure. Furthermore, the resulting configuration is complex, fragile, and prone to pathologies and non-obvious failure modes.
To accommodate the aforementioned tussles, a new routing architecture should have a high-level notion of how well it is performing. This performance can be thought of as a multi-dimensional optimization problem in the sense of a utility function or reward. Reward is both immediate, in the form of local knowledge about links, and delayed, in the form of feedback from prior decisions. Routing then becomes a distributed optimization process that maximizes an individual AS's notion of utility.
Routers have both fixed information and dynamic information. Fixed information includes the set of links, routers, negotiated contracts, users, etc. Dynamic information includes traffic levels, congestion, and packet loss, not just within the AS but also for destinations outside it. Because of the non-deterministic, non-stationary nature of Internet routing, the optimization problem necessarily involves learning: notably, exploring routing paths and predicting forwarding decisions expected to maximize long-term utility.
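The exploration/prediction tension can be sketched as a simple epsilon-greedy choice over egress interfaces. This is an illustrative sketch, not the proposal's implementation; the interface names and utility estimates are assumptions:

```python
import random

def choose_egress(q_values, epsilon=0.1):
    """Pick an egress interface for a destination prefix.

    q_values: dict mapping interface name -> estimated long-term utility.
    With probability epsilon, explore a random interface (gathering new
    dynamic information); otherwise exploit the current best estimate.
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore an alternate path
    return max(q_values, key=q_values.get)     # exploit the best-known path

# Hypothetical utility estimates for three egress interfaces:
q = {"if0": 0.72, "if1": 0.55, "if2": 0.61}
print(choose_egress(q, epsilon=0.0))  # with no exploration, prints "if0"
```

The epsilon parameter trades off gathering fresh information about non-stationary paths against forwarding along the path currently believed best.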
Rather than complicated, error-prone router configuration, consider a highly expressive language stating policy at a higher level of abstraction. Such a global policy could be distilled into a configuration for the routing plane to implement. With different policies, the network could easily accommodate many objectives, including automatically routing around faults or minimizing packet loss. In this way, the network is automatically adaptable, efficient, reconfigurable, and resilient in a way that is not currently possible.
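To make the distillation idea concrete, one can imagine a high-level policy compiled into a per-prefix forwarding choice. Everything here — the policy fields, path attributes, and AS numbers — is a hypothetical illustration of the abstraction, not a proposed language:

```python
# Hypothetical high-level policy: objective plus business constraints.
policy = {
    "prefer": "min_loss",     # primary objective: minimize packet loss
    "avoid_as": {64512},      # business constraint: never transit this AS
}

def distill(policy, candidate_paths):
    """Compile the abstract policy into a concrete path selection.

    candidate_paths: list of dicts with per-path AS paths and measurements.
    """
    objective = {"min_loss": lambda p: p["loss"],
                 "min_latency": lambda p: p["latency"]}[policy["prefer"]]
    allowed = [p for p in candidate_paths
               if not set(p["as_path"]) & policy["avoid_as"]]
    return min(allowed, key=objective)

paths = [
    {"next_hop": "if0", "as_path": [64512, 3356], "loss": 0.001, "latency": 30},
    {"next_hop": "if1", "as_path": [174, 3356], "loss": 0.004, "latency": 25},
]
print(distill(policy, paths)["next_hop"])  # prints "if1": if0 transits 64512
```

The point is the separation of concerns: the operator states intent once, and the compilation step, rather than hand-written low-level configuration, resolves it against live path data.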
The notions of non-deterministic reward and delayed feedback for learning policy and making predictions correspond directly to ideas from the reinforcement learning community. Routing becomes a learning problem, where the agent begins learning by probabilistically choosing an egress interface for an incoming packet. The agent maintains state and receives feedback about the success of each routing decision, in the form of reward, in order to substantiate better future decisions.
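The state-and-feedback loop above can be sketched as a standard temporal-difference update, in the spirit of Q-routing. This is a minimal illustration under assumed parameters, not the algorithm the work proposes:

```python
def q_update(q, interface, reward, next_best, alpha=0.5, gamma=0.9):
    """One temporal-difference update for a routing decision.

    q:         dict mapping interface -> estimated utility of forwarding via it
    reward:    immediate feedback, e.g. negative per-hop delay or loss penalty
    next_best: best utility estimate reported back by the chosen next hop
               (this is the delayed-feedback component)
    """
    q[interface] += alpha * (reward + gamma * next_best - q[interface])
    return q[interface]

q = {"if0": 0.0}
q_update(q, "if0", reward=-1.0, next_best=0.0)  # a unit delay cost is observed
print(round(q["if0"], 2))  # prints -0.5
```

Immediate reward (local link knowledge) and the next hop's estimate (delayed feedback from prior decisions) enter the same update, matching the two reward forms described above.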
Using reinforcement learning for routing is not a novel idea (see, e.g., prior work on reinforcement routing). However, prior proposals have all been infeasible in terms of computational or communication cost, memory requirements, stability, or convergence. In contrast, we take select ideas from reinforcement routing and place learning functionality at specific places in the network according to sound end-to-end design rules. A primary contribution we make is the recognition that reinforcement routing can resolve many tussles traditionally exogenous to the routing system.
To this end, we show the natural and strong link between such a routing infrastructure and learning. Our hope is that this work serves as a basis for demonstrating the feasibility and importance of an intelligent routing plane as a component of future network architectures.