CSAIL Research Abstracts - 2005

Detecting BGP Configuration Faults with Static Analysis

Nick Feamster & Hari Balakrishnan

Problem Statement

Network operators use router configurations to provide reachability, express routing policy (e.g., transit and peering relationships [13], inbound and outbound routes [2], etc.), configure primary and backup links [8], and perform traffic engineering across multiple links [7]. Configuring a network of BGP routers is like writing a distributed program where complex feature interactions occur both within one router and across multiple routers. This complex process is exacerbated by the number of lines of code (we find that a 500-router network typically has more than a million lines of configuration), by configuration being distributed across the routers in the network, by the absence of useful high-level primitives in today's configuration languages, by the diversity in vendor-specific configuration languages, and by the number of ways in which the same high-level functionality can be expressed in a configuration language. As a result, router configurations are complex and faulty [2,12].

Faults in BGP configuration can seriously affect end-to-end Internet connectivity, leading to lost packets, forwarding loops, and unintended paths. Configuration faults include invalid routes (including hijacked and leaked routes); contract violations [6]; unstable routes [11]; routing loops [4,5]; and persistently oscillating routes [1,9,14]. We find that rcc, the router configuration checker described in this abstract, can detect many of these configuration faults by statically analyzing BGP configurations.

Approach

First, we define two high-level aspects of correctness, path visibility and route validity, and use this specification to derive constraints that can be tested against the BGP configuration. Path visibility says that BGP will correctly propagate routes for existing, usable IP-layer paths; essentially, it states that the control path is propagating BGP routes correctly. Route validity says that, if routers attempt to send data packets via these routes, then packets will ultimately reach their intended destinations.
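As a hedged illustration of how a path-visibility constraint might be tested against configuration, the sketch below checks one such constraint: in an iBGP "full mesh" (no route reflectors), every pair of routers needs a direct iBGP session, because a router does not re-advertise routes learned over iBGP to its other iBGP peers. The function name and the data model (router names, sessions as pairs) are assumptions made for this example; they are not rcc's actual representation.

```python
from itertools import combinations


def missing_full_mesh_sessions(routers, sessions):
    """Return router pairs that lack an iBGP session in a full-mesh design.

    routers:  iterable of router names (hypothetical identifiers)
    sessions: iterable of (router_a, router_b) pairs, direction ignored
    """
    # Treat each session as an unordered pair so (a, b) matches (b, a).
    established = {frozenset(pair) for pair in sessions}
    return [
        (a, b)
        for a, b in combinations(sorted(routers), 2)
        if frozenset((a, b)) not in established
    ]


# Hypothetical three-router AS in which r1 and r3 have no session.
routers = {"r1", "r2", "r3"}
sessions = [("r1", "r2"), ("r3", "r2")]
missing = missing_full_mesh_sessions(routers, sessions)
print(missing)
```

Any pair reported here marks a potential signaling partition: a route learned by one router of the pair would never reach the other over iBGP.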

Second, we present the design and implementation of rcc. rcc focuses on detecting faults that have the potential to cause persistent routing failures. It is not concerned with correctness during convergence, since any distributed protocol will have transient inconsistencies while it converges. Instead, rcc's goal is to detect problems that may exist in the steady state, even when the protocol converges to some stable outcome. To date, rcc has been downloaded by over 65 network operators.

Problem                                           Latent  Benign
Path Visibility
  Dissemination Problems
    Signaling partition:
      - of route reflectors                            4       1
      - within a RR "cluster"                          2       0
      - in a "full mesh"                               2       0
    Routers with duplicate:
      - loopback address                              13     120
    iBGP configured on one end or not to loopback    420       0
Route Validity
  Filtering Problems
    transit between peers                              3       3
    inconsistent export to peer                      231       2
    inconsistent import                              105      12
    eBGP session:
      - w/no filters                                  21       -
      - w/undef. filter                               27       -
      - w/undef. policy                                2       -
    filter:
      - w/missing prefix                             196       -
    policy:
      - w/undef. AS path                              31       -
      - w/undef. community                            12       -
      - w/undef. filter                               18       -
  Dissemination Problems
    prepending with bogus AS                           0       1
    originating unroutable dest.                      22       2
    incorrect next-hop                                 0       2
Miscellaneous
  Decision Process Problems
    nondeterministic MED                              43       0
    age-based tiebreaking                            259       0

Table 1: BGP configuration faults in 17 ASes.

Figure 1: Number of ASes in which each type of fault occurred at least once.

Third, we use rcc to explore the extent of real-world BGP configuration faults; this paper presents the first published analysis of BGP configuration faults in real-world ISPs. We have analyzed real-world, deployed configurations from 17 different ASes and detected more than 1,000 BGP configuration faults that had previously gone undetected by operators.

These faults ranged from simple "single router" faults (e.g., undefined variables) to complex, network-wide faults involving interactions between multiple routers.
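To make the distinction between the two kinds of fault concrete, here is a minimal sketch of one check of each kind. The first looks at a single router for sessions that reference a filter never defined on that router; the second needs the whole network to find iBGP sessions configured on only one end. The dictionary-based configuration model and all names in it are assumptions made for this example, not rcc's internal format or any vendor's syntax.

```python
def undefined_filters(config):
    """Single-router check: filters referenced by a session but not defined."""
    defined = set(config["filters"])
    referenced = {s["filter"] for s in config["sessions"] if s.get("filter")}
    return sorted(referenced - defined)


def one_sided_ibgp(network):
    """Network-wide check: iBGP sessions configured on only one end."""
    configured = {
        (name, s["neighbor"])
        for name, cfg in network.items()
        for s in cfg["sessions"]
        if s["type"] == "ibgp"
    }
    # A session (a, b) is one-sided when b has no matching session back to a.
    return sorted((a, b) for a, b in configured if (b, a) not in configured)


# Hypothetical two-router network with one fault of each kind.
network = {
    "r1": {
        "filters": {"cust-in"},
        "sessions": [
            {"type": "ebgp", "neighbor": "peer-a", "filter": "cust-in"},
            {"type": "ebgp", "neighbor": "peer-b", "filter": "peer-in"},
            {"type": "ibgp", "neighbor": "r2"},
        ],
    },
    "r2": {"filters": set(), "sessions": []},
}

print(undefined_filters(network["r1"]))
print(one_sided_ibgp(network))
```

The first check can run on each configuration file in isolation, while the second only makes sense once every router's configuration has been loaded into a common model.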

Table 1 summarizes the faults that rcc detected. Figure 1 shows that many faults appeared in many different ASes. We did not observe any significant correlation between network complexity and prevalence of faults, but configurations from more ASes are needed to draw any strong conclusions. The rest of this section describes the extent of the configuration faults that we found with rcc.

Results

Although rcc is intended to be used before configurations are deployed, it discovered many faults that could cause failures in live, operational networks. These include: (1) faults that could have caused network partitions due to errors in how external BGP information was being propagated to routers inside an AS, (2) faults that cause invalid routes to propagate inside an AS, and (3) faults in policy expression that caused routers to advertise routes (and hence potentially forward packets) in a manner inconsistent with the AS's desired policies. Our findings indicate that configuration faults that can cause serious failures are often not immediately apparent (i.e., the failure that results from a configuration fault may only be triggered by a specific failure scenario or sequence of route advertisements). If rcc were used before BGP configuration was deployed, we expect that it would be able to detect many immediately active faults.
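As an illustration of the third class, the sketch below shows one plausible reading of the "inconsistent export to peer" row of Table 1: if an AS has several eBGP sessions to the same neighboring AS, the export policy applied on each session should be equivalent. The session records, the `export` field (a set of policy clauses), and the AS numbers are all assumptions made for this example; the abstract does not specify how rcc normalizes or compares policies.

```python
from collections import defaultdict


def inconsistent_exports(sessions):
    """Return neighbor ASes whose sessions carry differing export policies."""
    policies = defaultdict(set)
    for s in sessions:
        # frozenset makes clause order irrelevant when comparing policies.
        policies[s["peer_as"]].add(frozenset(s["export"]))
    return sorted(asn for asn, seen in policies.items() if len(seen) > 1)


# Hypothetical sessions: two to AS 65001 that differ, two to AS 65002 that match.
sessions = [
    {"router": "r1", "peer_as": 65001, "export": {"customers", "own"}},
    {"router": "r2", "peer_as": 65001, "export": {"own"}},
    {"router": "r1", "peer_as": 65002, "export": {"own", "customers"}},
    {"router": "r3", "peer_as": 65002, "export": {"customers", "own"}},
]

print(inconsistent_exports(sessions))
```

A fault of this kind can stay latent until traffic or routes shift to the session with the differing policy, which matches the observation that serious faults are often not immediately apparent.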

Conclusion

In light of our findings, we suggest two ways to make interdomain routing less prone to configuration faults. First, protocol improvements, particularly in intra-AS route dissemination, could avert many BGP configuration faults. The current approach to scaling iBGP should be replaced. Route reflection serves a single, relatively simple purpose, but it is the source of many faults, many of which cannot be checked with static analysis of BGP configuration alone [10]. The protocol that disseminates BGP routes within an AS should enforce path visibility and route validity; the Routing Control Platform [3] offers one possible solution.

Second, BGP should be configured with a centralized, higher-level specification language. Today's BGP configuration languages enable an operator to specify router-level mechanisms that implement high-level policy, but the distributed, low-level nature of the configuration languages introduces complexity, obscurity, and opportunities for misconfiguration rather than design flexibility or expressiveness. For example, rcc detects many faults in the implementation of high-level policies in low-level configuration; these faults arise because there are many ways to implement the same high-level policy, and the low-level configuration is unintuitive. Ideally, a network operator would never touch low-level mechanisms (e.g., the community attribute) in the common case. Rather than configuring routers with a low-level language, an operator should configure the network using a language that directly reflects high-level policies.

References

[1] Anindya Basu et al. Route oscillations in IBGP with route reflection. In Proc. ACM SIGCOMM, Pittsburgh, PA, August 2002.

[2] Iljitsch Van Beijnum. BGP. O'Reilly and Associates, September 2002.

[3] Matthew Caesar, Nick Feamster, Jennifer Rexford, Aman Shaikh, and Kobus van der Merwe. Design and Implementation of a Routing Control Platform. In Proc. 2nd Symposium on Networked Systems Design and Implementation, Boston, MA, May 2005.

[4] Rohit Dube. A comparison of scaling techniques for BGP. ACM Computer Communications Review, 29(3):44-46, July 1999.

[5] Nick Feamster and Hari Balakrishnan. Towards a logic for wide-area Internet routing. In ACM SIGCOMM Workshop on Future Directions in Network Architecture, Karlsruhe, Germany, August 2003.

[6] Nick Feamster, Zhuoqing Morley Mao, and Jennifer Rexford. BorderGuard: Detecting cold potatoes from peers. In Proc. ACM SIGCOMM Internet Measurement Conference, Taormina, Sicily, Italy, October 2004.

[7] Nick Feamster, Jared Winick, and Jennifer Rexford. A model of BGP routing for network engineering. In Proc. ACM SIGMETRICS, New York, NY, June 2004.

[8] Lixin Gao, Timothy G. Griffin, and Jennifer Rexford. Inherently safe backup routing with BGP. In Proc. IEEE INFOCOM, Anchorage, AK, April 2001.

[9] Timothy Griffin and Gordon Wilfong. An analysis of BGP convergence properties. In Proc. ACM SIGCOMM, Cambridge, MA, September 1999.

[10] Timothy Griffin and Gordon Wilfong. On the correctness of IBGP configuration. In Proc. ACM SIGCOMM, Pittsburgh, PA, August 2002.

[11] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed Internet Routing Convergence. IEEE/ACM Transactions on Networking, 9(3):293-306, June 2001.

[12] Ratul Mahajan, David Wetherall, and Tom Anderson. Understanding BGP misconfiguration. In Proc. ACM SIGCOMM, pages 3-17, Pittsburgh, PA, August 2002.

[13] William Norton. Internet service providers and peering. http://www.equinix.com/press/whtppr.htm.

[14] K. Varadhan, R. Govindan, and D. Estrin. Persistent route oscillations in inter-domain routing. Computer Networks, 32(1):1-16, 2000.

Footnote:

1. A full version of this paper appears in 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation, May 2005.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)