CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

pDNS: Parallelizing DNS Lookups to Improve Performance

Ben Leong & Barbara Liskov

Introduction

The Domain Name System (DNS) is a critical component of the Internet. Although it was conceived some two decades ago, it functions relatively well today, which is truly remarkable considering that the Internet has grown by more than a hundredfold over the past decade.

Nevertheless, DNS has weaknesses. One weakness is lookup performance: Wills and Shang found that DNS lookup time contributes more than one second to 20% of web object retrievals based on NLANR proxy logs [1], Huitema et al. reported that 29% of DNS queries take more than 2 seconds [2], and Jung et al. reported that 10% of all queries take more than 2 seconds [3].

We propose a novel p2p system that improves performance of DNS lookups. Our approach attempts to addresses the 20% of queries that are not currently resolved in one hop. Although slow queries form only a small fraction of the total (10%), they are important because they are the most noticeable to end users; most of the remaining 90% of queries are already resolved within 300 ms [3].

Approach

Our idea is to use a p2p system composed of existing DNS name servers (we will refer to them as resolvers). The system nodes communicate with one another so that each of them contains (with high probability) complete information about all DNS authoritative name servers: each node maintains a cache in which it stores digests of NS-records. The extra cached information can then be used to perform the steps of a regular sequential DNS name resolution in parallel, resulting in resolution for these queries in the time it takes to do the slowest of the steps (instead of the sum). Because we use parallel lookups, we call our system pDNS.

Nodes in pDNS are enhanced DNS resolvers. Each node has its regular DNS cache and two additional tables. The first is a system membership table that stores information about every other node in the system, including its IP address. The second table is an NS-cache; it stores mappings from DNS domain names to the IP addresses of the authoritative servers for them. pDNS nodes cooperate to update their cached information. We use hashing and the p2p address space to partition responsibility for updates in a way that distributes the load uniformly.

Our lookup algorithm uses the NS-cache to help speed up query processing, but it does not rely on the answers it obtains in this way until they are checked out via a normal DNS lookup. This way we can guarantee the accuracy of our answers: our responses are exactly what would have been provided by unmodified DNS. The scheme is as follows:

If the A-record is found in DNS cache, return it.
If the NS-record of the authoritative subdomain is found in the DNS cache, request the A-record from it and return it.
Otherwise send an iterative DNS request to the authoritative server for the most complete subdomain name found in the DNS cache, and also send iterative DNS requests to all descendants of that subdomain with entries in the NS-cache. These queries are sent in parallel. The answer is returned as soon as we have obtained the A-record, and every step along the path to that name has been checked with DNS. Any new NS- and A-records obtained along the way are added to the DNS cache as usual.

The performance gain from our approach comes from doing the requests in parallel. In the normal case, we will have information in the NS-cache for every subdomain, and we can send queries to them in parallel with the DNS query. Furthermore, in the normal case the responses will check out, and therefore we can return the answer as soon as they all return. Thus the latency is determined by the slowest response, whereas using DNS alone we would have to do the requests in series and the latency would be the sum of the latencies for each request.

Nodes broadcast update information to other nodes using a distribution tree. The NS-cache contains this broadcast information. The key distinction between entries in the DNS cache and the NS-cache is that the former are trustworthy since they were verified during an earlier DNS lookup, while the same degree of trust cannot be ascribed to the latter: they have been obtained from other nodes in pDNS, and we do not trust them because they might be faulty.

In addition, we use a novel scheme involving committees to reduce vulnerability to attacks by limiting the rate at which faulty nodes can send bad information. Our approach to rate limiting is based on keys. Each node in pDNS has a public/private key pair. It chooses this pair itself before it joins (we do not rely on a public key infrastructure), and informs the other nodes of its public key when it joins the system. Nodes use their keys to sign messages; nodes drop messages with invalid signatures.

The use of signatures guarantees that messages come from particular members of the ring; they allow us to avoid problems such as IP spoofing and also ensure that a message being broadcast is not corrupted by an intermediate node. Signatures do not guarantee, however, that the information being sent via the broadcast is valid. To deal with this problem we require signatures from a committee. Every message to be broadcast must be signed by a committee consisting of the sender and k other nodes, where k is a system parameter. Committee members are chosen deterministically based on hashing, so that an attacker cannot easily control the membership of a committee.

A signer is responsible for checking message content: doing a DNS lookup to ensure that the IP addresses in the message are the right ones for that domain. The signature is over the entire message content. Messages must contain all the signatures and will be dropped if there are any discrepancies. This approach prevents garbage from being sent with very high probability.

Progress

We recently completed a preliminary study of the expected performance of our proposed parallel DNS lookup algorithm by analysing the DNS traces captured at the CSAIL gateway router from 19 to 24 February 2004. Our study suggests that our technique is able to improve DNS lookup performance for a range of queries, but it is not equally effective in improving all DNS queries. In particular, our scheme is most effective at improving the lookup latency for queries that currently take between 300 ms to 1,000 ms; for queries with longer latencies (more than 1 second), it seems that UDP packet drops at the last hop is the limiting factor and thus our technique is only marginally effective at improving the lookup latency.

We are currently studying our DNS trace data and working on modifying our basic scheme to make it more effective for queries with long latencies.

Research Support

This research was supported by the National Science Foundation under Grant No. ANI-0082503.

References

[1] Craig E. Wills and Hao Shang. The Contribution of DNS Lookup Costs to Web Object Retrieval. WPI Technical Report WPI-CS-TR-00-12, 2000.

[2] C. Huitema and S. Weerahandi. Internet Measurements: the Rising Tide and the DNS Snag. In Proceedings of ITC Specialist Seminar on Internet Traffic Measurement and Modeling, Monterey, CA, September 2000.

[3] Jaeyeon Jung, Emil Sit, Hari Balakrishnan and Robert Morris. DNS Performance and the Effectiveness of Caching. In Proceedings of the ACM SIGCOMM Internet Measurement Workshop 2001, San Francisco, California, November 2001.

[4] Russ Cox, Athica Muthitacharoen and Robert Morris. Serving DNS using a Peer-to-Peer Lookup Service. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, MA, March 2002.

[5] Venugopalan Ramasubramanian and Emin Gun Sirer. Beehive: Exploiting Power Law Query Distributions for O(1) Lookup Performance in Peer to Peer Overlays. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), San Francisco, California, March 2004.

[6] Venugopalan Ramasubramanian and Emin Gun Sirer. The Design and Implementation of a Next Generation Name Service for the Internet. In Proceedings of the 2004 ACM SIGCOMM Conference, Portland, Oregon, August 2004.

[7] KyoungSoo Park, Vivek S. Pai, Larry Peterson and Zhe Wang. CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups. In Proceedings of the Sixth Symposium on Operating Systems Design and Implementation(OSDI '04). November 2004.

[8] Jussi Kangasharju and Keith W. Ross. A Replicated Architecture for the Domain Name System. In Proceedings of IEEE INFOCOM '00. March 2000.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)