CSAIL Research Abstracts - 2005

List Decoding

Piotr Indyk & Madhu Sudan

What

One of the fundamental problems in dealing with information is errors: over time, errors inevitably corrupt information during its storage, communication, and processing. The challenge of coping with them has been studied for over fifty years, starting with the work of Shannon. We now know a great deal about dealing with errors, but there is much more to learn.

How

To deal with the problem of errors, one encodes information using "error-correcting codes". Error-correcting codes are combinatorial designs constructed to cope with noise in information transmission. A code describes how to judiciously add redundant information so that if a small number of errors occur during transmission, the errors can be detected and corrected. The resiliency of a code is measured by its minimum distance: the number d such that the encodings of any two different messages (i.e., codewords) differ in at least d positions. In the seminal paper that introduced error-correcting codes, Hamming showed that a code is capable of correcting any pattern of fewer than d/2 errors, and uniquely recovering the encoded message, if and only if its minimum distance is at least d. Subsequently, the study of error-correcting codes has led to many codes for which these tasks are achievable algorithmically.
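
As a concrete (if toy) illustration of these definitions, the following Python sketch computes the minimum distance of a small code by brute force and performs nearest-codeword unique decoding. The code, names, and example are ours, purely for illustration, and are not part of the cited work.

    from itertools import combinations

    def hamming(u, v):
        """Number of positions in which two words differ."""
        return sum(a != b for a, b in zip(u, v))

    def min_distance(codewords):
        """Minimum distance d: the smallest Hamming distance between
        any two distinct codewords."""
        return min(hamming(u, v) for u, v in combinations(codewords, 2))

    def unique_decode(received, codebook):
        """Decode to the unique closest codeword; return None on a tie."""
        ranked = sorted(codebook, key=lambda cw: hamming(received, cw))
        if hamming(received, ranked[0]) == hamming(received, ranked[1]):
            return None  # ambiguous: outside the unique-decoding radius
        return codebook[ranked[0]]

    # Toy code: the length-4 binary repetition code, with d = 4, so
    # unique decoding is guaranteed only for fewer than d/2 = 2 errors.
    codebook = {(0, 0, 0, 0): "msg0", (1, 1, 1, 1): "msg1"}

    assert min_distance(codebook) == 4
    assert unique_decode((1, 0, 0, 0), codebook) == "msg0"  # 1 error: corrected
    assert unique_decode((1, 1, 0, 0), codebook) is None    # 2 errors: a tie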

What happens if we try to recover from more than d/2 errors when the code has distance d? Hamming's result suggests that this task is ill-posed: if we take two codewords that are distance d apart and corrupt one of them by d/2 errors in the direction of the other, the receiver cannot uniquely recover the "right" codeword. From this perspective, if a decoding algorithm is given a codeword corrupted by more than d/2 errors, it is allowed to simply fail.

There is, however, a way to overcome this obstacle: relax the requirement that the decoder always output a unique answer. This relaxation is captured by the notion of list decoding. Under list decoding, the decoder is required to report a (short) list of all codewords that closely match the received word. This circumvents the "half-the-minimum-distance" barrier: in case of a tie, the decoder can output both answers instead of simply failing. Thus, meaningful transmission of information can be performed even in highly noisy settings. Even better, it turns out that the error patterns which cannot be unambiguously decoded occur quite rarely in practice. Therefore, in most cases, the recovered list of codewords will contain only the encoding of the original message, even if the number of errors approaches d.
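
Continuing the toy sketch above (and reusing its hamming function and codebook), a brute-force list decoder simply reports every codeword within a given Hamming radius of the received word. Note that this enumeration takes time proportional to the number of codewords, which is exponential in the message length; the point of the algorithmic results cited below is to obtain the same guarantee in polynomial time.

    def list_decode(received, codebook, radius):
        """Return all (codeword, message) pairs within the given Hamming
        radius of the received word -- the list-decoding guarantee."""
        return [(cw, msg) for cw, msg in codebook.items()
                if hamming(received, cw) <= radius]

    # At exactly d/2 = 2 errors, unique decoding fails (a tie), but
    # list decoding reports both plausible codewords:
    print(list_decode((1, 1, 0, 0), codebook, radius=2))
    # [((0, 0, 0, 0), 'msg0'), ((1, 1, 1, 1), 'msg1')]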

Progress

Even though the notion of list decoding has been studied since the 1950s, the first polynomial-time algorithms that perform the list-decoding task (for Reed-Solomon codes) were discovered only recently [1,2]. Since then, list decoding has been the subject of extensive study, both in theory [3,4,5] and in practice.

In this project we investigate efficient list-decoding schemes from two perspectives: (1) the amount of error they can correct, and (2) the computational efficiency of the decoding algorithm. Our explorations cover two broad classes of encoding schemes: (1) algebraic error-correcting codes, which are based on the properties of (multivariate) polynomials over finite fields, and (2) graph-theoretic codes, which are more combinatorial in structure and are constructed from families of graphs with "pseudo-random" properties. The former class has typically been the first to demonstrate the possibility of correcting a large number of errors for a given choice of the other parameters. The latter class has typically been more efficient at correcting errors, especially in terms of the running time of the decoding algorithm.

Our main results in the algebraic domain [1,2] show how to take codes of rate R (i.e., mapping Rn symbols into n symbols) and correct up to a 1 - √R fraction of errors in polynomial time.
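
To see what the 1 - √R bound buys, the following small computation (ours, for illustration only) compares the classical unique-decoding radius of Reed-Solomon codes, (1 - R)/2 (half the relative distance), with the list-decoding radius 1 - √R of [1,2] for a few rates.

    import math

    def unique_radius(R):
        """Classical unique-decoding radius of a rate-R Reed-Solomon
        code: half the relative distance, (1 - R) / 2."""
        return (1 - R) / 2

    def list_radius(R):
        """List-decoding radius achieved in [1,2]: 1 - sqrt(R)."""
        return 1 - math.sqrt(R)

    for R in (0.10, 0.25, 0.50, 0.75):
        print(f"rate {R:4.2f}: unique {unique_radius(R):.3f}, "
              f"list {list_radius(R):.3f}")
    # rate 0.10: unique 0.450, list 0.684
    # rate 0.25: unique 0.375, list 0.500
    # rate 0.50: unique 0.250, list 0.293
    # rate 0.75: unique 0.125, list 0.134

The gain is most dramatic at low rates: at rate 0.10, list decoding tolerates over 68% errors, versus 45% for unique decoding.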

In [3], we show how to design codes that, for any noise level, support encoding and list decoding in time linear in the length of the encoded message. The rate of our scheme is quite low: it is inversely exponential in the fraction of codeword positions that are not corrupted by errors. Nevertheless, it shows that even in highly noisy settings, information transmission can be performed in an asymptotically optimal manner. Moreover, unlike earlier schemes, our code construction does not use polynomials; instead, we use certain families of graphs with very good expansion properties.

The high redundancy of the above scheme does not seem inevitable, and it is quite likely that further research will reduce it to a more practical level. As a step in this direction, we show in [5] that for simpler communication channels the redundancy can be reduced to polynomial, or even quadratic.

Finally, in [4], we point out an interesting connection between the unique- and list-decoding approaches. Specifically, we show that it is possible to design binary codes such that (a) they can be constructed in polynomial time by a randomized procedure, (b) their rate matches the best known non-constructive bound of Gilbert and Varshamov, and (c) they come equipped with polynomial-time encoding and unique-decoding procedures. We construct them by taking codes for which efficient list decoders exist and combining them with (short) random codes. The resulting codes, even though not constructed "explicitly", are nevertheless "functionally explicit", in the sense that their construction, encoding, and unique decoding can all be done in randomized polynomial time.
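
For context, the Gilbert-Varshamov bound is the standard non-constructive guarantee that binary codes of relative distance delta exist with rate 1 - H2(delta), where H2 is the binary entropy function. The short computation below (ours, illustrative) tabulates this rate; it is not code from [4].

    import math

    def H2(x):
        """Binary entropy function, in bits."""
        if x in (0.0, 1.0):
            return 0.0
        return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

    def gv_rate(delta):
        """Gilbert-Varshamov bound: binary codes of relative distance
        delta exist (non-constructively) with rate 1 - H2(delta)."""
        return 1 - H2(delta)

    for delta in (0.05, 0.10, 0.20, 0.30):
        print(f"relative distance {delta:.2f}: GV rate {gv_rate(delta):.3f}")
    # relative distance 0.05: GV rate 0.714
    # relative distance 0.10: GV rate 0.531
    # relative distance 0.20: GV rate 0.278
    # relative distance 0.30: GV rate 0.119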

Future

Many basic questions about error correction remain open when examined in the algorithmic context, including the task of designing codes with an optimal trade-off between redundancy and error-correcting capability, equipped with linear-time (or even polynomial-time) decoding algorithms.

Research Support

This research is supported by NSF Award CCR-0219218 and a Packard Foundation Fellowship. Much of this project is being carried out in collaboration with Prof. Venkatesan Guruswami (Ph.D. MIT, 2001), now at the University of Washington.

References

[1] Sudan, M., "Decoding of Reed-Solomon codes beyond the error-correction bound", Journal of Complexity, 13(1):180-193, March 1997.

[2] Guruswami, V. and Sudan, M., "Improved decoding of Reed-Solomon codes and algebraic-geometry codes", IEEE Transactions on Information Theory, 45(6):1757-1767, September 1999.

[3] Guruswami, V. and Indyk, P., "Linear time encodable and list decodable codes", Proceedings of the 35th ACM Symposium on Theory of Computing, San Diego, CA, June 2003, pp. 126-135.

[4] Guruswami, V. and Indyk, P., "Efficiently Decodable Low-Rate Codes Meeting Gilbert-Varshamov Bound", Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, January 2004, pp. 756-757.

[5] Guruswami, V. and Indyk, P., "Linear time list decoding in error-free settings", Proceedings of the 31st International Colloquium on Automata, Languages and Programming, Turku, Finland, July 2004, pp. 695-707.
