MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Research Abstracts Home

CSAIL Digital Archive

Research Activities

CSAIL Home

horizontal line

Research Abstracts - 2007
horizontal line

horizontal line

Locally Decodable Codes and Private Information Retrieval Schemes

Sergey Yekhanin

Problem Overview

A line of research [BFLS,Sud,CGKS,KT] originating in early 1990s has led to discovery of two related beautiful notions that are of central importance to modern complexity theory and cryptography, namely that of Locally Decodable Codes (LDCs) and Private Information Retrieval schemes (PIRs).

LDCs [KT] are a special kind of error-correcting codes. Error-correcting codes are used to ensure reliable transmission of information over noisy channels as well as to ensure reliable storage information on a medium that may be partially corrupted over time (or whose reading device is subject to errors). In both of these applications the message is typically partitioned into small blocks and then each block is encoded separately. Such encoding strategy allows efficient random-access retrieval of the information, since one needs to decode only the portion of data one is interested in. Unfortunately, this strategy yields very poor noise resilience, since in case even a single block (out of possibly tens of thousands) is completely corrupted some information is lost. It would seem preferable to encode the whole message into a single codeword of an error-correcting code. Such solution clearly improves the robustness to noise, but is also hardly satisfactory, since one now needs to look at the whole codeword in order to recover any particular bit of the message (at least in the case when classical error-correcting codes are used). Such decoding complexity is prohibitive for modern massive data-sets.

Locally decodable codes are error-correcting codes that avoid the problem mentioned above by having extremely efficient sublinear-time decoding algorithms. More formally, a q-query locally decodable code C encodes n-bit messages x in such a way that one can probabilistically recover any bit x_i of the message by querying only q bits of the (possibly corrupted) codeword C(x), where q can be as small as 2.

Apart from the natural application to data transmission and storage [KT] locally decodable codes have found numerous applications within complexity theory in the context of worst-case to average-case reductions and probabilistically checkable proofs [T_survey]. The major goal of LDC related research is to establish the optimal trade-off between length and query complexity of such codes.

Private Information Retrieval schemes [CGKS] are cryptographic protocols developed in order to protect the privacy of the user accessing a public database. In such schemes a database (modelled by an n-bit string x) is replicated between q non-communicating servers. The user holds an index i and is interested in obtaining the value of the bit x_i. To achieve this goal, the user queries each of the servers and gets replies from which the desired bit x_i can be computed. The query to each server is distributed independently of i and therefore each server gets no information about what the user is after. Private information retrieval schemes are exploited for solving an array of specific tasks (such as secure keyword search, private approximations, private statistics, etc.). The main parameter of interest in a PIR scheme is its communication complexity, namely the number of bits exchanged by the user accessing an n-bit database and the servers.

Our contributions

In [Y_nicePir] we present a construction of locally decodable codes whose length is vastly shorter than that of previous constructions. Specifically, we show that every Mersenne prime p = 2^t - 1 yields a family of three query locally decodable codes of length Exp(n^{1/t}). The largest Mersenne prime known currently has t>10^7. Moreover a classical number-theoretic conjecture states that in fact the number of Mersenne primes is infinite. Using this assumption my constructions yield the first family of three query locally decodable codes of subexponential length. Analogous improvements follow for private information retrieval schemes. In particular, under the assumption mentioned above we obtain the first family of three server PIR schemes with subpolynomial communication. The results above were not expected by the community. After a decade of effort, many researchers in the area were pessimistic and believed that locally decodable codes with constant query complexity and subexponential length or private information retrieval schemes with constant number of servers and subpolynomial communication do not exist. In particular, such conjectures were published explicitly in [Go,Ga].

Our results in [Y_nicePir] yield tremendous improvements for private information retrieval schemes involving three or more servers, and provide no insights on the two server case. This raises a natural question regarding whether the two server case is truly intrinsically different. Our results in [RY] suggest that this may well be the case. We employ modular representation theory of finite groups to establish the optimality of the currently best known two server schemes [CGKS,BIKR,WY] in a restricted although fairly broad model.

In [WY] we present a lucid algebraic approach to private information retrieval revealing the clean interpolation idea that makes non-trivial PIRs possible. We also simplify some of the earlier results of [BIKR] and obtain improvements for various generalized settings of the problem.

The main bottleneck that prevents the currently best known PIR schemes from wide practical deployment is the overwhelming amount of computation the servers need to perform in order to response to users' queries. We plan to address this issue in our future research as well as to continue working on communication complexity of PIRs.

References:

[BFLS] L. Babai, L. Fortnow, L. Levin, and M. Szegedy, Checking computations in polylogarithmic time. In Proc. of the 23th ACM Sym. on Theory of Computing (STOC), pp. 21-31, 1991.

[BIKR] A. Beimel, Y. Ishai, E. Kushilevitz, and J. F. Raymond. Breaking the O(n^{1/(2k-1)}) Barrier for Information-Theoretic Private Information Retrieval, In Proc. of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS), pp. 261-270, 2002.

[CGKS] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval, In Proc. of the 36rd IEEE Symposium on Foundations of Computer Science (FOCS), pp. 41-50, 1995. Also, in Journal of the ACM, 45, 1998.

[Ga] W. Gasarch, A survey on private information retrieval, The Bulletin of the EATCS, 82:72-107, 2004.

[Go] O. Goldreich, Short Locally Testable Codes and Proofs, Technical Report TR05-014, Electronic Colloquium on Computational Complexity (ECCC), 2005.

[KT] J. Katz and L. Trevisan, On the efficiency of local decoding procedures for error-correcting codes, In Proc. of the 32th ACM Sym. on Theory of Computing (STOC), pp. 80-86, 2000.

[RY] A. Razborov and S. Yekhanin An Omega(n^{1/3}) Lower Bound for Bilinear Group Based Private Information Retrieval, In Proc. of the 47rd IEEE Symposium on Foundations of Computer Science (FOCS), 2006.

[Sud] M. Sudan, Efficient Checking of Polynomials and Proofs and the Hardness of Approximation Problems. PhD thesis, University of California at Berkeley, 1992.

[T_survey] L. Trevisan, Some Applications of Coding Theory in Computational Complexity, Quaderni di Matematica, 13:347-424, 2004.

[WY] D. Woodruff and S. Yekhanin, A Geometric Approach to Information Theoretic Private Information Retrieval, In Proc. of the 20th IEEE Computational Complexity Conference (CCC), pp. 275-284, 2005.

[Y_nicePir] S. Yekhanin New Locally Decodable Codes and Private Information Retrieval Schemes, Electronic Colloquium on Computational Complexity (ECCC), TR06-127.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu