Abstracts - 2006
Testing Properties of Probability Distributions
Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld & Ning Xie
Given several rolls of a six-sided (or n-sided) die, can we determine if the die is fair? Given some medical data on a disease and a symptom, can we determine if the two are correlated?
Both of these problems are examples of probability distribution testing. The goal of distribution testing is to take samples drawn from a distribution and determine whether or not the distribution has a specific property. The property may be uniformity (in the case of die rolls), independence (in the case of disease/symptom data), or any number of other properties. In general, we cannot determine with absolute certainty whether a distribution has a property, since an improbably bad sample may lead to an invalid conclusion. Therefore we seek efficient algorithms that, with high probability, distinguish distributions with the property from those which are far from having the property. The algorithms should be efficient both in the number of samples they require ("sample complexity") and in the time taken to process those samples ("time complexity"). We also seek to show that our algorithms are optimal, by proving lower bounds on the sample and time complexity for such algorithms.
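The notion of "far" used throughout this line of work is total variation (statistical) distance. A minimal sketch of the definitions, assuming distributions are given explicitly as probability vectors (function names are ours, and the parameter eps is the distance threshold):

```python
def tv_distance(p, q):
    """Total variation (statistical) distance between two discrete
    distributions given as probability vectors over the same domain."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def is_far_from_uniform(p, eps):
    """A distribution is eps-far from uniform if its total variation
    distance to the uniform distribution exceeds eps."""
    n = len(p)
    uniform = [1.0 / n] * n
    return tv_distance(p, uniform) > eps

# A fair six-sided die versus a biased one.
fair = [1 / 6] * 6
biased = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
```

Of course, a tester never sees these vectors explicitly; it must estimate such distances from samples, which is what makes sample complexity the central resource.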
The most natural question to ask of a probability distribution is whether or not it is random, i.e., uniform over its state space. The importance of randomness in computer science is paramount: algorithms which are provided with a random sequence of bits are often faster and more elegant than their deterministic counterparts. Thus, we naturally want to distinguish distributions which are uniform from those which are not. This problem was most recently studied in a computer science context in [Batu et al. 2001], inspired by work in [Goldreich and Ron 2000].
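A standard approach to uniformity testing in this literature counts collisions among the samples: under the uniform distribution on n elements the collision rate is 1/n, while distributions far from uniform collide noticeably more often. The sketch below illustrates the idea only; the acceptance threshold is a hypothetical choice for illustration, not the analyzed constant, and this is not claimed to be the authors' exact algorithm.

```python
from itertools import combinations

def collision_count(samples):
    """Number of pairs of samples that agree (collide)."""
    return sum(1 for a, b in combinations(samples, 2) if a == b)

def looks_uniform(samples, n, eps):
    """Collision-based uniformity test (illustrative sketch).
    The uniform distribution on n elements has collision probability
    exactly 1/n; a distribution eps-far from uniform has a strictly
    larger collision probability.  The threshold below is a
    hypothetical placeholder, not the constant from the analysis."""
    m = len(samples)
    pairs = m * (m - 1) / 2
    rate = collision_count(samples) / pairs
    return rate <= (1 + eps * eps) / n
```

A sample drawn from a distribution concentrated on one element, such as [0, 0, 0, 0], is rejected, while a spread-out sample like [0, 1, 2, 3] over a four-element domain is accepted.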
Researchers have also shown that several randomized algorithms do not require fully uniform random bits. Instead, these algorithms only require bits drawn from a k-wise independent distribution, a notion that has proven extremely useful for this purpose [Naor and Naor 1993]. Thus, one of our goals is to develop an efficient algorithm to distinguish distributions which are k-wise independent from those which are not. Such an algorithm could be used to determine whether a distribution is "sufficiently random" to be used in this setting.
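For distributions over {0,1}^n, k-wise independence (every k coordinates jointly uniform) is equivalent to all "biases" over nonempty index sets of size at most k vanishing, and a tester can estimate these biases from samples. A minimal sketch, assuming samples are bit tuples (function names are ours, and the acceptance threshold a tester would compare against is omitted):

```python
from itertools import combinations

def bias(samples, subset):
    """Empirical bias of the XOR of the bits indexed by `subset`,
    i.e., the sample average of (-1)^(x_i1 XOR ... XOR x_ij).
    Over {0,1}^n, these vanish for every nonempty subset of size <= k
    exactly when the distribution is k-wise independent."""
    total = 0
    for x in samples:
        parity = 0
        for i in subset:
            parity ^= x[i]
        total += -1 if parity else 1
    return total / len(samples)

def max_low_weight_bias(samples, n, k):
    """Largest empirical bias over all nonempty index sets of size <= k.
    A tester would accept when this falls below a threshold depending
    on the desired accuracy (sketch, not the paper's exact algorithm)."""
    best = 0.0
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            best = max(best, abs(bias(samples, subset)))
    return best
```

For example, the distribution supported evenly on (0,0) and (1,1) is 1-wise independent (each bit alone is uniform), yet its two bits are perfectly correlated, so its size-2 bias is maximal.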
We combine techniques from classical probability theory, combinatorics, and harmonic analysis to analyze our algorithms and lower bounds.
This research was supported in part by NSF grant 012702-001 and an NSF Graduate Fellowship.
[Batu et al. 2001] Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing Random Variables for Independence and Identity. IEEE Symposium on Foundations of Computer Science (FOCS), pp. 442-451, 2001.
[Goldreich and Ron 2000] Oded Goldreich and Dana Ron. On Testing Expansion in Bounded-Degree Graphs. Electronic Colloquium on Computational Complexity (ECCC), Report 020, 2000.
[Naor and Naor 1993] J. Naor and M. Naor. Small-Bias Probability Spaces: Efficient Constructions and Applications. SIAM Journal on Computing, 22(4):838-856, 1993. (Preliminary version in STOC 1990.)