Abstracts - 2006
Towards Efficient Algorithms for General Active Learning
Claire Monteleoni & Tommi Jaakkola
In many machine learning applications, such as speech recognition, medical diagnosis and webpage classification, access to labeled data is much more limited or expensive than access to unlabeled samples from the same data-generating distribution. It is often realistic to model this scenario as active learning. Since active learning allows for intelligent choices of which examples to label, often the label-complexity, the number of labeled examples required to learn a concept via active learning, is significantly lower than the PAC sample complexity. We focus here on an active learning framework called selective sampling, in which the learner receives unlabeled data and may request certain labels to be revealed.
The strongest positive results involving efficient algorithms for selective sampling, are an analysis by Freund et al.  of the Query By Committee algorithm , and recent work by Dasgupta, Kalai and Monteleoni  yielding the same label-complexity in a fully online setting, when learning linear separators, as well as a bound on total errors (labeled and unlabeled). To date, the analyses of both of these algorithms have been performed only with respect to input distributions that are uniform or near-uniform. Dasgupta  recently provided a positive result for selective sampling to learn arbitrary concepts under arbitrary input distributions. However the upper bound on labels was achieved by an intractable algorithm: exponential storage and computation were required, as well as the assumption of access to an exponential number of actual functions from the concept class (not just their predictions).
Approach and ongoing work
We consider a selective sampling framework in which unlabeled samples are drawn independently from an arbitrary but fixed distribution over the input space, referred to as the ``input distribution''. We analyze active learning in the realizable case (a perfect classifier exists in the concept class). Labels agreeing with the target classifier are available via a noiseless oracle at a constant cost per example, and the objective is to achieve a target generalization error with minimum label-complexity. We operate in the non-Bayesian setting: no prior probability distribution is assumed over the concept class.
The version space is the set of hypotheses still consistent with all labeled examples seen. The diameter of the version space with respect to the disagreement probability of hypothesis pairs on the input distribution (which is a pseudo-metric) is an upper bound on the generalization error of the algorithm: the latter is simply the distance (under the pseudo-metric) between the algorithm's hypothesis and the target, and both are assumed to be within the version space. Thus, minimizing the diameter of the version space has been proposed as a natural candidate for the basis of an active learning algorithm [6,3]. However, without assuming the uniform input distribution as in [5,4], nor access to an explicit covering of hypotheses in the class (which requires exponential storage) as in , nor a prior over hypotheses as in , it is inconvenient to operate in the version space directly, as it would require mapping the input distribution to some sort of measure over the version space. In recent and ongoing joint work  with Kääriäinen, we suggest an approach operating entirely on the input space, as opposed to maintaining the version space at all. This technique operates on the ``region of uncertainty,'' a notion first proposed by Cohn, Atlas and Ladner . One can maintain the uncertainty region exactly for convex concept classes, and in general by reduction to supervised batch learning. It is ongoing work to formalize the implicit diameter reduction of the version space implemented by updates to the uncertainty region.
Recently however, we are considering techniques beyond diameter reduction of the version space. We have shown that  can yield a hypothesis that need not be in the version space, but is within the desired error from the target. Since this algorithm is optimal for learning homogeneous linear separators, at least under the uniform input distribution, there is reason to consider techniques whose objective is the "target region," i.e. the set of hypothesis whose error rate with respect to the target is in the desired range, as opposed to being constrained to the version space. More generally, we have shown that after any finite amount of labeled examples, the current version space need not be contained in the target region, and vice-versa. In future work we hope to exploit this insight to design efficient algorithms for general active learning.
This work is supported in part by a grant from British Aerospace Engineering.
 David A. Cohn, Les Atlas and Richard E. Ladner. Improving Generalization with Active Learning. In Machine Learning, 15(2) pp. 201--221, 1994.
 Sanjoy Dasgupta. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17. Vancouver, Canada, December 2004.
 Sanjoy Dasgupta. Coarse sample complexity bounds for active learning. In Advances in Neural Information Processing Systems 18. Vancouver, Canada, December 2005.
 Sanjoy Dasgupta, Adam Tauman Kalai and Claire Monteleoni. Analysis of perceptron-based active learning. In The 18th Annual Conference on Learning Theory, Bertinoro, Italy, June 2005.
 Yoav Freund, H. Sebastian Seung, Eli Shamir, Naftali Tishby. Selective Sampling Using the Query by Committee Algorithm. In Machine Learning, 28 (2-3):133--168, August/September, 1997.
 Matti Kääriäinen . Generalization Error Bounds Using Unlabeled Data. In The 18th Annual Conference on Learning Theory, Bertinoro, Italy, June 2005.
 Claire Monteleoni and Matti Kääriäinen. Active Learning under Arbitrary Distributions. In preparation; preliminary work presented at Workshop on Value of Information in Inference, Learning and Decision-Making, at Neural Information Processing Systems Conference. Whistler, Canada, December 2005.
 H. S. Seung, Manfred Opper and Haim Sompolinsky. Query by Committee. In Proc. Fifth Annual ACM Conference on Computational Learning Theory, 1992.