CSAIL Digital Archive - Artificial Intelligence
Laboratory Series
AIM-2005-037 Author[s]: Charles C. Kemp and Aaron Edsinger Visual Tool Tip Detection and Position Estimation for Robotic Manipulation of Unknown Human Tools November 16, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-037.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-037.pdf Robots that use human tools could more easily work with people, perform tasks that are important to people, and benefit from human strategies for accomplishing these tasks. For a wide variety of tools and tasks, control of the tool's endpoint is sufficient for its use. In this paper we present a straight-forward method for rapidly detecting the endpoint of an unmodeled tool and estimating its position with respect to the robot's hand. The robot rotates the tool while using optical flow to detect the most rapidly moving image points, and then finds the 3D position with respect to its hand that best explains these noisy 2D detections. The resulting 3D position estimate allows the robot to control the position of the tool endpoint and predict its visual location. We show successful results for this method using a humanoid robot with a variety of traditional tools, including a pen, a hammer, and pliers, as well as more general tools such as a bottle and the robot's own finger.
AIM-2005-036 CBCL-259 Author[s]: T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, T. Poggio A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex December 19, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-036.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-036.pdf We describe a quantitative theory to account for the computations performed by the feedforward path of the ventral stream of visual cortex and the local circuits implementing them. We show that a model instantiating the theory is capable of performing recognition on datasets of complex images at the level of human observers in rapid categorization tasks. We also show that the theory is consistent with (and in some cases has predicted) several properties of neurons in V1, V4, IT and PFC. The theory seems sufficiently comprehensive, detailed and satisfactory to represent an interesting challenge for physiologists and modelers: either disprove its basic features or propose alternative theories of equivalent scope. The theory suggests a number of open questions for visual physiology and psychophysics.
AIM-2005-035 CBCL-258 Author[s]: Yuri Ivanov, Thomas Serre and Jacob Bouvrie Confidence weighted classifier combination for multi-modal human identification December 14, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-035.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-035.pdf In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from the performance of each classifier on training data, which is shown to improve performance on a synthetic data set. The method is especially relevant in autonomous surveillance settings, where varying time scales and missing features are a common occurrence. We show an application of this technique to a real-world surveillance database of video and audio recordings of people collected over several weeks in an office setting.
AIM-2005-034 Author[s]: Leonid Taycher, Gregory Shakhnarovich, David Demirdjian, and Trevor Darrell Conditional Random People: Tracking Humans with CRFs and Grid Filters December 1, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-034.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-034.pdf We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are \emph{learned} from data. We find functions that embed both state and observation into a space where similarity corresponds to $L_1$ distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and in conjunction with a grid-filtering framework can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences.
AIM-2005-033 Author[s]: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni Analysis of Perceptron-Based Active Learning November 17, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-033.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-033.pdf We start by showing that in an active learning setting, the Perceptron algorithm needs $\Omega(\frac{1}{\epsilon^2})$ labels to learn linear separators within generalization error $\epsilon$. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error $\epsilon$ after asking for just $\tilde{O}(d \log \frac{1}{\epsilon})$ labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.
AIM-2005-032 Author[s]: Claire Monteleoni, Tommi Jaakkola Online Learning of Non-stationary Sequences November 17, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-032.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-032.pdf We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization of the switching-rate parameter that governs the switching dynamics. We demonstrate the algorithm in the context of wireless networks.
AIM-2005-031 Author[s]: Alexandr Andoni and Piotr Indyk New LSH-based Algorithm for Approximate Nearest Neighbor November 3, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-031.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-031.pdf We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}).
AIM-2005-030 CBCL-257 Author[s]: Ross Lippert and Ryan Rifkin Asymptotics of Gaussian Regularized Least-Squares October 20, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-030.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-030.pdf We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\sigma \rightarrow \infty$ while letting the regularization parameter $\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\frac{1}{\sigma^2}$ and $\lambda$: if $\lambda = \sigma^{-(2k+1)}$, then, as $\sigma \rightarrow \infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.
AIM-2005-029 CBCL-256 Author[s]: Gadi Geiger & Domenic G Amara Towards the Prevention of Dyslexia October 18, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-029.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-029.pdf Previous studies have shown that dyslexic individuals who supplement windowed reading practice with intensive small-scale hand-eye coordination tasks exhibit marked improvement in their reading skills. Here we examine whether similar hand-eye coordination activities, in the form of artwork performed by children in kindergarten, first and second grades, could reduce the number of students at-risk for reading problems. Our results suggest that daily hand-eye coordination activities significantly reduce the number of students at-risk. We believe that the effectiveness of these activities derives from their ability to prepare the students perceptually for reading.
AIM-2005-028 CBCL-255 Author[s]: Sanmay Das Learning to Trade with Insider Information October 7, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-028.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-028.pdf This paper introduces algorithms for learning how to trade using insider (superior) information in Kyle's model of financial markets. Prior results in finance theory relied on the insider having perfect knowledge of the structure and parameters of the market. I show here that it is possible to learn the equilibrium trading strategy when its form is known even without knowledge of the parameters governing trading in the model. However, the rate of convergence to equilibrium is slow, and an approximate algorithm that does not converge to the equilibrium strategy achieves better utility when the horizon is limited. I analyze this approximate algorithm from the perspective of reinforcement learning and discuss the importance of domain knowledge in designing a successful learning algorithm.
AIM-2005-027 Author[s]: Georgios Theocharous, Sridhar Mahadevan, Leslie Pack Kaelbling Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation September 27, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-027.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-027.pdf Partially observable Markov decision processes (POMDPs) are a well studied paradigm for programming autonomous robots, where the robot sequentially chooses actions to achieve long term goals efficiently. Unfortunately, for real world robots and other similar domains, the uncertain outcomes of the actions and the fact that the true world state may not be completely observable make learning of models of the world extremely difficult, and using them algorithmically infeasible. In this paper we show that learning POMDP models and planning with them can become significantly easier when we incorporate into our algorithms the notions of spatial and temporal abstraction. We demonstrate the superiority of our algorithms by comparing them with previous flat approaches for large scale robot navigation.
AIM-2005-026 Author[s]: Chris Stauffer Automated Audio-visual Activity Analysis September 20, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-026.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-026.pdf Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus is often not sufficient for reliably discriminating between many types of activity. In many cases where the visual information required for a particular task is extremely subtle or non-existent, there is often audio stimulus that is extremely salient for a particular classification or anomaly detection task. Unfortunately unlike visual events, independent sounds are often very ambiguous and not sufficient to define useful events themselves. Without an effective method of learning causally-linked temporal sequences of sound events that are coupled to the visual events, these sound events are generally only useful for independent anomalous sounds detection, e.g., detecting a gunshot or breaking glass. This paper outlines a method for automatically detecting a set of audio events and visual events in a particular environment, for determining statistical anomalies, for automatically clustering these detected events into meaningful clusters, and for learning salient temporal relationships between the audio and visual events. This results in a compact description of the different types of compound audio-visual events in an environment.
AIM-2005-025 Author[s]: Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman LabelMe: a database and web-based tool for image annotation September 8, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-025.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-025.pdf Research in object detection and recognition in cluttered scenes requires large image collections with ground truth labels. The labels should provide information about the object classes present in each image, as well as their shape and locations, and possibly other attributes such as pose. Such data is useful for testing, as well as for supervised learning. This project provides a web-based annotation tool that makes it easy to annotate images, and to instantly share such annotations with the community. This tool, plus an initial set of 10,000 images (3000 of which have been labeled), can be found at http://www.csail.mit.edu/~brussell/research/LabelMe/intro.html
AIM-2005-024 Author[s]: Whitman Richards Collective Choice with Uncertain Domain Models August 16, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-024.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-024.pdf When groups of individuals make choices among several alternatives, the most compelling social outcome is the Condorcet winner, namely the alternative beating all others in a pair-wise contest. Obviously the Condorcet winner cannot be overturned if one sub-group proposes another alternative it happens to favor. However, in some cases, and especially with haphazard voting, there will be no clear unique winner, with the outcome consisting of a triple of pair-wise winners that each beat different subsets of the alternatives (i.e., a “top-cycle”). We explore the sensitivity of Condorcet winners to various perturbations in the voting process that lead to top-cycles. Surprisingly, variations in the number of votes for each alternative are much less important than consistency in a voter’s view of how alternatives are related. As more and more voters’ preference orderings on alternatives depart from a shared model of the domain, unique Condorcet outcomes become increasingly unlikely.
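The Condorcet winner and top-cycle notions in the AIM-2005-024 abstract above can be illustrated with a minimal sketch. It is not code from the memo: the function names and the two small ballot sets are hypothetical, and each ballot is assumed to be a complete strict ranking of all alternatives.

```python
from itertools import combinations

def pairwise_wins(ballots, alternatives):
    """Return the set of (a, b) pairs where a strict majority ranks a above b."""
    wins = set()
    for a, b in combinations(alternatives, 2):
        a_over_b = sum(1 for r in ballots if r.index(a) < r.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins.add((a, b))
        elif b_over_a > a_over_b:
            wins.add((b, a))
    return wins

def condorcet_winner(ballots, alternatives):
    """Return the alternative that beats every other one pair-wise, or None."""
    wins = pairwise_wins(ballots, alternatives)
    for a in alternatives:
        if all((a, b) in wins for b in alternatives if b != a):
            return a
    return None

alternatives = ["A", "B", "C"]

# Hypothetical ballots that mostly agree on one ordering: a unique winner exists.
shared = [["A", "B", "C"]] * 3 + [["B", "A", "C"]] * 2

# Hypothetical ballots forming a cycle (A beats B, B beats C, C beats A).
cyclic = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

print(condorcet_winner(shared, alternatives))
print(condorcet_winner(cyclic, alternatives))
```

In the second ballot set no alternative beats all others, so the function returns None; this is the three-alternative cycle that the abstract calls a top-cycle.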
AIM-2005-023 CBCL-254 Author[s]: Jerry Jun Yokono and Tomaso Poggio Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition July 7, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-023.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-023.pdf Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type -- based on a set of oriented Gaussian derivative filters -- are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large feature pool of local features collected from the positive training examples. We describe experiments on face images with excellent recognition rate.
AIM-2005-022 CBCL-253 Author[s]: Chou Hung, Gabriel Kreiman, Tomaso Poggio, James J. DiCarlo Ultra-fast Object Recognition from Few Spikes July 6, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-022.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-022.pdf Understanding the complex brain computations leading to object recognition requires quantitatively characterizing the information represented in inferior temporal cortex (IT), the highest stage of the primate visual stream. A read-out technique based on a trainable classifier is used to characterize the neural coding of selectivity and invariance at the population level. The activity of very small populations of independently recorded IT neurons (~100 randomly selected cells) over very short time intervals (as small as 12.5 ms) contains surprisingly accurate and robust information about both object ‘identity’ and ‘category’, which is furthermore highly invariant to object position and scale. Significantly, selectivity and invariance are present even for novel objects, indicating that these properties arise from the intrinsic circuitry and do not require object-specific learning. Within the limits of the technique, there is no detectable difference in the latency or temporal resolution of the IT information supporting so-called ‘categorization’ (a.k.a. basic level) and ‘identification’ (a.k.a. subordinate level) tasks. Furthermore, ‘where’ information, in particular information about stimulus location and scale, can also be read out from the same small population of IT neurons. These results show how it is possible to decode invariant object information rapidly, accurately and robustly from a small population in IT and provide insights into the nature of the neural code for different kinds of object-related information.
AIM-2005-021 Author[s]: Ali Rahimi, Ben Recht, Trevor Darrell Nonlinear Latent Variable Models for Video Sequences June 6, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-021.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-021.pdf Many high-dimensional time-varying signals can be modeled as a sequence of noisy nonlinear observations of a low-dimensional dynamical process. Given high-dimensional observations and a distribution describing the dynamical process, we present a computationally inexpensive approximate algorithm for estimating the inverse of this mapping. Once this mapping is learned, we can invert it to construct a generative model for the signals. Our algorithm can be thought of as learning a manifold of images by taking into account the dynamics underlying the low-dimensional representation of these images. It also serves as a nonlinear system identification procedure that estimates the inverse of the observation function in a nonlinear dynamic system. Our algorithm reduces to a generalized eigenvalue problem, so it does not suffer from the computational or local minimum issues traditionally associated with nonlinear system identification, allowing us to apply it to the problem of learning generative models for video sequences.
AIM-2005-020 Author[s]: Florent Segonne, Jean-Philippe Pons, Bruce Fischl, and Eric Grimson A Novel Active Contour Framework. Multi-component Level Set Evolution under Topology Control June 1, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-020.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-020.pdf We present a novel framework to exert a topology control over a level set evolution. Level set methods offer several advantages over parametric active contours, in particular automated topological changes. In some applications, where some a priori knowledge of the target topology is available, topological changes may not be desirable. A method, based on the concept of simple point borrowed from digital topology, was recently proposed to achieve a strict topology preservation during a level set evolution. However, topologically constrained evolutions often generate topological barriers that lead to large geometric inconsistencies. We introduce a topologically controlled level set framework that greatly alleviates this problem. Unlike existing work, our method allows connected components to merge, split or vanish under some specific conditions that ensure that no topological defects are generated. We demonstrate the strength of our method on a wide range of numerical experiments.
AIM-2005-019 CBCL-252 Author[s]: Andrea Caponnetto, Lorenzo Rosasco, Ernesto De Vito and Alessandro Verri Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm May 27, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-019.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-019.pdf This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotic optimal performances in a minimax sense. The main goal of the present paper is showing how the effective dimension can be replaced by an empirical counterpart while conserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting.
AIM-2005-018 CBCL-250 Author[s]: Andrea Caponnetto and Alexander Rakhlin Some Properties of Empirical Risk Minimization over Donsker Classes May 17, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-018.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-018.pdf We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.
AIM-2005-017 Author[s]: Thade Nahnsen, Ozlem Uzuner, Boris Katz Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection May 19, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-017.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-017.pdf We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.
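One of the representations named in the AIM-2005-017 abstract above, the cosine similarity of tf*idf-weighted keywords, can be sketched as follows. This is an illustration only, not the authors' system: the idf variant log(n / df), the function names, and the three toy token lists are assumptions, and the lexical-chain n-gram representations are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed weighting: raw term frequency times log(n / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical "chapters": the first two share vocabulary, the third does not.
docs = [
    ["whale", "sea", "ship"],
    ["whale", "ocean", "ship"],
    ["garden", "rose", "tea"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share weighted keywords score above zero, while documents with disjoint vocabularies score exactly zero.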
AIM-2005-016 Author[s]: Christopher Taylor, Ali Rahimi, Jonathan Bachrach and Howard Shrobe Simultaneous Localization, Calibration, and Tracking in an ad Hoc Sensor Network April 26, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-016.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-016.pdf We introduce Simultaneous Localization and Tracking (SLAT), the problem of tracking a target in a sensor network while simultaneously localizing and calibrating the nodes of the network. Our proposed solution, LaSLAT, is a Bayesian filter providing on-line probabilistic estimates of sensor locations and target tracks. It does not require globally accessible beacon signals or accurate ranging between the nodes. When applied to a network of 27 sensor nodes, our algorithm can localize the nodes to within one or two centimeters.
AIM-2005-015 CBCL-249 Author[s]: Ernesto De Vito and Andrea Caponnetto Risk Bounds for Regularized Least-squares Algorithm with Operator-valued kernels May 16, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-015.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-015.pdf We show that recent results in [3] on risk bounds for regularized least-squares on reproducing kernel Hilbert spaces can be straightforwardly extended to the vector-valued regression setting. We first briefly introduce central concepts on operator-valued kernels. Then we show how risk bounds can be expressed in terms of a generalization of effective dimension.
AIM-2005-014 Author[s]: Jacob Eisenstein and Randall Davis Gestural Cues for Sentence Segmentation April 19, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-014.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-014.pdf In human-human dialogues, face-to-face meetings are often preferred over phone conversations. One explanation is that non-verbal modalities such as gesture provide additional information, making communication more efficient and accurate. If so, computer processing of natural language could improve by attending to non-verbal modalities as well. We consider the problem of sentence segmentation, using hand-annotated gesture features to improve recognition. We find that gesture features correlate well with sentence boundaries, but that these features improve the overall performance of a language-only system only marginally. This finding is in line with previous research on this topic. We provide a regression analysis, revealing that for sentence boundary detection, the gestural features are largely redundant with the language model and pause features. This suggests that gestural features can still be useful when speech recognition is inaccurate.
AIM-2005-013 CBCL-248 Author[s]: Andrea Caponnetto and Ernesto De Vito Fast Rates for Regularized Least-squares Algorithm April 14, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-013.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-013.pdf We develop a theoretical analysis of generalization performances of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact, a minimax analysis is performed which shows asymptotic optimality of the above-mentioned criterion.
AIM-2005-012 Author[s]: Jacob Beal Learning From Snapshot Examples April 13, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-012.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-012.pdf Examples are a powerful tool for teaching both humans and computers. In order to learn from examples, however, a student must first extract the examples from its stream of perception. Snapshot learning is a general approach to this problem, in which relevant samples of perception are used as examples. Learning from these examples can in turn improve the judgement of the snapshot mechanism, improving the quality of future examples. One way to implement snapshot learning is the Top-Cliff heuristic, which identifies relevant samples using a generalized notion of peaks. I apply snapshot learning with the Top-Cliff heuristic to solve a distributed learning problem and show that the resulting system learns rapidly and robustly, and can hallucinate useful examples in a perceptual stream from a teacherless system.
AIM-2005-011 Author[s]: Justin Werfel, Yaneer Bar-Yam, Radhika Nagpal Construction by robot swarms using extended stigmergy April 8, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-011.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-011.pdf We describe a system in which simple, identical, autonomous robots assemble two-dimensional structures out of identical building blocks. We show that, in a system divided in this way into mobile units and structural units, giving the blocks limited communication abilities enables robots to have sufficient global structural knowledge to rapidly build elaborate pre-designed structures. In this way we extend the principle of stigmergy (storing information in the environment) used by social insects, by increasing the capabilities of the blocks that represent that environmental information. As a result, arbitrary solid structures can be built using a few fixed, local behaviors, without requiring construction to be planned out in detail.
AIM-2005-010 Author[s]: Kilian M. Pohl, John Fisher, W. Eric L. Grimson, William M. Wells An Expectation Maximization Approach for Integrated Registration, Segmentation, and Intensity Correction April 1, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-010.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-010.pdf This paper presents a statistical framework which combines the registration of an atlas with the segmentation of MR images. We use an Expectation Maximization-based algorithm to find a solution within the model, which simultaneously estimates image inhomogeneities, anatomical labelmap, and a mapping from the atlas to the image space. An example of the approach is given for a brain structure-dependent affine mapping approach. The algorithm produces high quality segmentations for brain tissues as well as their substructures. We demonstrate the approach on a set of 30 brain MR images. In addition, we show that the approach performs better than similar methods which separate the registration from the segmentation problem.
AIM-2005-009 CBCL-247 Author[s]: Lior Wolf & Stanley Bileschi Combining Variable Selection with Dimensionality Reduction March 30, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-009.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-009.pdf This paper bridges the gap between variable selection methods (e.g., Pearson coefficients, KS test) and dimensionality reduction algorithms (e.g., PCA, LDA). Variable selection algorithms encounter difficulties dealing with highly correlated data, since many features are similar in quality. Dimensionality reduction algorithms tend to combine all variables and cannot select a subset of significant variables. Our approach combines both methodologies by applying variable selection followed by dimensionality reduction. This combination makes sense only when using the same utility function in both stages, which we do. The resulting algorithm benefits from complex features as variable selection algorithms do, and at the same time enjoys the benefits of dimensionality reduction.
AIM-2005-008 Author[s]: Leonid Taycher, John W. Fisher III, and Trevor Darrell Combining Object and Feature Dynamics in Probabilistic Tracking March 2, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-008.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-008.pdf Objects can exhibit different dynamics at different scales, a property that is often exploited by visual tracking algorithms. A local dynamic model is typically used to extract image features that are then used as inputs to a system for tracking the entire object using a global dynamic model. Approximate local dynamics may be brittle---point trackers drift due to image noise and adaptive background models adapt to foreground objects that become stationary---but constraints from the global model can make them more robust. We propose a probabilistic framework for incorporating global dynamics knowledge into the local feature extraction processes. A global tracking algorithm can be formulated as a generative model and used to predict feature values that influence the observation process of the feature extractor. We combine such models in a multichain graphical model framework. We show the utility of our framework for improving feature tracking and thus shape and motion estimates in a batch factorization algorithm. We also propose an approximate filtering algorithm appropriate for online applications, and demonstrate its application to problems such as background subtraction, structure from motion and articulated body tracking.
AIM-2005-007 Author[s]: Kristen Grauman and Trevor Darrell Pyramid Match Kernels: Discriminative Classification with Sets of Image Features March 17, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-007.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-007.pdf Discriminative learning is challenging when examples are sets of local image features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel similarity measure for unordered set inputs must somehow solve for correspondences -- generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space. This "pyramid match" computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be dramatically faster than current approaches.
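The weighted histogram intersection described in the AIM-2005-007 abstract can be illustrated with a minimal sketch. The function names, the doubling of the cell side at each level, and the 1/2^i weights are assumptions chosen for illustration; the sketch also omits any normalization, so it should not be read as the authors' implementation.

```python
from collections import Counter
import math


def histogram(points, side):
    """Count how many points fall in each grid cell of the given side length."""
    return Counter(tuple(math.floor(c / side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: the number of points that can be paired cell by cell."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(xs, ys, levels):
    """Sum, over increasingly coarse grids, the newly matched pairs weighted by 1/side."""
    score = 0.0
    previous = 0  # matches already found at finer levels
    for i in range(levels):
        side = 2 ** i
        matched = intersection(histogram(xs, side), histogram(ys, side))
        score += (matched - previous) / side  # only pairs first matched at this level
        previous = matched
    return score


xs = [(0.5,), (3.2,)]
ys = [(0.7,), (6.1,)]
print(pyramid_match(xs, ys, 4))                # 1.125
print(pyramid_match(xs + [(100.0,)], ys, 4))   # 1.125, the extra feature is not penalized
```

Each level needs one pass over both feature sets, which is where the linear cost in the number of features comes from.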
AIM-2005-006 CBCL-246 Author[s]: Benjamin Balas, Pawan Sinha Receptive field structures for recognition March 1, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-006.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-006.pdf Localized operators, like Gabor wavelets and difference-of-Gaussian filters, are considered to be useful tools for image representation. This is due to their ability to form a ‘sparse code’ that can serve as a basis set for high-fidelity reconstruction of natural images. However, for many visual tasks, the more appropriate criterion of representational efficacy is ‘recognition’, rather than ‘reconstruction’. It is unclear whether simple local features provide the stability necessary to subserve robust recognition of complex objects. In this paper, we search the space of two-lobed differential operators for those that constitute a good representational code under recognition/discrimination criteria. We find that a novel operator, which we call the ‘dissociated dipole’ displays useful properties in this regard. We describe simple computational experiments to assess the merits of such dipoles relative to the more traditional local operators. The results suggest that non-local operators constitute a vocabulary that is stable across a range of image transformations.
AIM-2005-005 Author[s]: Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman Discovering object categories in image collections February 25, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-005.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-005.pdf Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of Fergus et al.
AITR-2005-004 Author[s]: Ozlem Uzuner Identifying Expression Fingerprints using Linguistic Information November 16, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-004.ps ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-004.pdf This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints. Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories. Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%. In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. 
However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works. Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.
AIM-2005-004 Author[s]: Reina Riemann, Keith Winstein Improving 802.11 Range with Forward Error Correction February 24, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-004.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-004.pdf The ISO/IEC 8802-11:1999(E) specification uses a 32-bit CRC for error detection and whole-packet retransmissions for recovery. In long-distance or high-interference links where the probability of a bit error is high, this strategy results in excessive losses, because any erroneous bit causes an entire packet to be discarded. By ignoring the CRC and adding redundancy to 802.11 payloads in software, we achieved substantially reduced loss rates on indoor and outdoor long-distance links and extended line-of-sight range outdoors by 70 percent.
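The loss argument in the AIM-2005-004 abstract can be checked with a short calculation. The sketch below assumes independent bit errors, and it uses a three-fold repetition code with majority voting purely as a stand-in for "added redundancy"; the abstract does not say which code the authors used, and the function names are hypothetical.

```python
def packet_success_probability(bit_error_rate, packet_bytes):
    """Probability that every bit of a packet arrives intact (independent errors assumed)."""
    return (1.0 - bit_error_rate) ** (8 * packet_bytes)


def repetition3_bit_error_rate(bit_error_rate):
    """Residual bit error rate after majority vote over three copies of each bit."""
    p = bit_error_rate
    return 3 * p * p * (1 - p) + p ** 3  # two or three of the copies are wrong


p = 1e-4
plain = packet_success_probability(p, 1500)
coded = packet_success_probability(repetition3_bit_error_rate(p), 1500)
print(f"CRC only: {plain:.3f}  with 3x repetition: {coded:.5f}")
```

At a bit error rate of 1e-4, roughly 70 percent of 1500-byte packets would be discarded under CRC-and-retransmit, while the illustrative code recovers almost all of them, at the price of tripling the transmitted bits.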
AITR-2005-003 Author[s]: Christopher J. Taylor Simultaneous Localization and Tracking in Wireless Ad-hoc Sensor Networks May 31, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-003.ps ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-003.pdf In this thesis we present LaSLAT, a sensor network algorithm that simultaneously localizes sensors, calibrates sensing hardware, and tracks unconstrained moving targets using only range measurements between the sensors and the target. LaSLAT is based on a Bayesian filter, which updates a probability distribution over the quantities of interest as measurements arrive. The algorithm is distributable, and requires only a constant amount of space with respect to the number of measurements incorporated. LaSLAT is easy to adapt to new types of hardware and new physical environments due to its use of intuitive probability distributions: one adaptation demonstrated in this thesis uses a mixture measurement model to detect and compensate for bad acoustic range measurements due to echoes. We also present results from a centralized Java implementation of LaSLAT on both two- and three-dimensional sensor networks in which ranges are obtained using the Cricket ranging system. LaSLAT is able to localize sensors to within several centimeters of their ground truth positions while recovering a range measurement bias for each sensor and the complete trajectory of the mobile.
AIM-2005-003 Author[s]: Gerald Jay Sussman and Jack Wisdom Functional Differential Geometry February 2, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-003.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-003.pdf Differential geometry is deceptively simple. It is surprisingly easy to get the right answer with unclear and informal symbol manipulation. To address this problem we use computer programs to communicate a precise understanding of the computations in differential geometry. Expressing the methods of differential geometry in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.
AITR-2005-002 CBCL-251 Author[s]: Jia Jane Wu Comparing Visual Features for Morphing Based Recognition May 25, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-002.ps ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-002.pdf This thesis presents a method of object classification using the idea of deformable shape matching. Three types of visual features, geometric blur, C1 and SIFT, are used to generate feature descriptors. These feature descriptors are then used to find point correspondences between pairs of images. Various morphable models are created from small subsets of these correspondences using thin-plate splines. Given these morphs, a simple algorithm, least median of squares (LMEDS), is used to find the best morph. A scoring metric, using both LMEDS and distance transform, is used to classify test images based on a nearest neighbor algorithm. We perform the experiments on the Caltech 101 dataset [5]. To ease computation, for each test image, a shortlist is created containing 10 of the most likely candidates. We were unable to duplicate the performance of [1] in the shortlist stage because we did not use hand-segmentation to extract objects for our training images. However, our gain from the shortlist to correspondence stage is comparable to theirs. In our experiments, we improved from 21% to 28% (gain of 33%), while [1] improved from 41% to 48% (gain of 17%). We find that using a non-shape-based approach, C2 [14], the overall classification rate of 33.61% is higher than all of the shape-based methods tested in our experiments.
AIM-2005-002 CBCL-244 Author[s]: Benjamin Balas Using computational models to study texture representations in the human visual system. February 7, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-002.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-002.pdf Traditionally, human texture perception has been studied using artificial textures made of random-dot patterns or abstract structured elements. At the same time, computer algorithms for the synthesis of natural textures have improved dramatically. The current study seeks to unify these two fields of research through a psychophysical assessment of a particular computational model, thus providing a sense of what image statistics are most vital for representing a range of natural textures. We employ Portilla and Simoncelli’s 2000 model of texture synthesis for this task (a parametric model of analysis and synthesis designed to mimic computations carried out by the human visual system). We find an intriguing interaction between texture type (periodic v. structured) and image statistics (autocorrelation function and filter magnitude correlations), suggesting different processing strategies may be employed for these two texture families under pre-attentive viewing.
AITR-2005-001 Author[s]: Attila Kondacs Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods January 28, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-001.ps ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-001.pdf In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns 2. articulator configuration trajectories 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. 
The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions with false positive rates 2.9%, 8.7% and 2.6% respectively.
AIM-2005-001 Author[s]: Jacob Beal, Gerald Sussman Biologically-Inspired Robust Spatial Programming January 18, 2005 ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-001.ps ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-001.pdf Inspired by the robustness and flexibility of biological systems, we are developing linguistic and programming tools to allow us to program spatial systems populated by vast numbers of unreliable components interconnected in unknown, irregular, and time-varying ways. We organize our computations around geometry, making the fact that our system is made up of discrete individuals implicit. Geometry allows us to specify requirements in terms of the behavior of the space occupied by the aggregate rather than the behavior of individuals, thereby decreasing complexity. So we describe the behavior of space explicitly, abstracting away the discrete nature of the components. As an example, we present the Amorphous Medium Language, which describes behavior in terms of homeostatic maintenance of constraints on nested regions of space.
AIM-2004-031 CBCL-245 Author[s]: Minjoon Kouh and Tomaso Poggio A general mechanism for tuning: Gain control circuits and synapses underlie tuning of cortical neurons December 31, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-031.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-031.pdf Tuning to an optimal stimulus is a widespread property of neurons in cortex. We propose that such tuning is a consequence of normalization or gain control circuits. We also present a biologically plausible neural circuitry of tuning.
AIM-2004-030 Author[s]: Percy Liang and Nathan Srebro Methods and Experiments With Bounded Tree-width Markov Networks December 30, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-030.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-030.pdf Markov trees generalize naturally to bounded tree-width Markov networks, on which exact computations can still be done efficiently. However, learning the maximum likelihood Markov network with tree-width greater than 1 is NP-hard, so we discuss a few algorithms for approximating the optimal Markov network. We present a set of methods for training a density estimator. Each method is specified by three arguments: tree-width, model scoring metric (maximum likelihood or minimum description length), and model representation (using one joint distribution or several class-conditional distributions). On these methods, we give empirical results on density estimation and classification tasks and explore the implications of these arguments.
AIM-2004-029 Author[s]: Whitman Richards & H. Sebastian Seung Neural Voting Machines December 31, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-029.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-029.pdf “Winner-take-all” networks typically pick as winners that alternative with the largest excitatory input. This choice is far from optimal when there is uncertainty in the strength of the inputs, and when information is available about how alternatives may be related. In the Social Choice community, many other procedures will yield more robust winners. The Borda Count and the pair-wise Condorcet tally are among the most favored. Their implementations are simple modifications of classical recurrent networks.
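The two tallies named in the AIM-2004-029 abstract can be written out as plain procedures. This is a minimal sketch of the voting rules themselves, not of the recurrent-network implementations the memo describes; the function names and the ballot format (each ballot a list ranked from most to least preferred) are assumptions.

```python
from itertools import combinations


def borda_winner(ballots):
    """Borda Count: an alternative earns one point per alternative ranked below it."""
    scores = {}
    for ranking in ballots:
        n = len(ranking)
        for position, alt in enumerate(ranking):
            scores[alt] = scores.get(alt, 0) + (n - 1 - position)
    return max(scores, key=scores.get), scores


def condorcet_winner(ballots):
    """Pair-wise tally: return the alternative that beats every other one, or None."""
    alternatives = sorted(set(ballots[0]))
    wins = {a: 0 for a in alternatives}
    for a, b in combinations(alternatives, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in ballots)
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for a in alternatives:
        if wins[a] == len(alternatives) - 1:
            return a
    return None  # a cycle or tie leaves no Condorcet winner


ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
print(borda_winner(ballots))      # ('B', {'A': 6, 'B': 7, 'C': 2})
print(condorcet_winner(ballots))  # A
```

In this example the two procedures disagree: A wins every pair-wise contest, while B collects the most Borda points because no voter ranks it last.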
AIM-2004-028 Author[s]: Luis Perez-Breva Cascading Regularized Classifiers April 21, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-028.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-028.pdf Among the various methods to combine classifiers, Boosting was originally thought of as a stratagem to cascade pairs of classifiers through their disagreement. I recover the same idea from the work of Niyogi et al. to show how to loosen the requirement of weak learnability, central to Boosting, and introduce a new cascading stratagem. The paper concludes with an empirical study of an implementation of the cascade that, under assumptions that mirror the conditions imposed by Viola and Jones in [VJ01], has the property of preserving the generalization ability of boosting.
AIM-2004-027 Author[s]: Kristen Grauman and Trevor Darrell Efficient Image Matching with Distributions of Local Invariant Features November 22, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-027.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-027.pdf Sets of local features that are invariant to common image transformations are an effective representation to use when comparing images; current methods typically judge feature sets' similarity via a voting scheme (which ignores co-occurrence statistics) or by comparing histograms over a set of prototypes (which must be found by clustering). We present a method for efficiently comparing images based on their discrete distributions (bags) of distinctive local invariant features, without clustering descriptors. Similarity between images is measured with an approximation of the Earth Mover's Distance (EMD), which quickly computes the minimal-cost correspondence between two bags of features. Each image's feature distribution is mapped into a normed space with a low-distortion embedding of EMD. Examples most similar to a novel query image are retrieved in time sublinear in the number of examples via approximate nearest neighbor search in the embedded space. We also show how the feature representation may be extended to encode the distribution of geometric constraints between the invariant features appearing in each image. We evaluate our technique with scene recognition and texture classification tasks.
AIM-2004-026 CBCL-243 Author[s]: Thomas Serre, Lior Wolf and Tomaso Poggio A new biologically motivated framework for robust object recognition November 14, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-026.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-026.pdf In this paper, we introduce a novel set of features for robust object recognition, which exhibits outstanding performances on a variety of object categories while being capable of learning from only a few training examples. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge-detectors over neighboring positions and multiple orientations. Our system - motivated by a quantitative model of visual cortex - outperforms state-of-the-art systems on a variety of object image datasets from different groups. We also show that our system is able to learn from very few examples with no prior category knowledge. The success of the approach is also a suggestive plausibility proof for a class of feed-forward models of object recognition in cortex. Finally, we conjecture the existence of a universal overcomplete dictionary of features that could handle the recognition of all object categories.
AIM-2004-025 CBCL-242 Author[s]: Lior Wolf and Ian Martin Regularization Through Feature Knock Out November 12, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-025.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-025.pdf In this paper, we present and analyze a novel regularization technique based on enhancing our dataset with corrupted copies of the original data. The motivation is that since the learning algorithm lacks information about which parts of the data are reliable, it has to produce more robust classification functions. We then demonstrate how this regularization leads to redundancy in the resulting classifiers, which is somewhat in contrast to the common interpretations of the Occam’s razor principle. Using this framework, we propose a simple addition to the gentle boosting algorithm which enables it to work with only a few examples. We test this new algorithm on a variety of datasets and show convincing results.
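The idea in the AIM-2004-025 abstract of enlarging a dataset with corrupted copies can be sketched as follows. The abstract does not specify the corruption, so the rule used here (overwrite one randomly chosen feature with the value the same feature takes in a randomly chosen example) is an assumption made for illustration, as are the function and parameter names.

```python
import random


def knock_out_augment(examples, copies_per_example, seed=0):
    """Return the original examples followed by corrupted copies of each one."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    augmented = [list(x) for x in examples]
    for x in examples:
        for _ in range(copies_per_example):
            donor = rng.choice(examples)   # example that supplies the replacement value
            j = rng.randrange(len(x))      # index of the feature to knock out
            corrupted = list(x)
            corrupted[j] = donor[j]
            augmented.append(corrupted)
    return augmented


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
augmented = knock_out_augment(data, copies_per_example=2)
print(len(augmented))  # 9: three originals plus two corrupted copies of each
```

In a supervised setting each corrupted copy would keep the label of the example it was copied from, so a classifier trained on the enlarged set cannot rely on any single feature.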
AIM-2004-024 CBCL-241 Author[s]: Charles Cadieu, Minjoon Kouh, Maximilian Riesenhuber, and Tomaso Poggio Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition November 12, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-024.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-024.pdf The computational processes in the intermediate stages of the ventral pathway responsible for visual object recognition are not well understood. A recent physiological study by A. Pasupathy and C. Connor in intermediate area V4 using contour stimuli proposes that a population of V4 neurons displays object-centered, position-specific curvature tuning [18]. The “standard model” of object recognition, a recently developed model [23] to account for recognition properties of IT cells (extending classical suggestions by Hubel, Wiesel and others [9, 10, 19]), is used here to model the response of the V4 cells described in [18]. Our results show that a feedforward, network level mechanism can exhibit selectivity and invariance properties that correspond to the responses of the V4 cells described in [18]. These results suggest how object-centered, position-specific curvature tuning of V4 cells may arise from combinations of complex V1 cell responses. Furthermore, the model makes predictions about the responses of the same V4 cells studied by Pasupathy and Connor to novel gray level patterns, such as gratings and natural images. These predictions suggest specific experiments to further explore shape representation in V4.
AIM-2004-023 Author[s]: Kurt Steinkraus, Leslie Pack Kaelbling Combining dynamic abstractions in large MDPs October 21, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-023.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-023.pdf One of the reasons that it is difficult to plan and act in real-world domains is that they are very large. Existing research generally deals with the large domain size using a static representation and exploiting a single type of domain structure. In this paper, we create a framework that encapsulates existing and new abstraction and approximation methods into modules, and combines arbitrary modules into a system that allows for dynamic representation changes. We show that the dynamic changes of representation allow our framework to solve larger and more interesting domains than were previously possible, and while there are no optimality guarantees, suitable module choices gain tractability at little cost to optimality.
AIM-2004-022 Author[s]: Gene Yeo, Eric Van Nostrand, Dirk Holste, Tomaso Poggio, Christopher Burge Predictive identification of alternative events conserved in human and mouse September 30, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-022.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-022.pdf Alternative pre-messenger RNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call 'alternative-conserved exons' (ACEs), from other orthologous human/mouse exons, and integrate these features into an exon classification algorithm, ACEScan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified ~2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues using an RT-PCR-sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of ACEScan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons which had EST or cDNA evidence of AS in human but were not predicted ACEs, and was never observed for eleven negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve reading frame, and less likely to disrupt protein domains than other AS events, and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing and development. Our results also imply that the vast majority of AS events represented in the human EST databases are not conserved in mouse, and therefore may represent aberrant, disease- or allele-specific, or highly lineage-restricted splicing events.
AIM-2004-021 Author[s]: Michael R. Benjamin The Interval Programming Model for Multi-objective Decision Making September 27, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-021.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-021.pdf The interval programming model (IvP) is a mathematical programming model for representing and solving multi-objective optimization problems. The central characteristic of the model is the use of piecewise linearly defined objective functions and a solution method that searches through the combination space of pieces rather than through the actual decision space. The piecewise functions typically represent an approximation of some underlying function, but this concession is balanced on the positive side by relative freedom from function form assumptions as well as the assurance of global optimality. In this paper the model and solution algorithms are described, and the applicability of IvP to certain applications is discussed.
AIM-2004-020 CBCL-240 Author[s]: Gabriel Kreiman, Chou Hung, Tomaso Poggio, James DiCarlo Selectivity of Local Field Potentials in Macaque Inferior Temporal Cortex September 21, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-020.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-020.pdf While single neurons in inferior temporal (IT) cortex show differential responses to distinct complex stimuli, little is known about the responses of populations of neurons in IT. We recorded single electrode data, including multi-unit activity (MUA) and local field potentials (LFP), from 618 sites in the inferior temporal cortex of macaque monkeys while the animals passively viewed 78 different pictures of complex stimuli. The LFPs were obtained by low-pass filtering the extracellular electrophysiological signal with a corner frequency of 300 Hz. As reported previously, we observed that spike counts from MUA showed selectivity for some of the pictures. Strikingly, the LFP data, which is thought to constitute an average over large numbers of neurons, also showed significantly selective responses. The LFP responses were less selective than the MUA responses both in terms of the proportion of selective sites as well as in the selectivity of each site. We observed that there was only little overlap between the selectivity of MUA and LFP recordings from the same electrode. To assess the spatial organization of selective responses, we compared the selectivity of nearby sites recorded along the same penetration and sites recorded from different penetrations. We observed that MUA selectivity was correlated on spatial scales up to 800 µm while the LFP selectivity was correlated over a larger spatial extent, with significant correlations between sites separated by several mm.
Our data support the idea that there is some topographical arrangement to the organization of selectivity in inferior temporal cortex and that this organization may be relevant for the representation of object identity in IT.
AIM-2004-019 Author[s]: Charles Kemp, Thomas L. Griffiths and Joshua B. Tenenbaum Discovering Latent Classes in Relational Data July 22, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-019.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-019.pdf We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entities that exist in a domain, the number of these latent classes, and the relations between classes that are possible or likely. This model goes beyond previous psychological models of category learning, which consider attributes associated with individual categories but not relationships between categories. We apply this domain-general framework to two specific problems: learning the structure of kinship systems and learning causal theories.
AIM-2004-018 Author[s]: Ozlem Uzuner Distribution Volume Tracking on Privacy-Enhanced Wireless Grid July 25, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-018.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-018.pdf In this paper, we discuss a wireless grid in which users are highly mobile, and form ad-hoc and sometimes short-lived connections with other devices. As they roam through networks, the users may choose to employ privacy-enhancing technologies to address their privacy needs and benefit from the computational power of the grid for a variety of tasks, including sharing content. The high rate of mobility of the users on the wireless grid, when combined with privacy enhancing mechanisms and ad-hoc connections, makes it difficult to conclusively link devices and/or individuals with network activities and to hold them liable for particular downloads. Protecting intellectual property in this scenario requires a solution that can work in absence of knowledge about behavior of particular individuals. Building on previous work, we argue for a solution that ensures proper compensation to content owners without inhibiting use and dissemination of works. Our proposal is based on digital tracking for measuring distribution volume of content and compensation of authors based on this accounting information. The emphasis is on obtaining good estimates of rate of popularity of works, without keeping track of activities of individuals or devices. The contribution of this paper is a revenue protection mechanism, Distribution Volume Tracking, that does not invade the privacy of users in the wireless grid and works even in the presence of privacy-enhancing technologies they may employ.
AIM-2004-017 CBCL-239 Author[s]: Thomas Serre and Maximilian Riesenhuber Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex July 27, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-017.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-017.pdf Riesenhuber & Poggio recently proposed a model of object recognition in cortex which, beyond integrating general beliefs about the visual system in a quantitative framework, made testable predictions about visual processing. In particular, they showed that invariant object representation could be obtained with a selective pooling mechanism over properly chosen afferents through a MAX operation: for instance, at the complex cells level, pooling over a group of simple cells at the same preferred orientation and position in space but at slightly different spatial frequency would provide scale tolerance, while pooling over a group of simple cells at the same preferred orientation and spatial frequency but at slightly different position in space would provide position tolerance. Indirect support for such mechanisms in the visual system comes from the ability of the architecture at the top level to replicate shape tuning as well as shift and size invariance properties of "view-tuned cells" (VTUs) found in inferotemporal cortex (IT), the highest area in the ventral visual stream, thought to be crucial in mediating object recognition in cortex. There is also now good physiological evidence that a MAX operation is performed at various levels along the ventral stream. However, in the original paper by Riesenhuber & Poggio, tuning and pooling parameters of model units in early and intermediate areas were only qualitatively inspired by physiological data. In particular, many studies have investigated the tuning properties of simple and complex cells in primary visual cortex, V1.
We show that units in the early levels of HMAX can be tuned to produce realistic simple and complex cell-like tuning, and that the earlier findings on the invariance properties of model VTUs still hold in this more realistic version of the model.
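The position tolerance that the AIM-2004-017 abstract attributes to MAX pooling can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions: the function name `complex_cell_response`, the response values, and the pooling range are hypothetical and are not taken from the HMAX implementation.

```python
def complex_cell_response(simple_responses, pool):
    """Complex-cell-like unit: MAX over the simple-cell responses in its pool."""
    return max(simple_responses[i] for i in pool)


# Hypothetical simple-cell responses (same orientation, neighbouring positions)
# to a stimulus, and to the same stimulus shifted by one position.
stimulus = [0.1, 0.2, 0.9, 0.2, 0.1, 0.0]
shifted = [0.0, 0.1, 0.2, 0.9, 0.2, 0.1]

pool = range(1, 5)  # the complex unit pools over positions 1..4

original_out = complex_cell_response(stimulus, pool)
shifted_out = complex_cell_response(shifted, pool)
print(original_out, shifted_out)
```

As long as the peak response stays inside the pooling range, the output of the pooled unit does not change when the stimulus shifts.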
AIM-2004-016 Author[s]: Tevfik Metin Sezgin and Randall Davis Early Sketch Processing with Application in HMM Based Sketch Recognition July 28, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-016.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-016.pdf Freehand sketching is a natural and crucial part of everyday human interaction, yet is almost totally unsupported by current user interfaces. With the increasing availability of tablet notebooks and pen based PDAs, sketch based interaction has gained attention as a natural interaction modality. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer, to produce a user interface for design that feels as natural as paper, yet is considerably smarter. One of the most basic tasks in accomplishing this is converting the original digitized pen strokes in a sketch into the intended geometric objects. In this paper we describe an implemented system that combines multiple sources of knowledge to provide robust early processing for freehand sketching. We also show how this early processing system can be used as part of a fast sketch recognition system with polynomial time segmentation and recognition algorithms.
AIM-2004-015 Author[s]: Mihai Badoiu, Piotr Indyk, Anastasios Sidiropoulos A Constant-Factor Approximation Algorithm for Embedding Unweighted Graphs into Trees July 5, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-015.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-015.pdf We present a constant-factor approximation algorithm for computing an embedding of the shortest path metric of an unweighted graph into a tree that minimizes the multiplicative distortion.
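To make the objective in AIM-2004-015 concrete, the sketch below measures the multiplicative distortion of one given embedding. It does not implement the approximation algorithm from the memo. It assumes both the graph and the tree are connected, unweighted, and share the same vertex set, and it uses the usual definition of distortion as the largest expansion times the largest contraction over all vertex pairs. All names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def multiplicative_distortion(graph, tree):
    """Largest expansion times largest contraction over all vertex pairs."""
    d_g = {u: bfs_distances(graph, u) for u in graph}
    d_t = {u: bfs_distances(tree, u) for u in tree}
    pairs = list(combinations(graph, 2))
    expansion = max(d_t[u][v] / d_g[u][v] for u, v in pairs)
    contraction = max(d_g[u][v] / d_t[u][v] for u, v in pairs)
    return expansion * contraction


# A 4-cycle embedded into the path obtained by deleting the edge {3, 0}.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

distortion = multiplicative_distortion(cycle, path)
print(distortion)
```

Here vertices 0 and 3 are adjacent in the cycle but three edges apart in the path, and no pair is contracted, so this embedding has distortion 3.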
AIM-2004-014 Author[s]: Piotr Indyk and David Woodruff Optimal Approximations of the Frequency Moments July 2, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-014.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-014.pdf We give a one-pass, O~(m^{1-2/k})-space algorithm for estimating the k-th frequency moment of a data stream for any real k>2. Together with known lower bounds, this resolves the main problem left open by Alon, Matias, Szegedy, STOC'96. Our algorithm enables deletions as well as insertions of stream elements.
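For readers unfamiliar with the quantity estimated in AIM-2004-014, the k-th frequency moment of a stream is F_k = sum over items i of f_i^k, where f_i is the net number of occurrences of item i. The sketch below computes F_k exactly in linear space, purely to illustrate the definition with insertions and deletions. It is not the sublinear-space streaming algorithm from the memo, and the `(item, delta)` stream encoding is an assumption made for this example.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum_i |f_i|**k, with f_i the net count of item i.

    `stream` is an iterable of (item, delta) pairs, where delta is +1 for
    an insertion and -1 for a deletion (assumed encoding).
    """
    counts = Counter()
    for item, delta in stream:
        counts[item] += delta
    return sum(abs(c) ** k for c in counts.values() if c != 0)


stream = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1), ("a", 1), ("c", -1)]
f3 = frequency_moment(stream, 3)  # net counts: a=3, b=2, c=0
print(f3)
```

The exact computation stores one counter per distinct item, which is the cost a small-space streaming estimator is designed to avoid.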
AIM-2004-013 Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman Contextual models for object detection using boosted random fields June 25, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-013.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-013.pdf We seek to both detect and segment objects in images. To exploit both local image data and contextual information, we introduce Boosted Random Fields (BRFs), which use boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.
AIM-2004-012 Author[s]: Jaime Teevan How People Re-find Information When the Web Changes June 18, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.pdf This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.
AIM-2004-011 Author[s]: Lilla Zollei, John Fisher, William Wells A Unified Statistical and Information Theoretic Framework for Multi-modal Image Registration April 28, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-011.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-011.pdf We formulate and interpret several multi-modal registration methods in the context of a unified statistical and information theoretic framework. A unified interpretation clarifies the implicit assumptions of each method yielding a better understanding of their relative strengths and weaknesses. Additionally, we discuss a generative statistical model from which we derive a novel analysis tool, the "auto-information function", as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. We analytically derive useful properties of the "auto-information" as well as verify them empirically on multi-modal imagery. Among the useful aspects of the "auto-information function" is that it can be computed from imaging modalities independently and it allows one to decompose the search space of registration problems.
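As background for the information-theoretic registration criteria discussed in AIM-2004-011, the sketch below computes the mutual information between two intensity sequences from their joint histogram. Mutual information is one widely used criterion of this kind. The memo's own "auto-information function" is not reproduced here, and the tiny four-pixel "images" are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


# Two modalities imaging the same structure with different intensity values.
image_a = [0, 0, 1, 1]
aligned_b = [5, 5, 9, 9]     # intensities co-vary with image_a
misaligned_b = [5, 9, 5, 9]  # intensities independent of image_a

mi_aligned = mutual_information(image_a, aligned_b)
mi_misaligned = mutual_information(image_a, misaligned_b)
print(mi_aligned, mi_misaligned)
```

The aligned pair shares one full bit of information while the misaligned pair shares none, which is why a registration search can use this score to prefer the aligned pose.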
AIM-2004-010 CBCL-238 Author[s]: Jerry Jun Yokono and Tomaso Poggio Rotation Invariant Object Recognition from One Training Example April 27, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-010.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-010.pdf Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor, based on a set of oriented Gaussian derivative filters, is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.
AITR-2004-009 Author[s]: Nathan Srebro Learning with Matrix Factorizations November 22, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-009.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-009.pdf Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Semantic Indexing, Aspect Models, Probabilistic PCA, Exponential PCA, Non-Negative Matrix Factorization and others). In this thesis we address several issues related to learning with matrix factorizations: we study the asymptotic behavior and generalization ability of existing methods, suggest new optimization methods, and present a novel maximum-margin high-dimensional matrix factorization formulation.
AIM-2004-009 Author[s]: Antonio Torralba Contextual Influences on Saliency April 14, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-009.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-009.pdf This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be available early in the visual processing chain, in order to modulate the saliency of image regions and to provide an efficient short cut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. The global scene features are then used to predict the probability of presence of the target object in the scene, and its location and scale, before exploring the image. Scene information can then be used to modulate the saliency of image regions early during the visual processing in order to provide an efficient short cut for object detection and recognition.
AITR-2004-008 Author[s]: Justin Werfel Neural Network Models for Zebra Finch Song Production and Reinforcement Learning November 9, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-008.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-008.pdf The zebra finch is a standard experimental system for studying learning and generation of temporally extended motor patterns. The first part of this project concerned the evaluation of simple models for the operation and structure of the network in the motor nucleus RA. A directed excitatory chain with a global inhibitory network, for which experimental evidence exists, was found to produce waves of activity similar to those observed in RA; this similarity included one particularly important feature of the measured activity, synchrony between the onset of bursting in one neuron and the offset of bursting in another. Other models, which were simpler and more analytically tractable, were also able to exhibit this feature, but not for parameter values quantitatively close to those observed. Another issue of interest concerns how these networks are initially learned by the bird during song acquisition. The second part of the project concerned the analysis of exemplars of REINFORCE algorithms, a general class of algorithms for reinforcement learning in neural networks, which are on several counts more biologically plausible than standard prescriptions such as backpropagation. REINFORCE compared favorably with backpropagation on tasks involving single input-output pairs, though a noise analysis suggested it should not perform so well. On tasks involving trajectory learning, REINFORCE algorithms met with some success, though the analysis that predicts their success on input-output-pair tasks fails to explain it for trajectories.
AIM-2004-008 Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman Sharing visual features for multiclass and multiview object detection April 14, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-008.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-008.pdf We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The jointly selected features are closer to edges and generic features typical of many natural structures than to specific object parts. Those generic features generalize better and considerably reduce the computational cost of an algorithm for multi-class object detection.
AITR-2004-007 Author[s]: Lisa Tucker-Kellogg Systematic Conformational Search with Constraint Satisfaction October 1, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-007.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-007.pdf Throughout biological, chemical, and pharmaceutical research, conformational searches are used to explore the possible three-dimensional configurations of molecules. This thesis describes a new systematic method for conformational search, including an application of the method to determining the structure of a peptide via solid-state NMR spectroscopy. A separate portion of the thesis is about protein-DNA binding, with a three-dimensional macromolecular structure determined by x-ray crystallography. The search method in this thesis enumerates all conformations of a molecule (at a given level of torsion angle resolution) that satisfy a set of local geometric constraints, such as constraints derived from NMR experiments. Systematic searches, historically used for small molecules, generally now use some form of divide-and-conquer for application to larger molecules. Our method can achieve a significant improvement in runtime by making some major and counter-intuitive modifications to traditional divide-and-conquer: (1) OmniMerge divides a polymer into many alternative pairs of subchains and searches all the pairs, instead of simply cutting in half and searching two subchains. Although the extra searches may appear wasteful, the bottleneck stage of the overall search, which is to re-connect the conformations of the largest subchains, can be greatly accelerated by the availability of alternative pairs of sidechains. (2) Propagation of disqualified conformations across overlapping subchains can disqualify infeasible conformations very rapidly, which further offsets the cost of searching the extra subchains of OmniMerge. 
(3) The search may be run in two stages, once at low-resolution using a side-effect of OmniMerge to determine an optimal partitioning of the molecule into efficient subchains; then again at high-resolution while making use of the precomputed subchains. (4) An A* function prioritizes each subchain based on estimated future search costs. Subchains with sufficiently low priority can be omitted from the search, which improves efficiency. A common theme of these four ideas is to make good choices about how to break the large search problem into lower-dimensional subproblems. In addition, the search method uses heuristic local searches within the overall systematic framework, to maintain the systematic guarantee while providing the empirical efficiency of stochastic search. These novel algorithms were implemented and the effectiveness of each innovation is demonstrated on a highly constrained peptide with 40 degrees of freedom.
AIM-2004-007 CBCL-237 Author[s]: Jerry Jun Yokono and Tomaso Poggio Evaluation of sets of oriented and non-oriented receptive fields as local descriptors March 24, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-007.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-007.pdf Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. We propose a performance criterion for a local descriptor based on the tradeoff between selectivity and invariance. In this paper, we evaluate several local descriptors with respect to selectivity and invariance. The descriptors that we evaluated are Gaussian derivatives up to the third order, gray image patches, and Laplacian-based descriptors with either three scales or one scale filters. We compare selectivity and invariance to several affine changes such as rotation, scale, brightness, and viewpoint. Comparisons have been made keeping the dimensionality of the descriptors roughly constant. The overall results indicate a good performance by the descriptor based on a set of oriented Gaussian filters. It is interesting that oriented receptive fields similar to the Gaussian derivatives as well as receptive fields similar to the Laplacian are found in primate visual cortex.
AITR-2004-006 Author[s]: Artur Miguel Arsenio Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift September 26, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-006.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-006.pdf The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated) thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. 
Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes. Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental.
AIM-2004-006 CBCL-236 Author[s]: Riesenhuber, Jarudi, Gilad, Sinha Face processing in humans is compatible with a simple shape-based model of vision March 5, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-006.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-006.pdf Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence (Felleman and Van Essen, 1991; Hubel and Wiesel, 1962; Livingstone and Hubel, 1988; Tso et al., 2001; Zeki, 1993), a general class of recognition models has emerged which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity (Hummel and Biederman, 1992; Riesenhuber and Poggio, 1999; Selfridge, 1959). However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ “featurally” are much easier to distinguish when inverted than those that differ “configurally” (Freire et al., 2000; Le Grand et al., 2001; Mondloch et al., 2002) – a finding that is difficult to reconcile with the aforementioned models. Here we show that after controlling for subjects’ expectations, there is no difference between “featurally” and “configurally” transformed faces in terms of inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in cortex.
AIM-2004-005 Author[s]: Howard Shrobe and Robert Laddaga New Architectural Models for Visibly Controllable Computing: The Relevance of Dynamic Object Oriented Architectures and Plan Based Computing Models February 9, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-005.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-005.pdf Traditionally, we've focussed on the question of how to make a system easy to code the first time, or perhaps on how to ease the system's continued evolution. But if we look at life cycle costs, then we must conclude that the important question is how to make a system easy to operate. To do this we need to make it easy for the operators to see what's going on and to then manipulate the system so that it does what it is supposed to. This is a radically different criterion for success. What makes a computer system visible and controllable? This is a difficult question, but it's clear that today's modern operating systems with nearly 50 million source lines of code are neither. Strikingly, the MIT Lisp Machine and its commercial successors provided almost the same functionality as today's mainstream systems, but with only 1 million lines of code. This paper is a retrospective examination of the features of the Lisp Machine hardware and software system. Our key claim is that by building the Object Abstraction into the lowest tiers of the system, great synergy and clarity were obtained. It is our hope that this is a lesson that can impact tomorrow's designs. We also speculate on how the spirit of the Lisp Machine could be extended to include a comprehensive access control model and how new layers of abstraction could further enrich this model.
AITR-2004-004 Author[s]: Robert A. Hearn Building Grounded Abstractions for Artificial Intelligence Programming June 16, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-004.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-004.pdf Most Artificial Intelligence (AI) work can be characterized as either "high-level" (e.g., logical, symbolic) or "low-level" (e.g., connectionist networks, behavior-based robotics). Each approach suffers from particular drawbacks. High-level AI uses abstractions that often have no relation to the way real, biological brains work. Low-level AI, on the other hand, tends to lack the powerful abstractions that are needed to express complex structures and relationships. I have tried to combine the best features of both approaches, by building a set of programming abstractions defined in terms of simple, biologically plausible components. At the "ground level", I define a primitive, perceptron-like computational unit. I then show how more abstract computational units may be implemented in terms of the primitive units, and show the utility of the abstract units in sample networks. The new units make it possible to build networks using concepts such as long-term memories, short-term memories, and frames. As a demonstration of these abstractions, I have implemented a simulator for "creatures" controlled by a network of abstract units. The creatures exist in a simple 2D world, and exhibit behaviors such as catching mobile prey and sorting colored blocks into matching boxes. This program demonstrates that it is possible to build systems that can interact effectively with a dynamic physical environment, yet use symbolic representations to control aspects of their behavior.
AIM-2004-004 CBCL-235 Author[s]: Robert Schneider and Maximilian Riesenhuber On the difficulty of feature-based attentional modulations in visual object recognition: A modeling study. January 14, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-004.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-004.pdf Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.
AITR-2004-003 Author[s]: Jonathan A. Goler BioJADE: A Design and Simulation Tool for Synthetic Biological Systems May 28, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-003.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-003.pdf The next generations of both biological engineering and computer engineering demand that control be exerted at the molecular level. Creating, characterizing and controlling synthetic biological systems may provide us with the ability to build cells that are capable of a plethora of activities, from computation to synthesizing nanostructures. To develop these systems, we must have a set of tools not only for synthesizing systems, but also designing and simulating them. The BioJADE project provides a comprehensive, extensible design and simulation platform for synthetic biology. BioJADE is a graphical design tool built in Java, utilizing a database back end, and supports a range of simulations using an XML communication protocol. BioJADE currently supports a library of over 100 parts with which it can compile designs into actual DNA, and then generate synthesis instructions to build the physical parts. The BioJADE project contributes several tools to Synthetic Biology. BioJADE in itself is a powerful tool for synthetic biology designers. Additionally, we developed and now make use of a centralized BioBricks repository, which enables the sharing of BioBrick components between researchers, and vastly reduces the barriers to entry for aspiring Synthetic Biologists.
AIM-2004-003 Author[s]: Kristen Grauman, Gregory Shakhnarovich, Trevor Darrell Virtual Visual Hulls: Example-Based 3D Shape Estimation from a Single Silhouette January 28, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-003.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-003.pdf Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a "virtual visual hull": an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs is collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views is generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.
AITR-2004-002 Author[s]: Jonathan Kennell Generative Temporal Planning with Complex Processes May 18, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-002.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-002.pdf Autonomous vehicles are increasingly being used in mission-critical applications, and robust methods are needed for controlling these inherently unreliable and complex systems. This thesis advocates the use of model-based programming, which allows mission designers to program autonomous missions at the level of a coach or wing commander. To support such a system, this thesis presents the Spock generative planner. To generate plans, Spock must be able to piece together vehicle commands and team tactics that have a complex behavior represented by concurrent processes. This is in contrast to traditional planners, whose operators represent simple atomic or durative actions. Spock represents operators using the RMPL language, which describes behaviors using parallel and sequential compositions of state and activity episodes. RMPL is useful for controlling mobile autonomous missions because it allows mission designers to quickly encode expressive activity models using object-oriented design methods and an intuitive set of activity combinators. Spock also is significant in that it uniformly represents operators and plan-space processes in terms of Temporal Plan Networks, which support temporal flexibility for robust plan execution. Finally, Spock is implemented as a forward progression optimal planner that walks monotonically forward through plan processes, closing any open conditions and resolving any conflicts. This thesis describes the Spock algorithm in detail, along with example problems and test results.
AIM-2004-002 CBCL-234 Author[s]: Lior Wolf, Amnon Shashua, and Sayan Mukherjee Selecting Relevant Genes with a Spectral Approach January 27, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-002.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-002.pdf Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size (on the order of a few tens of tissue samples) and by a very large feature space, as the number of genes tends to be in the high thousands. We propose a model-independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet contains a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm to five different datasets. The first consists of time course data from four well studied Hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM, PCA, GS) and with supervised methods (SNR, RMB, RFE). 
The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods and in some cases even outperforms supervised approaches.
AITR-2004-001 Author[s]: Oana L. Stamatoiu Learning Commonsense Categorical Knowledge in a Thread Memory System May 18, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-001.ps ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-001.pdf If we are to understand how we can build machines capable of broad purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner, that sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, that can learn from sentences such as ``A blue bird flew to the tree'' and ``The small bird flew to the cage'' that birds can fly. 
One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader purpose commonsense artificial learners and reasoners.
AIM-2004-001 CBCL-233 Author[s]: Alexander Rakhlin, Dmitry Panchenko, Sayan Mukherjee Risk Bounds for Mixture Density Estimation January 27, 2004 ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-001.ps ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-001.pdf In this paper we focus on the problem of estimating a bounded density using a finite combination of densities from a given class. We consider the Maximum Likelihood Procedure (MLE) and the greedy procedure described by Li and Barron. Approximation and estimation bounds are given for the above methods. We extend and improve upon the estimation results of Li and Barron, and in particular prove an $O(\frac{1}{\sqrt{n}})$ bound on the estimation error which does not depend on the number of densities in the estimated combination.
AIM-2003-027 Author[s]: Jacob Beal and Seth Gilbert RamboNodes for the Metropolitan Ad Hoc Network December 17, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-027.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-027.pdf We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations for the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.
AIM-2003-026 Author[s]: Kristen Grauman and Trevor Darrell Fast Contour Matching Using Approximate Earth Mover's Distance December 5, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-026.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-026.pdf Weighted graph matching is a good way to align a pair of shapes represented by a set of descriptive local features; the set of correspondences produced by the minimum cost of matching features from one shape to the features of the other often reveals how similar the two shapes are. However, due to the complexity of computing the exact minimum cost matching, previous algorithms could only run efficiently when using a limited number of features per shape, and could not scale to perform retrievals from large databases. We present a contour matching algorithm that quickly computes the minimum weight matching between sets of descriptive local features using a recently introduced low-distortion embedding of the Earth Mover's Distance (EMD) into a normed space. Given a novel embedded contour, the nearest neighbors in a database of embedded contours are retrieved in sublinear time via approximate nearest neighbors search. We demonstrate our shape matching method on databases of 10,000 images of human figures and 60,000 images of handwritten digits.
AIM-2003-025 Author[s]: Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling Mobilized ad-hoc networks: A reinforcement learning approach December 4, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-025.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-025.pdf Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.
AIM-2003-024 CBCL-232 Author[s]: Christian Morgenstern, Bernd Heisele Component based recognition of objects in an office environment November 28, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-024.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-024.pdf We present a component-based approach for recognizing objects under large pose changes. From a set of training images of a given object we extract a large number of components which are clustered based on the similarity of their image features and their locations within the object image. The cluster centers build an initial set of component templates from which we select a subset for the final recognizer. In experiments we evaluate different sizes and types of components and three standard techniques for component selection. The component classifiers are finally compared to global classifiers on a database of four objects.
AIM-2003-023 Author[s]: Jacob Eisenstein Evolving Robocode Tank Fighters October 28, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-023.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-023.pdf In this paper, I describe the application of genetic programming to evolve a controller for a robotic tank in a simulated environment. The purpose is to explore how genetic techniques can best be applied to produce controllers based on subsumption and behavior oriented languages such as REX. As part of my implementation, I developed TableRex, a modification of REX that can be expressed on a fixed-length genome. Using a fixed subsumption architecture of TableRex modules, I evolved robots that beat some of the most competitive hand-coded adversaries.
AIM-2003-022 Author[s]: Michael G. Ross and Leslie Pack Kaelbling Learning object segmentation from video data September 8, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-022.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-022.pdf This memo describes the initial results of a project to create a self-supervised algorithm for learning object segmentation from video data. Developmental psychology and computational experience have demonstrated that the motion segmentation of objects is a simpler, more primitive process than the detection of object boundaries by static image cues. Therefore, motion information provides a plausible supervision signal for learning the static boundary detection task and for evaluating performance on a test set. A video camera and previously developed background subtraction algorithms can automatically produce a large database of motion-segmented images for minimal cost. The purpose of this work is to use the information in such a database to learn how to detect the object boundaries in novel images using static information, such as color, texture, and shape. This work was funded in part by the Office of Naval Research contract #N00014-00-1-0298, in part by the Singapore-MIT Alliance agreement of 11/6/98, and in part by a National Science Foundation Graduate Student Fellowship.
AIM-2003-021 CBCL-231 Author[s]: Minjoon Kouh and Maximilian Riesenhuber Investigating shape representation in area V4 with HMAX: Orientation and Grating selectivities September 8, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-021.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-021.pdf The question of how shape is represented is of central interest to understanding visual processing in cortex. While tuning properties of the cells in the early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual areas V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.
AIM-2003-020 CBCL-230 Author[s]: Hiroaki Shimizu and Tomaso Poggio Direction Estimation of Pedestrian from Images August 27, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-020.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-020.pdf The capability of estimating the walking direction of people would be useful in many applications such as those involving autonomous cars and robots. We introduce an approach for estimating the walking direction of people from images, based on learning the correct classification of a still image by using SVMs. We find that the performance of the system can be improved by classifying each image of a walking sequence and combining the outputs of the classifier. Experiments were performed to evaluate our system and estimate the trade-off between number of images in walking sequences and performance.
AIM-2003-019 Author[s]: Sayan Mukherjee, Polina Golland and Dmitry Panchenko Permutation Tests for Classification August 28, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-019.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-019.pdf We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
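The permutation-testing idea summarized in the entry above can be illustrated with a small sketch. This is not the authors' code: the nearest-class-mean classifier, the leave-one-out accuracy statistic, the function names, and the toy data are all assumptions chosen to keep the example short. It shows only the general recipe of shuffling the class labels, recomputing the accuracy, and reporting how often the shuffled accuracy matches or exceeds the observed one.

```python
import random


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy of a nearest-class-mean classifier on 1-D data.

    The classifier is a hypothetical stand-in; any classifier could be used.
    Labels are assumed to be 0 or 1.
    """
    correct = 0
    for i in range(len(xs)):
        sums = {0: 0.0, 1: 0.0}
        counts = {0: 0, 1: 0}
        for j, (x, y) in enumerate(zip(xs, ys)):
            if j != i:
                sums[y] += x
                counts[y] += 1
        if counts[0] == 0 or counts[1] == 0:
            continue  # a class is missing from the training fold: count as an error
        means = {c: sums[c] / counts[c] for c in (0, 1)}
        pred = min((0, 1), key=lambda c: abs(xs[i] - means[c]))
        correct += int(pred == ys[i])
    return correct / len(xs)


def permutation_p_value(xs, ys, n_perm=500, seed=0):
    """Estimate how likely the observed accuracy is under randomly permuted labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    labels = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real association between data and labels
        if loo_accuracy(xs, labels) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up for illustration): two well-separated groups.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
ys = [0] * 5 + [1] * 5
observed_acc, p_value = permutation_p_value(xs, ys)
```

A small p-value here indicates that accuracy as high as the observed one rarely arises when the labels carry no information about the data.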
AIM-2003-018 CBCL-229 Author[s]: Benjamin J. Balas, Pawan Sinha Dissociated Dipoles: Image representation via non-local comparisons August 13, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-018.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-018.pdf A fundamental question in visual neuroscience is how to represent image structure. The most common representational schemes rely on differential operators that compare adjacent image regions. While well-suited to encoding local relationships, such operators have significant drawbacks. Specifically, each filter’s span is confounded with the size of its sub-fields, making it difficult to compare small regions across large distances. We find that such long-distance comparisons are more tolerant to common image transformations than purely local ones, suggesting they may provide a useful vocabulary for image encoding. We introduce the “Dissociated Dipole,” or “Sticks” operator, for encoding non-local image relationships. This operator de-couples filter span from sub-field size, enabling parametric movement between edge and region-based representation modes. We report on the perceptual plausibility of the operator, and the computational advantages of non-local encoding. Our results suggest that non-local encoding may be an effective scheme for representing image structure.
AITR-2003-017 Author[s]: Austin Che Fluorescence Assay for Polymerase Arrival Rates August 31, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-017.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-017.pdf To engineer complex synthetic biological systems will require modular design, assembly, and characterization strategies. The RNA polymerase arrival rate (PAR) is defined to be the rate that RNA polymerases arrive at a specified location on the DNA. Designing and characterizing biological modules in terms of RNA polymerase arrival rates provides for many advantages in the construction and modeling of biological systems. PARMESAN is an in vitro method for measuring polymerase arrival rates using pyrrolo-dC, a fluorescent DNA base that can substitute for cytosine. Pyrrolo-dC shows a detectable fluorescence difference when in single-stranded versus double-stranded DNA. During transcription, RNA polymerase separates the two strands of DNA, leading to a change in the fluorescence of pyrrolo-dC. By incorporating pyrrolo-dC at specific locations in the DNA, fluorescence changes can be taken as a direct measurement of the polymerase arrival rate.
AIM-2003-017 Author[s]: Jacob Beal Near-Optimal Distributed Failure Circumscription August 11, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-017.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-017.pdf Small failures should only disrupt a small part of a network. One way to do this is by marking the surrounding area as untrustworthy --- circumscribing the failure. This can be done with a distributed algorithm using hierarchical clustering and neighbor relations, and the resulting circumscription is near-optimal for convex failures.
AIM-2003-016 Author[s]: Krzysztof Gajos and Howard Shrobe Delegation, Arbitration and High-Level Service Discovery as Key Elements of a Software Infrastructure for Pervasive Computing June 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-016.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-016.pdf The dream of pervasive computing is slowly becoming a reality. A number of projects around the world are constantly contributing ideas and solutions that are bound to change the way we interact with our environments and with one another. An essential component of the future is a software infrastructure that is capable of supporting interactions on scales ranging from a single physical space to intercontinental collaborations. Such infrastructure must help applications adapt to very diverse environments and must protect people’s privacy and respect their personal preferences. In this paper we indicate a number of limitations present in the software infrastructures proposed so far (including our previous work). We then describe the framework for building an infrastructure that satisfies the above-mentioned criteria. This framework hinges on the concepts of delegation, arbitration and high-level service discovery. Components of our own implementation of such an infrastructure are presented.
AITR-2003-016 Author[s]: Pedro F. Felzenszwalb Representation and Detection of Shapes in Images August 8, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-016.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-016.pdf We present a set of techniques that can be used to represent and detect shapes in images. Our methods revolve around a particular shape representation based on the description of objects using triangulated polygons. This representation is similar to the medial axis transform and has important properties from a computational perspective. The first problem we consider is the detection of non-rigid objects in images using deformable models. We present an efficient algorithm to solve this problem in a wide range of situations, and show examples in both natural and medical images. We also consider the problem of learning an accurate non-rigid shape model for a class of objects from examples. We show how to learn good models while constraining them to the form required by the detection algorithm. Finally, we consider the problem of low-level image segmentation and grouping. We describe a stochastic grammar that generates arbitrary triangulated polygons while capturing Gestalt principles of shape regularity. This grammar is used as a prior model over random shapes in a low level algorithm that detects objects in images.
AIM-2003-015 Author[s]: Kimberle Koile, Konrad Tollmar, David Demirdjian, Howard Shrobe and Trevor Darrell Activity Zones for Context-Aware Computing June 10, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-015.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-015.pdf Location is a primary cue in many context-aware computing systems, and is often represented as a global coordinate, room number, or Euclidean distance to various landmarks. A user's concept of location, however, is often defined in terms of regions in which common activities occur. We show how to partition a space into such regions based on patterns of observed user location and motion. These regions, which we call activity zones, represent regions of similar user activity, and can be used to trigger application actions, retrieve information based on previous context, and present information to users. We suggest that context-aware applications can benefit from a location representation learned from observing users. We describe an implementation of our system and present two example applications whose behavior is controlled by users' entry, exit, and presence in the zones.
AITR-2003-015 Author[s]: Samson Timoner Compact Representations for Fast Nonrigid Registration of Medical Images July 4, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-015.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-015.pdf We develop efficient techniques for the non-rigid registration of medical images by using representations that adapt to the anatomy found in such images. Images of anatomical structures typically have uniform intensity interiors and smooth boundaries. We create methods to represent such regions compactly using tetrahedra. Unlike voxel-based representations, tetrahedra can accurately describe the expected smooth surfaces of medical objects. Furthermore, the interior of such objects can be represented using a small number of tetrahedra. Rather than describing a medical object using tens of thousands of voxels, our representations generally contain only a few thousand elements. Tetrahedra facilitate the creation of efficient non-rigid registration algorithms based on finite element methods (FEM). We create a fast, FEM-based method to non-rigidly register segmented anatomical structures from two subjects. Using our compact tetrahedral representations, this method generally requires less than one minute of processing time on a desktop PC. We also create a novel method for the non-rigid registration of gray scale images. To facilitate a fast method, we create a tetrahedral representation of a displacement field that automatically adapts to both the anatomy in an image and to the displacement field. The resulting algorithm has a computational cost that is dominated by the number of nodes in the mesh (about 10,000), rather than the number of voxels in an image (nearly 10,000,000). For many non-rigid registration problems, we can find a transformation from one image to another in five minutes. This speed is important as it allows use of the algorithm during surgery. 
We apply our algorithms to find correlations between the shape of anatomical structures and the presence of schizophrenia. We show that a study based on our representations outperforms studies based on other representations. We also use the results of our non-rigid registration algorithm as the basis of a segmentation algorithm. That algorithm also outperforms other methods in our tests, producing smoother segmentations and more accurately reproducing manual segmentations.
AIM-2003-014 Author[s]: Martin C. Martin The Essential Dynamics Algorithm: Essential Results May 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-014.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-014.pdf This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project’s web site.
AITR-2003-014 Author[s]: Lily Lee Gait Analysis for Classification June 26, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-014.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-014.pdf This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features such as moments extracted from orthogonal view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and modeling of feature distributions, is applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods is investigated for its advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.
AIM-2003-013 Author[s]: Lawrence Shih and David Karger Learning Classes Correlated to a Hierarchy May 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-013.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-013.pdf Trees are a common way of organizing large amounts of information by placing items with similar characteristics near one another in the tree. We introduce a classification problem where a given tree structure gives us information on the best way to label nearby elements. We suggest there are many practical problems that fall under this domain. We propose a way to map the classification problem onto a standard Bayesian inference problem. We also give a fast, specialized inference algorithm that incrementally updates relevant probabilities. We apply this algorithm to web-classification problems and show that our algorithm empirically works well.
AITR-2003-013 Author[s]: Matthew J. Marjanovic Teaching an Old Robot New Tricks: Learning Novel Tasks via Interaction with People and Things June 20, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-013.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-013.pdf As AI has begun to reach out beyond its symbolic, objectivist roots into the embodied, experientialist realm, many projects are exploring different aspects of creating machines which interact with and respond to the world as humans do. Techniques for visual processing, object recognition, emotional response, gesture production and recognition, etc., are necessary components of a complete humanoid robot. However, most projects invariably concentrate on developing a few of these individual components, neglecting the issue of how all of these pieces would eventually fit together. The focus of the work in this dissertation is on creating a framework into which such specific competencies can be embedded, in a way that they can interact with each other and build layers of new functionality. To be of any practical value, such a framework must satisfy the real-world constraints of functioning in real-time with noisy sensors and actuators. The humanoid robot Cog provides an unapologetically adequate platform from which to take on such a challenge. This work makes three contributions to embodied AI. First, it offers a general-purpose architecture for developing behavior-based systems distributed over networks of PC's. Second, it provides a motor-control system that simulates several biological features which impact the development of motor behavior. Third, it develops a framework for a system which enables a robot to learn new behaviors via interacting with itself and the outside world. A few basic functional modules are built into this framework, enough to demonstrate the robot learning some very simple behaviors taught by a human trainer. 
A primary motivation for this project is the notion that it is practically impossible to build an "intelligent" machine unless it is designed partly to build itself. This work is a proof-of-concept of such an approach to integrating multiple perceptual and motor systems into a complete learning agent.
AITR-2003-012 Author[s]: Andreas F. Wehowsky Safe Distributed Coordination of Heterogeneous Robots through Dynamic Simple Temporal Networks May 30, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-012.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-012.pdf Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for model-based programs whose planning and execution is distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman-Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.
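As background for the entry above, the link between simple temporal networks and Bellman-Ford can be sketched in a few lines. The sketch is a centralized, single-process illustration, not the distributed DTP planner described in the thesis; the function name, the constraint tuple format, and the example numbers are assumptions. It relies on the standard result that a simple temporal network is consistent exactly when its distance graph has no negative cycle.

```python
def stn_is_consistent(num_events, constraints):
    """Check a simple temporal network with Bellman-Ford.

    constraints: list of (i, j, lo, hi), each meaning lo <= t_j - t_i <= hi.
    Returns (True, schedule) if consistent, else (False, None).
    The tuple format is a hypothetical encoding chosen for this sketch.
    """
    edges = []
    for i, j, lo, hi in constraints:
        edges.append((i, j, hi))    # t_j - t_i <= hi
        edges.append((j, i, -lo))   # t_i - t_j <= -lo
    # Starting every distance at 0 acts like a virtual source node that has
    # a zero-weight edge to every event.
    dist = [0.0] * num_events
    for _ in range(num_events):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            # Converged: the distances satisfy every constraint, so they
            # form one feasible assignment of event times.
            return True, dist
    # Still relaxing after num_events passes implies a negative cycle.
    return False, None


# Made-up example: three events with pairwise duration bounds.
ok, schedule = stn_is_consistent(3, [(0, 1, 10, 20), (1, 2, 5, 10), (0, 2, 0, 25)])
# Tightening the end-to-end bound below the sum of the lower bounds makes it infeasible.
bad, _ = stn_is_consistent(3, [(0, 1, 10, 20), (1, 2, 10, 20), (0, 2, 0, 15)])
```

In a distributed setting each agent would hold only the edges touching its own events and exchange distance updates with its neighbors; that coordination is outside the scope of this sketch.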
AIM-2003-012 Author[s]: Jacob Beal A Robust Amorphous Hierarchy from Persistent Nodes May 1, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-012.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-012.pdf For a very large network deployed in space with only nearby nodes able to talk to each other, we want to do tasks like robust routing and data storage. One way to organize the network is via a hierarchy, but hierarchies often have a few critical nodes whose death can disrupt organization over long distances. I address this with a system of distributed aggregates called Persistent Nodes, such that spatially local failures disrupt the hierarchy in an area proportional to the diameter of the failure. I describe and analyze this system, which has been implemented in simulation.
AITR-2003-011 Author[s]: Claire Monteleoni Online Learning of Non-stationary Sequences June 12, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-011.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-011.pdf We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them as simple Bayesian algorithms operating on models analogous to Hidden Markov Models (HMMs). We derive a new performance bound for such algorithms which is considerably simpler than existing bounds. The bound provides the basis for learning the rate at which the identity of the optimal expert switches over time. We find an analytic expression for the a priori resolution at which we need to learn the rate parameter. We extend our scalar switching-rate result to models of the switching-rate that are governed by a matrix of parameters, i.e. arbitrary homogeneous HMMs. We apply and examine our algorithm in the context of the problem of energy management in wireless networks. We analyze the new results in the framework of Information Theory.
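The setting described above, in which the best expert may switch over time at some rate, can be sketched with the well-known Fixed-Share update. This is an illustrative stand-in, not the algorithm derived in the thesis: the switching rate `alpha` is fixed by hand here rather than learned, and the learning rate `eta`, the squared loss, and the function name `fixed_share` are assumptions.

```python
from math import exp

def fixed_share(expert_preds, outcomes, alpha, eta=1.0):
    """Track a switching best expert (requires at least two experts).

    expert_preds[t][i] is expert i's prediction at step t.
    alpha is the assumed probability that the best expert switches
    between consecutive steps.
    Returns the learner's predictions and the final expert weights.
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n
    predictions = []
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the current weighted mixture of experts.
        predictions.append(sum(w * p for w, p in zip(weights, preds)))
        # Loss update: down-weight experts by their squared error.
        updated = [w * exp(-eta * (p - y) ** 2) for w, p in zip(weights, preds)]
        total = sum(updated)
        posterior = [u / total for u in updated]
        # Share step: move a fraction alpha of each expert's mass
        # uniformly to the other experts, modelling a possible switch.
        weights = [(1 - alpha) * q + alpha * (1 - q) / (n - 1) for q in posterior]
    return predictions, weights

# Hypothetical data: expert 0 always says 0, expert 1 always says 1,
# and the outcome switches from 0 to 1 halfway through.
preds = [[0.0, 1.0]] * 20
outcomes = [0.0] * 10 + [1.0] * 10
predictions, weights = fixed_share(preds, outcomes, alpha=0.1)
```

With `alpha` greater than zero the weights recover quickly after the switch; with `alpha` equal to zero the update reduces to a static mixture that is slow to abandon the formerly best expert.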
AIM-2003-011 Author[s]: Jacob Beal Persistent Nodes for Reliable Memory in Geographically Local Networks April 15, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-011.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-011.pdf A Persistent Node is a redundant distributed mechanism for storing a key/value pair reliably in a geographically local network. In this paper, I develop a method of establishing Persistent Nodes in an amorphous matrix. I address issues of construction, usage, atomicity guarantees and reliability in the face of stopping failures. Applications include routing, congestion control, and data storage in gigascale networks.
AITR-2003-010 CBCL-228 Author[s]: Ezra Rosen Face Representation in Cortex: Studies Using a Simple and Not So Special Model June 5, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-010.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-010.pdf The face inversion effect has been widely documented as an effect of the uniqueness of face processing. Using a computational model, we show that the face inversion effect is a byproduct of expertise with respect to the face object class. In simulations using HMAX, a hierarchical, shape based model, we show that the magnitude of the inversion effect is a function of the specificity of the representation. Using many sharply tuned units, an "expert" has a large inversion effect. On the other hand, if fewer, broadly tuned units are used, the expertise is lost, and this "novice" has a small inversion effect. As the size of the inversion effect is a product of the representation, not the object class, given the right training we can create experts and novices in any object class. Using the same representations as with faces, we create experts and novices for cars. We also measure the feasibility of a view-based model for recognition of rotated objects using HMAX. Using faces, we show that transfer of learning to novel views is possible. Given only one training view, the view-based model can recognize a face at a new orientation via interpolation from the views to which it had been tuned. Although the model can generalize well to upright faces, inverted faces yield poor performance because the features change differently under rotation.
AIM-2003-010 Author[s]: Chris Mario Christoudias, Louis-Philippe Morency and Trevor Darrell Light Field Morphable Models April 18, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-010.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-010.pdf Statistical shape and texture appearance models are powerful image representations, but previously had been restricted to 2D or simple 3D shapes. In this paper we present a novel 3D morphable model based on image-based rendering techniques, which can represent complex lighting conditions, structures, and surfaces. We describe how to construct a manifold of the multi-view appearance of an object class using light fields and show how to match a 2D image of an object to a point on this manifold. In turn we use the reconstructed light field to render novel views of the object. Our technique overcomes the limitations of polygon based appearance models and uses light fields that are acquired in real-time.
AITR-2003-009 CBCL-227 Author[s]: Jennifer Louie A Biological Model of Object Recognition with Feature Learning June 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-009.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-009.pdf Previous biological models of object recognition in cortex have been evaluated using idealized scenes and have hard-coded features, such as the HMAX model by Riesenhuber and Poggio [10]. Because HMAX uses the same set of features for all object classes, it does not perform well in the task of detecting a target object in clutter. This thesis presents a new model that integrates learning of object-specific features with the HMAX. The new model performs better than the standard HMAX and comparably to a computer vision system on face detection. Results from experimenting with unsupervised learning of features and the use of a biologically-plausible classifier are presented.
AIM-2003-009 Author[s]: Gregory Shakhnarovich, Paul Viola and Trevor Darrell Fast Pose Estimation with Parameter Sensitive Hashing April 18, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-009.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-009.pdf Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends a recently developed method for locality-sensitive hashing, which finds approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions; we show how to find the set of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call Parameter-Sensitive Hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.
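The locality-sensitive hashing that this work builds on can be sketched as follows. This is plain LSH with random hyperplanes, not Parameter-Sensitive Hashing: the hash functions here are drawn at random rather than learned for the estimation task. The class name `LSHIndex`, its parameters, and the pose labels are hypothetical.

```python
import random
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_hash(dim, num_bits, rng):
    """Build one hash function from num_bits random hyperplanes."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]

    def h(x):
        # Each bit records which side of a hyperplane x falls on.
        return tuple(sum(p * v for p, v in zip(plane, x)) >= 0 for plane in planes)

    return h

class LSHIndex:
    """Several hash tables; nearby vectors tend to share a bucket."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, num_bits, rng) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.items = []

    def add(self, vector, label):
        idx = len(self.items)
        self.items.append((vector, label))
        for h, table in zip(self.hashes, self.tables):
            table[h(vector)].append(idx)

    def query(self, vector):
        """Return the label of the closest candidate, or None if no bucket matches."""
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(vector), ()))
        if not candidates:
            return None
        best = min(candidates, key=lambda i: sq_dist(self.items[i][0], vector))
        return self.items[best][1]

# Hypothetical usage: vectors stand in for image features, labels for poses.
index = LSHIndex(dim=3)
index.add([1.0, 0.0, 0.0], "pose_a")
index.add([0.0, 1.0, 0.0], "pose_b")
index.add([-1.0, 0.0, 0.2], "pose_c")
```

Only the vectors that share a bucket with the query are compared exactly, which is what keeps lookup cost below a full scan of the example database.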
AITR-2003-008 Author[s]: Paul Fitzpatrick From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot June 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-008.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-008.pdf This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. 
The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.
AIM-2003-008 Author[s]: Kristen Grauman, Gregory Shakhnarovich and Trevor Darrell Inferring 3D Structure with a Statistical Image-Based Shape Model April 17, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-008.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-008.pdf We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure" model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
AITR-2003-007 Author[s]: Kristen Grauman A Statistical Image-Based Shape Model for Visual Hull Reconstruction and 3D Structure Inference May 22, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-007.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-007.pdf We present a statistical image-based “shape + structure” model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
AIM-2003-007 Author[s]: Jacob Beal Leveraging Learning and Language Via Communication Bootstrapping March 17, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-007.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-007.pdf In a Communication Bootstrapping system, peer components with different perceptual worlds invent symbols and syntax based on correlations between their percepts. I propose that Communication Bootstrapping can also be used to acquire functional definitions of words and causal reasoning knowledge. I illustrate this point with several examples, then sketch the architecture of a system in progress which attempts to execute this task.
AITR-2003-006 Author[s]: Louis-Philippe Morency Stereo-Based Head Pose Tracking Using Iterative Closest Point and Normal Flow Constraint May 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-006.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-006.pdf In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.
AIM-2003-006 Author[s]: Christine Alvarado, Jaime Teevan, Mark S. Ackerman and David Karger Surviving the Information Explosion: How People Find Their Electronic Information April 15, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-006.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-006.pdf We report on a study of how people look for information within email, files, and the Web. When locating a document or searching for a specific answer, people relied on their contextual knowledge of their information target to help them find it, often associating the target with a specific document. They appeared to prefer to use this contextual information as a guide in navigating locally in small steps to the desired document rather than directly jumping to their target. We found this behavior was especially true for people with unstructured information organization. We discuss the implications of our findings for the design of personal information management tools.
AIM-2003-005 Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman and Mark A. Rubin Context-Based Vision System for Place and Object Recognition March 19, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-005.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-005.pdf While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. In this paper we present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We present a low-dimensional global image representation that provides relevant information for place recognition and categorization, and show how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.
AITR-2003-005 CBCL-226 Author[s]: Sanmay Das Intelligent Market-Making in Artificial Financial Markets June 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-005.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-005.pdf This thesis describes and evaluates a market-making algorithm for setting prices in financial markets with asymmetric information, and analyzes the properties of artificial markets in which the algorithm is used. The core of our algorithm is a technique for maintaining an online probability density estimate of the underlying value of a stock. Previous theoretical work on market-making has led to price-setting equations for which solutions cannot be achieved in practice, whereas empirical work on algorithms for market-making has focused on sets of heuristics and rules that lack theoretical justification. The algorithm presented in this thesis is theoretically justified by results in finance, and at the same time flexible enough to be easily extended by incorporating modules for dealing with considerations like portfolio risk and competition from other market-makers. We analyze the performance of our algorithm experimentally in artificial markets with different parameter settings and find that many reasonable real-world properties emerge. For example, the spread increases in response to uncertainty about the true value of a stock, average spreads tend to be higher in more volatile markets, and market-makers with lower average spreads perform better in environments with multiple competitive market-makers. In addition, the time series data generated by simple markets populated with market-makers using our algorithm replicate properties of real-world financial time series, such as volatility clustering and the fat-tailed nature of return distributions, without the need to specify explicit models for opinion propagation and herd behavior in the trading crowd.
AIM-2003-004 CBCL-225 Author[s]: Izzat N. Jarudi and Pawan Sinha Relative Contributions of Internal and External Features to Face Recognition March 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-004.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-004.pdf The central challenge in face recognition lies in understanding the role different facial features play in our judgments of identity. Notable in this regard are the relative contributions of the internal (eyes, nose and mouth) and external (hair and jaw-line) features. Past studies that have investigated this issue have typically used high-resolution images or good-quality line drawings as facial stimuli. The results obtained are therefore most relevant for understanding the identification of faces at close range. However, given that real-world viewing conditions are rarely optimal, it is also important to know how image degradations, such as loss of resolution caused by large viewing distances, influence our ability to use internal and external features. Here, we report experiments designed to address this issue. Our data characterize how the relative contributions of internal and external features change as a function of image resolution. While we replicated results of previous studies that have shown internal features of familiar faces to be more useful for recognition than external features at high resolution, we found that the two feature sets reverse in importance as resolution decreases. These results suggest that the visual system uses a highly non-linear cue-fusion strategy in combining internal and external features along the dimension of image resolution and that the configural cues that relate the two feature sets play an important role in judgments of facial identity.
AITR-2003-004 Author[s]: Aaron D. Adler Segmentation and Alignment of Speech and Sketching in a Design Environment February 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-004.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-004.pdf Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.
AIM-2003-003 CBCL-224 Author[s]: Gadi Geiger, Tony Ezzat and Tomaso Poggio Perceptual Evaluation of Video-Realistic Speech February 28, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-003.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-003.pdf With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image-sequences of the same utterances ("Turing tests"), and b) gauging visual speech recognition by comparing lip-reading performance on the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects who were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image-sequences, recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes like rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. 
The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.
AITR-2003-003 Author[s]: Leonid Peshkin Reinforcement Learning by Policy Search February 14, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-003.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-003.pdf One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
AIM-2003-002 Author[s]: Harald Steck and Tommi S. Jaakkola (Semi-)Predictive Discretization During Model Selection February 25, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-002.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-002.pdf In this paper, we present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also independent of the metric used in the continuous space. Our experiments with gene expression data show that discretization plays a crucial role regarding the resulting network structure.
AITR-2003-002 Author[s]: Timothy Chklovski Using Analogy to Acquire Commonsense Knowledge from Human Contributors February 12, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-002.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-002.pdf The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy, has been implemented, deployed, and evaluated. (The site, "1001 Questions," is available at http://teach-computers.org/learner.html). Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." 
Further suppose that assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information." Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that indeed there is a sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, evaluating the percentages of questions answered affirmatively, negatively and judged to be nonsensical in the cumulative analogy case compares favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.
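The newspaper example above can be sketched in a few lines. This is a simplified illustration, not the Learner system: it scores similarity by counting shared properties and sums only evidence in favor of a question, whereas the abstract describes summing evidence both for and against. The function name `propose_questions` and every property other than "contains information" are hypothetical.

```python
from collections import Counter

def propose_questions(kb, topic, k=2):
    """Suggest properties to ask about `topic` by analogy to its neighbors.

    kb maps each topic to the set of properties already asserted for it.
    Similarity is the number of shared properties (an assumed measure).
    Returns (property, score) pairs, strongest candidates first.
    """
    known = kb[topic]
    sims = {other: len(known & props)
            for other, props in kb.items() if other != topic}
    # Keep the k most similar topics that share at least one property.
    neighbors = sorted((o for o in sims if sims[o] > 0),
                       key=lambda o: (-sims[o], o))[:k]
    votes = Counter()
    for neighbor in neighbors:
        # Each neighbor votes for its properties not yet known for the
        # topic, weighted by how similar the neighbor is.
        for prop in kb[neighbor] - known:
            votes[prop] += sims[neighbor]
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical knowledge base built around the example in the abstract.
kb = {
    "newspaper": {"is made of paper", "has pages", "can be read"},
    "book": {"is made of paper", "has pages", "can be read",
             "contains information"},
    "magazine": {"has pages", "can be read", "contains information"},
    "hammer": {"is a tool", "can be held"},
}
questions = propose_questions(kb, "newspaper")
```

Here "book" and "magazine" are the nearest neighbors of "newspaper", so the only candidate question is whether newspapers contain information; "hammer" shares no properties and contributes nothing.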
AIM-2003-001 Author[s]: Nathan Srebro and Tommi Jaakkola Generalized Low-Rank Approximations January 15, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-001.ps ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-001.pdf We study the frequent problem of approximating a target matrix with a matrix of lower rank. We provide a simple and efficient (EM) algorithm for solving weighted low rank approximation problems, which, unlike simple matrix factorization problems, do not admit a closed form solution in general. We analyze, in addition, the nature of locally optimal solutions that arise in this context, demonstrate the utility of accommodating the weights in reconstructing the underlying low rank representation, and extend the formulation to non-Gaussian noise models such as classification (collaborative filtering).
AITR-2003-001 Author[s]: Philip Mjong-Hyon Shin Kim Understanding Subsystems in Biology through Dimensionality Reduction, Graph Partitioning and Analytical Modeling February 5, 2003 ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-001.ps ftp://publications.ai.mit.edu/ai-publications/2003/AITR-2003-001.pdf Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarking them against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extensions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. 
These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.
AIM-2002-024 CBCL-223 Author[s]: Sayan Mukherjee, Partha Niyogi, Tomaso Poggio and Ryan Rifkin Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization December 2002 (revised July 2003) ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-024.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-024.pdf Solutions of learning problems by Empirical Risk Minimization (ERM) need to be consistent, so that they may be predictive. They also need to be well-posed, so that they can be used robustly. We show that a statistical form of well-posedness, defined in terms of the key property of L-stability, is necessary and sufficient for consistency of ERM.
AIM-2002-023 CBCL-222 Author[s]: Luis Perez-Breva and Osamu Yoshimi Model Selection in Summary Evaluation December 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-023.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-023.pdf A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
AIM-2002-022 Author[s]: Jake V. Bouvrie Multiple Resolution Image Classification December 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-022.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-022.pdf Binary image classification is a problem that has received much attention in recent years. In this paper we evaluate a selection of popular techniques in an effort to find a feature set/classifier combination which generalizes well to full resolution image data. We then apply that system to images at one-half through one-sixteenth resolution, and consider the corresponding error rates. In addition, we further observe generalization performance as it depends on the number of training images, and lastly, compare the system's best error rates to those of a human performing an identical classification task given the same set of test images.
AIM-2002-021 Author[s]: Jacob Beal Leaderless Distributed Hierarchy Formation December 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-021.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-021.pdf I present a system for robust leaderless organization of an amorphous network into hierarchical clusters. This system, which assumes that nodes are spatially embedded and can only talk to neighbors within a given radius, scales to networks of arbitrary size and converges rapidly. The amount of data stored at each node is logarithmic in the diameter of the network, and the hierarchical structure produces an addressing scheme such that there is an invertible relation between distance and address for any pair of nodes. The system adapts automatically to stopping failures, network partition, and reorganization.
AIM-2002-020 Author[s]: Erik B. Sudderth, Alexander T. Ihler, William T. Freeman and Alan S. Willsky Nonparametric Belief Propagation and Facial Appearance Estimation December 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-020.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-020.pdf In many applications of graphical models arising in computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms for discrete approximations to these continuous distributions, but for the high-dimensional variables typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle filters provide an appealing alternative. However, existing techniques fail to exploit the rich structure of the graphical models describing many vision problems. Drawing on ideas from regularized particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation (NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accommodate an extremely broad class of potential functions, including nonparametric representations. Thus, NBP extends particle filtering methods to the more general vision problems that graphical models can describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model, allowing location and reconstruction of occluded features.
AIM-2002-019 Author[s]: Antonio Torralba and William T. Freeman Properties and Applications of Shape Recipes December 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-019.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-019.pdf In low-level vision, the representations of scene properties such as shape and albedo are very high dimensional, as they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented in reference to the image, where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show how this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information like stereo.
AIM-2002-018 Author[s]: Gerald Jay Sussman and Jack Wisdom The Role of Programming in the Formulation of Ideas November 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-018.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-018.pdf Classical mechanics is deceptively simple. It is surprisingly easy to get the right answer with fallacious reasoning or without real understanding. To address this problem we use computational techniques to communicate a deeper understanding of Classical Mechanics. Computational algorithms are used to express the methods used in the analysis of dynamical phenomena. Expressing the methods in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.
AIM-2002-017 Author[s]: Jack Wisdom Swimming in Space-Time November 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-017.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-017.pdf Cyclic changes in the shape of a quasi-rigid body on a curved manifold can lead to net translation and/or rotation of the body in the manifold. Presuming space-time is a curved manifold as portrayed by general relativity, translation in space can be accomplished simply by cyclic changes in the shape of a body, without any thrust or external forces.
AIM-2002-016 Author[s]: William T. Freeman and Antonio Torralba Shape Recipes: Scene Representations that Refer to the Image September 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-016.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-016.pdf The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.
AIM-2002-015 Author[s]: Marshall F. Tappen, William T. Freeman and Edward H. Adelson Recovering Intrinsic Images from a Single Image September 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-015.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-015.pdf We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface's reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.
AIM-2002-014 Author[s]: Harald Steck and Tommi S. Jaakkola On the Dirichlet Prior and Bayesian Regularization September 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-014.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-014.pdf A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.
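The effect described above can be reproduced on a toy example. The sketch below assumes the BDeu form of the Dirichlet prior, in which the equivalent sample size is spread evenly over parent configurations and states, and compares the marginal likelihood of an edge X -> Y against no edge for two binary variables. The count table and the function names are hypothetical, chosen only for illustration.

```python
from math import lgamma

def family_score(counts, ess):
    """Log marginal likelihood of one node under a BDeu-style Dirichlet prior.

    counts[j][k] is the number of cases with parent configuration j
    and node state k; ess is the equivalent sample size.
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of node states
    a_j = ess / q
    a_jk = ess / (q * r)
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n in row:
            score += lgamma(a_jk + n) - lgamma(a_jk)
    return score

def log_bayes_factor_edge(table, ess):
    """log P(D | X -> Y) - log P(D | no edge); table[x][y] are joint counts.

    The score of X itself is identical in both structures and cancels.
    """
    with_edge = family_score(table, ess)
    marginal_y = [[sum(row[k] for row in table)
                   for k in range(len(table[0]))]]
    without_edge = family_score(marginal_y, ess)
    return with_edge - without_edge

# Strongly dependent toy data: X and Y agree in 80 of 100 cases.
table = [[40, 10], [10, 40]]
moderate_prior = log_bayes_factor_edge(table, 1.0)
vanishing_prior = log_bayes_factor_edge(table, 1e-20)
print(moderate_prior > 0, vanishing_prior < 0)
```

With a moderate equivalent sample size the edge is preferred, as the clear dependence in the data suggests. With a vanishing equivalent sample size the sign flips and the empty graph scores higher, because the denser structure has more observed parameter cells, each of which is penalized as the prior strength goes to zero.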
AIM-2002-013 CBCL-220 Author[s]: M.A. Giese and X. Xie Exact Solution of the Nonlinear Dynamics of Recurrent Neural Mechanisms for Direction Selectivity August 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-013.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-013.pdf Different theoretical models have tried to investigate the feasibility of recurrent neural mechanisms for achieving direction selectivity in the visual cortex. The mathematical analysis of such models has been restricted so far to the case of purely linear networks. We present an exact analytical solution of the nonlinear dynamics of a class of direction selective recurrent neural models with threshold nonlinearity. Our mathematical analysis shows that such networks have form-stable stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction selective cortical neurons. Our analysis also shows that the stability of such solutions can break down, giving rise to a different class of solutions ("lurching activity waves") that are characterized by a specific spatio-temporal periodicity. These solutions cannot arise in models for direction selectivity with purely linear spatio-temporal filtering.
AIM-2002-012 CBCL-219 Author[s]: Martin Alexander Giese and Tomaso Poggio Biologically Plausible Neural Model for the Recognition of Biological Motion and Actions August 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-012.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-012.pdf The visual recognition of complex movements and actions is crucial for communication and survival in many species. Remarkable sensitivity and robustness of biological motion perception have been demonstrated in psychophysical experiments. In recent years, neurons and cortical areas involved in action recognition have been identified in neurophysiological and imaging studies. However, the detailed neural mechanisms that underlie the recognition of such complex movement patterns remain largely unknown. This paper reviews the experimental results and summarizes them in terms of a biologically plausible neural model. The model is based on the key assumption that action recognition is based on learned prototypical patterns and exploits information from the ventral and the dorsal pathway. The model makes specific predictions that motivate new experiments.
AITR-2002-011 Author[s]: J.P. Grossman Design and Evaluation of the Hamal Parallel Computer December 5, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-011.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-011.pdf Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. 
Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
AIM-2002-011 CBCL-218 Author[s]: Robert Schneider and Maximilian Riesenhuber A Detailed Look at Scale and Translation Invariance in a Hierarchical Neural Model of Visual Object Recognition August 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-011.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-011.pdf The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used "paperclip" stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and "natural" stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.
AITR-2002-010 Author[s]: John M. Van Eepoel Achieving Real-Time Mode Estimation through Offline Compilation October 22, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-010.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-010.pdf As exploration of our solar system and outer space moves into the future, spacecraft are being developed to venture on increasingly challenging missions with bold objectives. The spacecraft tasked with completing these missions are becoming progressively more complex. This increases the potential for mission failure due to hardware malfunctions and unexpected spacecraft behavior. A solution to this problem lies in the development of an advanced fault management system. Fault management enables a spacecraft to respond to failures and take repair actions so that it may continue its mission. The two main approaches developed for spacecraft fault management have been rule-based and model-based systems. Rules map sensor information to system behaviors, thus achieving fast response times, and making the actions of the fault management system explicit. These rules are developed by having a human reason through the interactions between spacecraft components. This process is limited by the number of interactions a human can reason about correctly. In the model-based approach, the human provides component models, and the fault management system reasons automatically about system-wide interactions and complex fault combinations. This approach improves correctness, and makes explicit the underlying system models, whereas these are implicit in the rule-based approach. We propose a fault detection engine, Compiled Mode Estimation (CME), that unifies the strengths of the rule-based and model-based approaches. CME uses a compiled model to determine spacecraft behavior more accurately. Reasoning related to fault detection is compiled in an off-line process into a set of concurrent, localized diagnostic rules. 
These are then combined on-line along with sensor information to reconstruct the diagnosis of the system. These rules enable a human to inspect the diagnostic consequences of CME. Additionally, CME is capable of reasoning through component interactions automatically while still providing fast and correct responses. The implementation of this engine has been tested against the NEAR spacecraft advanced rule-based system, resulting in detection of failures beyond that of the rules. This evolution in fault detection will enable future missions to explore the furthest reaches of the solar system without the burden of human intervention to repair failed components.
AIM-2002-010 Author[s]: Justin Werfel Implementing Universal Computation in an Evolutionary System July 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-010.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-010.pdf Evolutionary algorithms are a common tool in engineering and in the study of natural evolution. Here we take their use in a new direction by showing how they can be made to implement a universal computer. We consider populations of individuals with genes whose values are the variables of interest. By allowing them to interact with one another in a specified environment with limited resources, we demonstrate the ability to construct any arbitrary logic circuit. We explore models based on the limits of small and large populations, and show examples of such a system in action, implementing a simple logic circuit.
AITR-2002-009 Author[s]: Ron O. Dror Surface Reflectance Recognition and Real-World Illumination Statistics October 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-009.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-009.pdf Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illumination. 
Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination. For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance.
AIM-2002-009 CBCL-217 Author[s]: Adlar J. Kim and Christian R. Shelton Modeling Stock Order Flows and Learning Market-Making from Data June 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-009.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-009.pdf Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. The sequence of buys and sells for a particular stock, the order flow, we model as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly non-linear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially-observable environment. We demonstrate learning results for two separate real stocks.
AITR-2002-008 CBCL-221 Author[s]: Vinay P. Kumar Towards Man-Machine Interfaces: Combining Top-down Constraints with Bottom-up Learning in Facial Analysis September 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-008.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-008.pdf This thesis proposes a methodology for the design of man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of combining top-down and bottom-up knowledge is similar to the engineering question of labeling a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose the use of a linear morphable model (LMM) for representing top-down structure and use it to model various facial variations such as mouth shapes and expression, the pose of faces and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression for estimating the parameters of LMMs directly from pixel-based representations of faces. We combine these methods for designing new, more self-contained systems for recognizing facial expressions, estimating facial pose and for recognizing visemes.
AIM-2002-008 Author[s]: Andrew "bunnie" Huang Keeping Secrets in Hardware: the Microsoft Xbox(TM) Case Study May 26, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-008.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-008.pdf This paper discusses the hardware foundations of the cryptosystem employed by the Xbox(TM) video game console from Microsoft. A secret boot block overlay is buried within a system ASIC. This secret boot block decrypts and verifies portions of an external FLASH-type ROM. The presence of the secret boot block is camouflaged by a decoy boot block in the external ROM. The code contained within the secret boot block is transferred to the CPU in the clear over a set of high-speed busses where it can be extracted using simple custom hardware. The paper concludes with recommendations for improving the Xbox security system. One lesson of this study is that the use of a high-performance bus alone is not a sufficient security measure, given the advent of inexpensive, fast rapid prototyping services and high-performance FPGAs.
AITR-2002-007 Author[s]: Carl Steinbach A Reinforcement-Learning Approach to Power Management May 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-007.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-007.pdf We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
AIM-2002-007 CBCL-216 Author[s]: Ulf Knoblich, David J. Freedman and Maximilian Riesenhuber Categorization in IT and PFC: Model and Experiments April 18, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-007.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-007.pdf In a recent experiment, Freedman et al. recorded from inferotemporal (IT) and prefrontal cortices (PFC) of monkeys performing a "cat/dog" categorization task (Freedman 2001 and Freedman, Riesenhuber, Poggio, Miller 2001). In this paper we analyze the tuning properties of view-tuned units in our HMAX model of object recognition in cortex (Riesenhuber 1999) using the same paradigm and stimuli as in the experiment. We then compare the simulation results to the monkey inferotemporal neuron population data. We find that view-tuned model IT units that were trained without any explicit category information can show category-related tuning as observed in the experiment. This suggests that the tuning properties of experimental IT neurons might primarily be shaped by bottom-up stimulus-space statistics, with little influence of top-down task-specific information. The population of experimental PFC neurons, on the other hand, shows tuning properties that cannot be explained just by stimulus tuning. These analyses are compatible with a model of object recognition in cortex (Riesenhuber 2000) in which a population of shape-tuned neurons provides a general basis for neurons tuned to different recognition tasks.
AITR-2002-006 Author[s]: Andrew "bunnie" Huang ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction June 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-006.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-006.pdf The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. 
Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
AIM-2002-006 Author[s]: Sarah Finney, Natalia H. Gardiol, Leslie Pack Kaelbling and Tim Oates Learning with Deictic Representation April 10, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-006.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-006.pdf Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
AIM-2002-005 Author[s]: Gregory T. Sullivan Advanced Programming Language Features for Executable Design Patterns "Better Patterns Through Reflection" March 22, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-005.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-005.pdf The Design Patterns book [GOF95] presents 23 time-tested patterns that consistently appear in well-designed software systems. Each pattern is presented with a description of the design problem the pattern addresses, as well as sample implementation code and design considerations. This paper explores how the patterns from the "Gang of Four", or "GOF" book, as it is often called, appear when similar problems are addressed using a dynamic, higher-order, object-oriented programming language. Some of the patterns disappear -- that is, they are supported directly by language features, some patterns are simpler or have a different focus, and some are essentially unchanged.
AITR-2002-005 Author[s]: Jeremy Hanford Brown Sparsely Faceted Arrays: A Mechanism Supporting Parallel Allocation, Communication, and Garbage Collection June 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-005.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-005.pdf Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy.
I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.
AIM-2002-004 CBCL-215 Author[s]: Ulf Knoblich and Maximilian Riesenhuber Stimulus Simplification and Object Representation: A Modeling Study March 15, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-004.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-004.pdf Tsunoda et al. (2001) recently studied the nature of object representation in monkey inferotemporal cortex using a combination of optical imaging and extracellular recordings. In particular, they examined IT neuron responses to complex natural objects and "simplified" versions thereof. In that study, in 42% of the cases, optical imaging revealed a decrease in the number of activation patches in IT as stimuli were "simplified". However, in 58% of the cases, "simplification" of the stimuli actually led to the appearance of additional activation patches in IT. Based on these results, the authors propose a scheme in which an object is represented by combinations of active and inactive columns coding for individual features. We examine the patterns of activation caused by the same stimuli as used by Tsunoda et al. in our model of object recognition in cortex (Riesenhuber 99). We find that object-tuned units can show a pattern of appearance and disappearance of features identical to the experiment. Thus, the data of Tsunoda et al. appear to be in quantitative agreement with a simple object-based representation in which an object's identity is coded by its similarities to reference objects. Moreover, the agreement of simulations and experiment suggests that the simplification procedure used by Tsunoda (2001) is not necessarily an accurate method to determine neuronal tuning.
AITR-2002-004 Author[s]: Teodoro Arvizo III A Virtual Machine for a Type-omega Denotational Proof Language June 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-004.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-004.pdf In this thesis, I designed and implemented a virtual machine (VM) for a monomorphic variant of Athena, a type-omega denotational proof language (DPL). This machine attempts to maintain the minimum state required to evaluate Athena phrases. This thesis also includes the design and implementation of a compiler for monomorphic Athena that compiles to the VM. Finally, it includes details on my implementation of a read-eval-print loop that glues together the VM core and the compiler to provide a full, user-accessible interface to monomorphic Athena. The Athena VM provides the same basis for DPLs that the SECD machine does for pure, functional programming and the Warren Abstract Machine does for Prolog.
AITR-2002-003 Author[s]: Joanna J. Bryson Intelligence by Design: Principles of Modularity and Coordination for Engineerin September 2001 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-003.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-003.pdf All intelligence relies on search --- for example, the search for an intelligent agent's next action. Search is only likely to succeed in resource-bounded agents if they have already been biased towards finding the right answer. In artificial agents, the primary source of bias is engineering. This dissertation describes an approach, Behavior-Oriented Design (BOD), for engineering complex agents. A complex agent is one that must arbitrate between potentially conflicting goals or behaviors. Behavior-oriented design builds on work in behavior-based and hybrid architectures for agents, and the object-oriented approach to software engineering. The primary contributions of this dissertation are: 1. The BOD architecture: a modular architecture with each module providing specialized representations to facilitate learning. This includes one pre-specified module and representation for action selection or behavior arbitration. The specialized representation underlying BOD action selection is Parallel-rooted, Ordered, Slip-stack Hierarchical (POSH) reactive plans. 2. The BOD development process: an iterative process that alternately scales the agent's capabilities then optimizes the agent for simplicity, exploiting tradeoffs between the component representations. This ongoing process for controlling complexity not only provides bias for the behaving agent, but also facilitates its maintenance and extendibility.
The secondary contributions of this dissertation include two implementations of POSH action selection, a procedure for identifying useful idioms in agent architectures and using them to distribute knowledge across agent paradigms, several examples of applying BOD idioms to established architectures, an analysis and comparison of the attributes and design trends of a large number of agent architectures, a comparison of biological (particularly mammalian) intelligence to artificial agent architectures, a novel model of primate transitive inference, and many other examples of BOD agents and BOD development.
AIM-2002-003 CBCL-214 Author[s]: Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee and Alex Rakhlin Bagging Regularizes March 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-003.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-003.pdf Intuitively, we expect that averaging --- or bagging --- different regressors with low correlation should smooth their behavior and be somewhat similar to regularization. In this note we make this intuition precise. Using an almost classical definition of stability, we prove that a certain form of averaging provides generalization bounds with a rate of convergence of the same order as Tikhonov regularization --- similar to fashionable RKHS-based learning algorithms.
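The averaging idea in the abstract above can be illustrated with a minimal sketch of generic bootstrap aggregation. This is not the specific form of averaging analyzed in the memo, which the abstract does not spell out; the 1-nearest-neighbor base learner and the names `fit_nearest_neighbor` and `bagged_regressor` are assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) training pair


def fit_nearest_neighbor(data: Sequence[Point]) -> Callable[[float], float]:
    """1-nearest-neighbor regressor: a deliberately unstable base learner (assumed)."""
    points = list(data)

    def predict(x: float) -> float:
        # Return the target of the training point closest to x.
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagged_regressor(data: List[Point], n_models: int = 25,
                     seed: int = 0) -> Callable[[float], float]:
    """Average base learners fit on bootstrap resamples of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_nearest_neighbor(sample))

    def predict(x: float) -> float:
        # The bagged prediction is the plain mean of the individual predictions.
        return sum(m(x) for m in models) / len(models)

    return predict
```

Each base learner jumps abruptly between training targets, while the averaged predictor changes more gradually because the resampled learners disagree about where the jumps occur; that smoothing is the intuition the memo sets out to make precise.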
AITR-2002-002 Author[s]: Jacob Beal Generating Communications Systems Through Shared Context January 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-002.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-002.pdf In a distributed model of intelligence, peer components need to communicate with one another. I present a system which enables two agents connected by a thick twisted bundle of wires to bootstrap a simple communication system from observations of a shared environment. The agents learn a large vocabulary of symbols, as well as inflections on those symbols which allow thematic role-frames to be transmitted. Language acquisition time is rapid and linear in the number of symbols and inflections. The final communication system is robust and performance degrades gradually in the face of problems.
AIM-2002-002 Author[s]: William T. Freeman and Hao Zhang Shape-Time Photography January 10, 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-002.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-002.pdf We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-d relationships over time by easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.
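The compositing rule described in the abstract above (at every pixel, show the image data from the surface closest to the camera) can be sketched as follows. This is an illustrative sketch only: the nested-list image format and the name `shape_time_composite` are assumptions, and the Markov-network step that the authors use to correct depth errors is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]    # grayscale intensities, rows x cols (assumed format)
Depth = List[List[float]]  # distance from the camera, same shape as the image


def shape_time_composite(frames: List[Tuple[Image, Depth]]) -> Image:
    """At each pixel, keep the intensity from the frame whose surface is nearest."""
    if not frames:
        raise ValueError("need at least one (image, depth) frame")
    rows = len(frames[0][0])
    cols = len(frames[0][0][0])
    composite = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Pick the frame with the smallest depth at this pixel;
            # ties resolve to the earliest frame in the list.
            image, _ = min(frames, key=lambda f: f[1][r][c])
            composite[r][c] = image[r][c]
    return composite
```

With noisy depth maps, this per-pixel minimum would produce the artifacts the abstract mentions, which is why the authors add a probabilistic front-surface estimate on top of it.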
AIM-2002-001 Author[s]: Trevor Darrell, Neal Checka, Alice Oh and Louis-Philippe Morency Exploring Vision-Based Interfaces: How to Use Your Head in Dual Pointing Tasks January 2002 ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-001.ps ftp://publications.ai.mit.edu/ai-publications/2002/AIM-2002-001.pdf The utility of vision-based face tracking for dual pointing tasks is evaluated. We first describe a 3-D face tracking technique based on real-time parametric motion-stereo, which is non-invasive, robust, and self-initialized. The tracker provides a real-time estimate of a "frontal face ray" whose intersection with the display surface plane is used as a second stream of input for scrolling or pointing, in parallel with hand input. We evaluated the performance of combined head/hand input on a box selection and coloring task: users selected boxes with one pointer and colors with a second pointer, or performed both tasks with a single pointer. We found that performance with head and one hand was intermediate between single hand performance and dual hand performance. Our results are consistent with previously reported dual hand conflict in symmetric pointing tasks, and suggest that a head-based input stream should be used for asymmetric control.
AITR-2002-001 Author[s]: Lilla Zollei 2D-3D Rigid-Body Registration of X-Ray Fluoroscopy and CT Images August 2001 ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-001.ps ftp://publications.ai.mit.edu/ai-publications/2002/AITR-2002-001.pdf The registration of pre-operative volumetric datasets to intra-operative two-dimensional images provides an improved way of verifying patient position and medical instrument location. In applications from orthopedics to neurosurgery, it has a great value in maintaining up-to-date information about changes due to intervention. We propose a mutual information-based registration algorithm to establish the proper alignment. For optimization purposes, we compare the performance of the non-gradient Powell method and two slightly different versions of a stochastic gradient ascent strategy: one using a sparsely sampled histogramming approach and the other Parzen windowing to carry out probability density approximation. Our main contribution lies in adopting the stochastic approximation scheme successfully applied in 3D-3D registration problems to the 2D-3D scenario, which obviates the need for the generation of full DRRs at each iteration of pose optimization. This facilitates a considerable savings in computation expense. We also introduce a new probability density estimator for image intensities via sparse histogramming, derive gradient estimates for the density measures required by the maximization procedure and introduce the framework for a multiresolution strategy to the problem. Registration results are presented on fluoroscopy and CT datasets of a plastic pelvis and a real skull, and on a high-resolution CT-derived simulated dataset of a real skull, a plastic skull, a plastic pelvis and a plastic lumbar spine segment.
AIM-2001-036 CBCL-213 Author[s]: Antonio Torralba and Aude Oliva Global Depth Perception from Familiar Scene Structure December 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-036.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-036.pdf In the absence of cues for absolute depth measurements such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene but it will not inform about the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.
AIM-2001-035 CBCL-212 Author[s]: Andrew Yip and Pawan Sinha Role of color in face recognition December 13, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-035.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-035.pdf One of the key challenges in face perception lies in determining the contribution of different cues to face identification. In this study, we focus on the role of color cues. Although color appears to be a salient attribute of faces, past research has suggested that it confers little recognition advantage for identifying people. Here we report experimental results suggesting that color cues do play a role in face recognition and their contribution becomes evident when shape cues are degraded. Under such conditions, recognition performance with color images is significantly better than that with grayscale images. Our experimental results also indicate that the contribution of color may lie not so much in providing diagnostic cues to identity as in aiding low-level image-analysis processes such as segmentation.
AIM-2001-034 CBCL-211 Author[s]: Maximilian Riesenhuber Generalization over contrast and mirror reversal, but not figure-ground reversal, in an "edge-based December 10, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-034.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-034.pdf Baylis & Driver (Nature Neuroscience, 2001) have recently presented data on the response of neurons in macaque inferotemporal cortex (IT) to various stimulus transformations. They report that neurons can generalize over contrast and mirror reversal, but not over figure-ground reversal. This finding is taken to demonstrate that "the selectivity of IT neurons is not determined simply by the distinctive contours in a display, contrary to simple edge-based models of shape recognition", citing our recently presented model of object recognition in cortex (Riesenhuber & Poggio, Nature Neuroscience, 1999). In this memo, I show that the main effects of the experiment can be obtained by performing the appropriate simulations in our simple feedforward model. This suggests for IT cell tuning that the possible contributions of explicit edge assignment processes postulated in (Baylis & Driver, 2001) might be smaller than expected.
AIM-2001-033 Author[s]: Ron O. Dror, Edward H. Adelson, and Alan S. Willsky Recognition of Surface Reflectance Properties from a Single Image under Unknown Real-World Illumination October 21, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-033.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-033.pdf This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.
AIM-2001-032 Author[s]: Roland W. Fleming, Ron O. Dror, Edward H. Adelson How do Humans Determine Reflectance Properties under Unknown Illumination? October 21, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-032.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-032.pdf Under normal viewing conditions, humans find it easy to distinguish between objects made out of different materials such as plastic, metal, or paper. Untextured materials such as these have different surface reflectance properties, including lightness and gloss. With single isolated images and unknown illumination conditions, the task of estimating surface reflectance is highly underconstrained, because many combinations of reflection and illumination are consistent with a given image. In order to work out how humans estimate surface reflectance properties, we asked subjects to match the appearance of isolated spheres taken out of their original contexts. We found that subjects were able to perform the task accurately and reliably without contextual information to specify the illumination. The spheres were rendered under a variety of artificial illuminations, such as a single point light source, and a number of photographically-captured real-world illuminations from both indoor and outdoor scenes. Subjects performed more accurately for stimuli viewed under real-world patterns of illumination than under artificial illuminations, suggesting that subjects use stored assumptions about the regularities of real-world illuminations to solve the ill-posed problem.
AIM-2001-031 Author[s]: Konstantine Arkoudas Simplifying transformations for type-alpha certificates November 13, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-031.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-031.pdf This paper presents an algorithm for simplifying NDL deductions. An array of simplifying transformations is rigorously defined. They are shown to be terminating, and to respect the formal semantics of the language. We also show that the transformations never increase the size or complexity of a deduction---in the worst case, they produce deductions of the same size and complexity as the original. We present several examples of proofs containing various types of "detours", and explain how our procedure eliminates them, resulting in smaller and cleaner deductions. All of the given transformations are fully implemented in SML-NJ. The complete code listing is presented, along with explanatory comments. Finally, although the transformations given here are defined for NDL, we point out that they can be applied to any type-alpha DPL that satisfies a few simple conditions.
AIM-2001-030 Author[s]: Adrian Corduneanu and Tommi Jaakkola Stable Mixing of Complete and Incomplete Information November 8, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-030.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-030.pdf An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems.
AIM-2001-029 CBCL-209 Author[s]: Yuri Ostrovsky, Patrick Cavanagh and Pawan Sinha Perceiving Illumination Inconsistencies in Scenes November 5, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-029.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-029.pdf The human visual system is adept at detecting and encoding statistical regularities in its spatio-temporal environment. Here we report an unexpected failure of this ability in the context of perceiving inconsistencies in illumination distributions across a scene. Contrary to predictions from previous studies [Enns and Rensink, 1990; Sun and Perona, 1996a, 1996b, 1997], we find that the visual system displays a remarkable lack of sensitivity to illumination inconsistencies, both in experimental stimuli and in images of real scenes. Our results allow us to draw inferences regarding how the visual system encodes illumination distributions across scenes. Specifically, they suggest that the visual system does not verify the global consistency of locally derived estimates of illumination direction.
AIM-2001-028 CBCL-208 Author[s]: Antonio Torralba and Pawan Sinha Detecting Faces in Impoverished Images November 5, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-028.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-028.pdf The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context to face-detection performance.
AIM-2001-027 Author[s]: Konstantine Arkoudas Type-omega DPLs October 16, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-027.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-027.pdf Type-omega DPLs (Denotational Proof Languages) are languages for proof presentation and search that offer strong soundness guarantees. LCF-type systems such as HOL offer similar guarantees, but their soundness relies heavily on static type systems. By contrast, DPLs ensure soundness dynamically, through their evaluation semantics; no type system is necessary. This is possible owing to a novel two-tier syntax that separates deductions from computations, and to the abstraction of assumption bases, which is factored into the semantics of the language and allows for sound evaluation. Every type-omega DPL properly contains a type-alpha DPL, which can be used to present proofs in a lucid and detailed form, exclusively in terms of primitive inference rules. Derived inference rules are expressed as user-defined methods, which are "proof recipes" that take arguments and dynamically perform appropriate deductions. Methods arise naturally via parametric abstraction over type-alpha proofs. In that light, the evaluation of a method call can be viewed as a computation that carries out a type-alpha deduction. The type-alpha proof "unwound" by such a method call is called the "certificate" of the call. Certificates can be checked by exceptionally simple type-alpha interpreters, and thus they are useful whenever we wish to minimize our trusted base. Methods are statically closed over lexical environments, but dynamically scoped over assumption bases. They can take other methods as arguments, they can iterate, and they can branch conditionally. 
These capabilities, in tandem with the bifurcated syntax of type-omega DPLs and their dynamic assumption-base semantics, allow the user to define methods in a style that is disciplined enough to ensure soundness yet fluid enough to permit succinct and perspicuous expression of arbitrarily sophisticated derived inference rules. We demonstrate every major feature of type-omega DPLs by defining and studying NDL-omega, a higher-order, lexically scoped, call-by-value type-omega DPL for classical zero-order natural deduction---a simple choice that allows us to focus on type-omega syntax and semantics rather than on the subtleties of the underlying logic. We start by illustrating how type-alpha DPLs naturally lead to type-omega DPLs by way of abstraction; present the formal syntax and semantics of NDL-omega; prove several results about it, including soundness; give numerous examples of methods; point out connections to the lambda-phi calculus, a very general framework for type-omega DPLs; introduce a notion of computational and deductive cost; define several instrumented interpreters for computing such costs and for generating certificates; explore the use of type-omega DPLs as general programming languages; show that DPLs do not have to be type-less by formulating a static Hindley-Milner polymorphic type system for NDL-omega; discuss some idiosyncrasies of type-omega DPLs such as the potential divergence of proof checking; and compare type-omega DPLs to other approaches to proof presentation and discovery. Finally, a complete implementation of NDL-omega in SML-NJ is given for users who want to run the examples and experiment with the language.
AIM-2001-026 CBCL-210 Author[s]: Jason D. M. Rennie and Ryan Rifkin Improving Multiclass Text Classification with the Support Vector Machine October 16, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-026.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-026.pdf We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
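The one-vs-all scheme mentioned in the abstract above combines one binary classifier per class and predicts the class whose classifier gives the highest score. The sketch below shows only that combination step; the hand-set keyword weights stand in for trained binary classifiers (the memo uses SVMs and Naive Bayes), and the names `keyword_scorer` and `one_vs_all_predict` are hypothetical.

```python
from typing import Callable, Dict, List

Scorer = Callable[[List[str]], float]  # maps a tokenized document to a real score


def keyword_scorer(weights: Dict[str, float], bias: float = 0.0) -> Scorer:
    """Linear score over tokens; a stand-in for a trained class-vs-rest classifier."""
    def score(tokens: List[str]) -> float:
        # Unknown words contribute nothing to the score.
        return bias + sum(weights.get(t, 0.0) for t in tokens)

    return score


def one_vs_all_predict(scorers: Dict[str, Scorer], tokens: List[str]) -> str:
    """Return the label whose class-vs-rest scorer outputs the largest value."""
    return max(scorers, key=lambda label: scorers[label](tokens))
```

A usage example, with weights invented purely for illustration:

```python
scorers = {
    "sports": keyword_scorer({"goal": 1.0, "team": 0.5, "stock": -1.0}),
    "finance": keyword_scorer({"stock": 1.0, "market": 0.5, "goal": -1.0}),
}
label = one_vs_all_predict(scorers, "the team scored a goal".split())
```

Because each scorer is consulted exactly once and the decision is a plain argmax, the scheme has no redundancy with which to correct a mistaken binary classifier, which is the sense in which the abstract says it has no error-correcting properties.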
AIM-2001-025 Author[s]: Konstantine Arkoudas Type-alpha DPLs October 5, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-025.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-025.pdf This paper introduces Denotational Proof Languages (DPLs). DPLs are languages for presenting, discovering, and checking formal proofs. In particular, in this paper we discuss type-alpha DPLs---a simple class of DPLs for which termination is guaranteed and proof checking can be performed in time linear in the size of the proof. Type-alpha DPLs allow for lucid proof presentation and for efficient proof checking, but not for proof search. Type-omega DPLs allow for search as well as simple presentation and checking, but termination is no longer guaranteed and proof checking may diverge. We do not study type-omega DPLs here. We start by listing some common characteristics of DPLs. We then illustrate with a particularly simple example: a toy type-alpha DPL called PAR, for deducing parities. We present the abstract syntax of PAR, followed by two different kinds of formal semantics: evaluation and denotational. We then relate the two semantics and show how proof checking becomes tantamount to evaluation. We proceed to develop the proof theory of PAR, formulating and studying certain key notions such as observational equivalence that pervade all DPLs. We then present NDL, a type-alpha DPL for classical zero-order natural deduction. Our presentation of NDL mirrors that of PAR, showing how every basic concept that was introduced in PAR resurfaces in NDL. We present sample proofs of several well-known tautologies of propositional logic that demonstrate our thesis that DPL proofs are readable, writable, and concise. Next we contrast DPLs to typed logics based on the Curry-Howard isomorphism, and discuss the distinction between pure and augmented DPLs.
Finally we consider the issue of implementing DPLs, presenting an implementation of PAR in SML and one in Athena, and end with some concluding remarks.
AIM-2001-024 Author[s]: Leonid Taycher and Trevor Darrell Range Segmentation Using Visibility Constraints September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-024.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-024.pdf Visibility constraints can aid the segmentation of foreground objects observed with multiple range images. In our approach, points are defined as foreground if they can be determined to occlude some empty space in the scene. We present an efficient algorithm to estimate foreground points in each range view using explicit epipolar search. In cases where the background pattern is stationary, we show how visibility constraints from other views can generate virtual background values at points with no valid depth in the primary view. We demonstrate the performance of both algorithms for detecting people in indoor office environments.
AIM-2001-023 Author[s]: Ron O. Dror, Edward H. Adelson and Alan S. Willsky Surface Reflectance Estimation and Natural Illumination Statistics September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-023.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-023.pdf Humans recognize optical reflectance properties of surfaces such as metal, plastic, or paper from a single image without knowledge of illumination. We develop a machine vision system to perform similar recognition tasks automatically. Reflectance estimation under unknown, arbitrary illumination proves highly underconstrained due to the variety of potential illumination distributions and surface reflectance properties. We have found that the spatial structure of real-world illumination possesses some of the statistical regularities observed in the natural image statistics literature. A human or computer vision system may be able to exploit this prior information to determine the most likely surface reflectance given an observed image. We develop an algorithm for reflectance classification under unknown real-world illumination, which learns relationships between surface reflectance and certain features (statistics) computed from a single observed image. We also develop an automatic feature selection method.
AIM-2001-022 CBCL-207 Author[s]: Angela J. Yu, Martin A. Giese and Tomaso A. Poggio Biologically Plausible Neural Circuits for Realization of Maximum Operations September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-022.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-022.pdf Object recognition in the visual cortex is based on a hierarchical architecture, in which specialized brain regions along the ventral pathway extract object features of increasing levels of complexity, accompanied by greater invariance in stimulus size, position, and orientation. Recent theoretical studies postulate that a non-linear pooling function, such as the maximum (MAX) operation, could be fundamental in achieving such invariance. In this paper, we are concerned with neurally plausible mechanisms that may be involved in realizing the MAX operation. Four canonical circuits are proposed, each based on neural mechanisms that have been previously discussed in the context of cortical processing. Through simulations and mathematical analysis, we examine the relative performance and robustness of these mechanisms. We derive experimentally verifiable predictions for each circuit and discuss their respective physiological considerations.
AIM-2001-021 Author[s]: Erik G. Miller, Kinh Tieu and Chris P. Stauffer Learning Object-Independent Modes of Variation with Feature Flow Fields September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-021.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-021.pdf We present a unifying framework in which "object-independent" modes of variation are learned from continuous-time data such as video sequences. These modes of variation can be used as "generators" to produce a manifold of images of a new object from a single example of that object. We develop the framework in the context of a well-known example: analyzing the modes of spatial deformations of a scene under camera movement. Our method learns a close approximation to the standard affine deformations that are expected from the geometry of the situation, and does so in a completely unsupervised (i.e. ignorant of the geometry of the situation) fashion. We stress that it is learning a "parameterization", not just the parameter values, of the data. We then demonstrate how we have used the same framework to derive a novel data-driven model of joint color change in images due to common lighting variations. The model is superior to previous models of color change in describing non-linear color changes due to lighting.
AIM-2001-020 CBCL-205 Author[s]: Antonio Torralba and Pawan Sinha Contextual Priming for Object Detection September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-020.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-020.pdf There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
AIM-2001-019 Author[s]: Lily Lee Gait Dynamics for Recognition and Classification September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-019.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-019.pdf This paper describes a representation of the dynamics of human walking action for the purpose of person identification and classification by gait appearance. Our gait representation is based on simple features such as moments extracted from video silhouettes of human walking motion. We claim that our gait dynamics representation is rich enough for the task of recognition and classification. The use of our feature representation is demonstrated in the task of person recognition from video sequences of orthogonal views of people walking. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times, and under varying lighting environments. In addition, preliminary results are shown on gender classification using our gait dynamics features.
AIM-2001-018 CBCL-206 Author[s]: Gene Yeo, Tomaso Poggio Multiclass Classification of SRBCTs August 25, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-018.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-018.pdf A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper [Khan2001]. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories, neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that using an approach similar to the one reported by Yeang et al. [Yeang2001], i.e. multiclass classification by combining outputs of binary classifiers, we achieved equal accuracy with far fewer features. We report the performances of 3 binary classifiers (k-nearest neighbors (kNN), weighted-voting (WV), and support vector machines (SVM)) with 3 feature selection techniques (Golub's Signal to Noise (SN) ratios [Golub99], Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS) [Sayan98]).
AIM-2001-017 CBCL-203 Author[s]: Pawan Sinha and Antonio Torralba Role of Low-level Mechanisms in Brightness Perception August 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-017.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-017.pdf Brightness judgments are a key part of the primate brain’s visual analysis of the environment. There is general consensus that the perceived brightness of an image region is based not only on its actual luminance, but also on the photometric structure of its neighborhood. However, it is unclear precisely how a region’s context influences its perceived brightness. Recent research has suggested that brightness estimation may be based on a sophisticated analysis of scene layout in terms of transparency, illumination and shadows. This work has called into question the role of low-level mechanisms, such as lateral inhibition, as explanations for brightness phenomena. Here we describe experiments with displays for which low-level and high-level analyses make qualitatively different predictions, and with which we can quantitatively assess the trade-offs between low-level and high-level factors. We find that brightness percepts in these displays are governed by low-level stimulus properties, even when these percepts are inconsistent with higher-level interpretations of scene layout. These results point to the important role of low-level mechanisms in determining brightness percepts.
AIM-2001-016 Author[s]: Jacob Beal An Algorithm for Bootstrapping Communications August 13, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-016.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-016.pdf I present an algorithm which allows two agents to generate a simple language based only on observations of a shared environment. Vocabulary and roles for the language are learned in linear time. Communication is robust and degrades gradually as complexity increases. Dissimilar modes of experience will lead to a shared kernel vocabulary.
AIM-2001-015 CBCL-202 Author[s]: Antonio Torralba, Pawan Sinha Recognizing Indoor Scenes July 25, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-015.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-015.pdf We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation dispenses with the need to rely on specific objects or local landmarks and also renders it robust against variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views and also evaluate changes in system performances as a function of the amount of training data available and the complexity of the representation.
AIM-2001-014 CBCL-201 Author[s]: Richard Russell and Pawan Sinha Perceptually-based Comparison of Image Similarity Metrics July 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-014.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-014.pdf The image comparison operation – assessing how well one image matches another – forms a critical component of many image analysis systems and models of human visual processing. Two norms used commonly for this purpose are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining whether one metric better captures the perceptual notion of image similarity than the other. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created via vector quantization. In both conditions the subjects showed a consistent preference for images matched using the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.
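The L1 and L2 norms named in the AIM-2001-014 abstract are the order-1 and order-2 cases of the Minkowski metric. The following is a minimal sketch of how the two can retrieve different matches for the same query; the function names and the toy four-pixel "images" are illustrative assumptions, not taken from the memo.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equally shaped arrays."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

def nearest_image(query, candidates, p):
    """Index of the candidate closest to the query under the order-p metric."""
    distances = [minkowski_distance(query, c, p) for c in candidates]
    return int(np.argmin(distances))

# Hypothetical data: one candidate differs strongly in a single pixel,
# the other differs mildly in every pixel.
query = [0.0, 0.0, 0.0, 0.0]
concentrated = [2.0, 0.0, 0.0, 0.0]   # L1 = 2.0, L2 = 2.0
spread = [0.75, 0.75, 0.75, 0.75]     # L1 = 3.0, L2 = 1.5

print(nearest_image(query, [concentrated, spread], p=1))  # L1 prefers index 0
print(nearest_image(query, [concentrated, spread], p=2))  # L2 prefers index 1
```

L1 tolerates one large local difference more readily than L2, which penalizes large per-pixel deviations quadratically; this is one way the two norms can rank the same candidates differently.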
AIM-2001-013 CBCL-200 Author[s]: Nicholas T. Chan, Ely Dahan, Andrew W. Lo and Tomaso Poggio Experimental Markets for Product Concepts July 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-013.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-013.pdf Market prices are well known to efficiently collect and aggregate diverse information regarding the value of commodities and assets. The role of markets has been particularly suitable to pricing financial securities. This article provides an alternative application of the pricing mechanism to marketing research - using pseudo-securities markets to measure preferences over new product concepts. Surveys, focus groups, concept tests and conjoint studies are methods traditionally used to measure individual and aggregate preferences. Unfortunately, these methods can be biased, costly and time-consuming to conduct. The present research is motivated by the desire to efficiently measure preferences and more accurately predict new product success, based on the efficiency and incentive-compatibility of security trading markets. The article describes a novel market research method, provides insight into why the method should work, and compares the results of several trading experiments against other methodologies such as concept testing and conjoint analysis.
AIM-2001-012 CBCL-199 Author[s]: Mariano Alvira, Jim Paris and Ryan Rifkin The Audiomomma Music Recommendation System July 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-012.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-012.pdf We design and implement a system that recommends musicians to listeners. The basic idea is to keep track of what artists a user listens to, to find other users with similar tastes, and to recommend other artists that these similar listeners enjoy. The system utilizes a client-server architecture, a web-based interface, and an SQL database to store and process information. We describe Audiomomma-0.3, a proof-of-concept implementation of the above ideas.
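The basic idea in the AIM-2001-012 abstract (track what each user listens to, find users with similar tastes, recommend what they enjoy) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how similarity is measured, so Jaccard overlap is used here as a stand-in, and the function names and listening data are hypothetical rather than Audiomomma's.

```python
def jaccard(a, b):
    """Overlap between two sets of artists (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, listens, top_k=3):
    """Rank artists the user has not heard by the similarity of their listeners.

    listens maps each user name to the set of artists that user listens to.
    """
    own = listens[user]
    scores = {}
    for other, artists in listens.items():
        if other == user:
            continue
        similarity = jaccard(own, artists)
        if similarity == 0.0:
            continue  # users with no shared artists contribute nothing
        for artist in artists - own:
            scores[artist] = scores.get(artist, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_k]

# Hypothetical listening histories.
listens = {
    "ann": {"A", "B"},
    "bob": {"A", "B", "C"},
    "cat": {"A", "D"},
    "dan": {"E"},
}
print(recommend("ann", listens))  # ['C', 'D']
```

Here "bob" shares more artists with "ann" than "cat" does, so bob's extra artist ranks first, and "dan", who shares nothing, contributes no recommendation.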
AIM-2001-011 CBCL-198 Author[s]: T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, and A. Verri b July 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-011.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-011.pdf In this note we characterize the role of b, which is the constant in the standard form of the solution provided by the Support Vector Machine technique $f(x) = \sum_{i=1}^{\ell} \alpha_i K(x, x_i) + b$.
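For readers unfamiliar with the standard form referred to in AIM-2001-011, a kernel expansion over training points plus the constant b, here is a minimal sketch of evaluating it. The linear kernel, the function names, and the coefficient values are assumptions chosen for illustration; the coefficients are taken to be signed (class labels already absorbed).

```python
import numpy as np

def linear_kernel(x, z):
    """One possible kernel K(x, z); any positive definite kernel could be used."""
    return float(np.dot(x, z))

def svm_decision(x, points, alphas, b, kernel=linear_kernel):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return sum(a * kernel(x, xi) for a, xi in zip(alphas, points)) + b

# Hypothetical expansion points and coefficients.
points = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alphas = [1.0, -1.0]
x = np.array([2.0, 1.0])

print(svm_decision(x, points, alphas, b=0.5))  # 1.5
print(svm_decision(x, points, alphas, b=0.0))  # 1.0
```

Changing b shifts every output by the same amount without altering the kernel expansion, which is why it acts as an offset (threshold) on the decision value.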
AIM-2001-010 CBCL-197 Author[s]: Purdy Ho Rotation Invariant Real-time Face Detection and Recognition System May 31, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-010.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-010.pdf In this report, a face recognition system that is capable of detecting and recognizing frontal and rotated faces was developed. Two face recognition methods focusing on the aspect of pose invariance are presented and evaluated - the whole face approach and the component-based approach. The main challenge of this project is to develop a system that is able to identify faces under different viewing angles in real time. The development of such a system will enhance the capability and robustness of current face recognition technology. The whole-face approach recognizes faces by classifying a single feature vector consisting of the gray values of the whole face image. The component-based approach first locates the facial components and extracts them. These components are normalized and combined into a single feature vector for classification. The Support Vector Machine (SVM) is used as the classifier for both approaches. Extensive tests with respect to the robustness against pose changes are performed on a database that includes faces rotated up to about 40 degrees in depth. The component-based approach clearly outperforms the whole-face approach on all tests. Although this approach is proven to be more reliable, it is still too slow for real-time applications. That is the reason why a real-time face recognition system using the whole-face approach is implemented to recognize people in color video sequences.
AIM-2001-009 Author[s]: D. Demirdjian and T. Darrell Motion Estimation from Disparity Images May 7, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-009.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-009.pdf A new method for 3D rigid motion estimation from stereo is proposed in this paper. The appealing feature of this method is that it directly uses the disparity images obtained from stereo matching. We assume that the stereo rig has parallel cameras and show, in that case, the geometric and topological properties of the disparity images. Then we introduce a rigid transformation (called d-motion) that maps two disparity images of a rigidly moving object. We show how it is related to the Euclidean rigid motion and a motion estimation algorithm is derived. We show with experiments that our approach is simple and more accurate than standard approaches.
AITR-2001-009 Author[s]: Tevfik Metin Sezgin Feature Point Detection and Curve Approximation for Early Processing of Freehand Sketches May 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-009.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-009.pdf Freehand sketching is both a natural and crucial part of design, yet is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects using feature point detection and approximation. We demonstrate how multiple sources of information can be combined for feature detection in strokes and apply this technique using two approaches to signal processing, one using simple average based thresholding and a second using scale space.
AIM-2001-008 Author[s]: A. Rahimi, L.-P. Morency and T. Darrell Reducing Drift in Parametric Motion Tracking May 7, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-008.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-008.pdf We develop a class of differential motion trackers that automatically stabilize when in finite domains. Most differential trackers compute motion only relative to one previous frame, accumulating errors indefinitely. We estimate pose changes between a set of past frames, and develop a probabilistic framework for integrating those estimates. We use an approximation to the posterior distribution of pose changes as an uncertainty model for parametric motion in order to help arbitrate the use of multiple base frames. We demonstrate this framework on a simple 2D translational tracker and a 3D, 6-degree of freedom tracker.
AITR-2001-008 Author[s]: Radhika Nagpal Programmable Self-Assembly: Constructing Global Shape using Biologically-inspire June 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-008.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-008.pdf In this thesis I present a language for instructing a sheet of identically-programmed, flexible, autonomous agents ("cells") to assemble themselves into a predetermined global shape, using local interactions. The global shape is described as a folding construction on a continuous sheet, using a set of axioms from paper-folding (origami). I provide a means of automatically deriving the cell program, executed by all cells, from the global shape description. With this language, a wide variety of global shapes and patterns can be synthesized, using only local interactions between identically-programmed cells. Examples include flat layered shapes, all plane Euclidean constructions, and a variety of tessellation patterns. In contrast to approaches based on cellular automata or evolution, the cell program is directly derived from the global shape description and is composed from a small number of biologically-inspired primitives: gradients, neighborhood query, polarity inversion, cell-to-cell contact and flexible folding. The cell programs are robust, without relying on regular cell placement, global coordinates, or synchronous operation and can tolerate a small amount of random cell death. I show that an average cell neighborhood of 15 is sufficient to reliably self-assemble complex shapes and geometric patterns on randomly distributed cells. The language provides many insights into the relationship between local and global descriptions of behavior, such as the advantage of constructive languages, mechanisms for achieving global robustness, and mechanisms for achieving scale-independent shapes from a single cell program.
The language suggests a mechanism by which many related shapes can be created by the same cell program, in the manner of D'Arcy Thompson's famous coordinate transformations. The thesis illuminates how complex morphology and pattern can emerge from local interactions, and how one can engineer robust self-assembly.
AITR-2001-007 Author[s]: Won Hong Modeling, Estimation, and Control of Robot-Soil Interactions September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-007.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-007.pdf This thesis presents the development of hardware, theory, and experimental methods to enable a robotic manipulator arm to interact with soils and estimate soil properties from interaction forces. Unlike the majority of robotic systems interacting with soil, our objective is parameter estimation, not excavation. To this end, we design our manipulator with a flat plate for easy modeling of interactions. By using a flat plate, we take advantage of the wealth of research on the similar problem of earth pressure on retaining walls. There are a number of existing earth pressure models. These models typically provide estimates of force which are in uncertain relation to the true force. A recent technique, known as numerical limit analysis, provides upper and lower bounds on the true force. Predictions from the numerical limit analysis technique are shown to be in good agreement with other accepted models. Experimental methods for plate insertion, soil-tool interface friction estimation, and control of applied forces on the soil are presented. In addition, a novel graphical technique for inverting the soil models is developed, which is an improvement over standard nonlinear optimization. This graphical technique utilizes the uncertainties associated with each set of force measurements to obtain all possible parameters which could have produced the measured forces. The system is tested on three cohesionless soils, two in a loose state and one in a loose and dense state. The results are compared with friction angles obtained from direct shear tests. The results highlight a number of key points. Common assumptions are made in soil modeling, most notably the Mohr-Coulomb failure law and perfectly plastic behavior.
In the direct shear tests, a marked dependence of friction angle on the normal stress at low stresses is found. This has ramifications for any study of friction done at low stresses. In addition, gradual failures are often observed for vertical tools and tools inclined away from the direction of motion. After accounting for the change in friction angle at low stresses, the results show good agreement with the direct shear values.
AIM-2001-007 Author[s]: Konstantine Arkoudas Certified Computation April 30, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-007.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-007.pdf This paper introduces the notion of certified computation. A certified computation does not only produce a result r, but also a correctness certificate, which is a formal proof that r is correct. This can greatly enhance the credibility of the result: if we trust the axioms and inference rules that are used in the certificate, then we can be assured that r is correct. In effect, we obtain a trust reduction: we no longer have to trust the entire computation; we only have to trust the certificate. Typically, the reasoning used in the certificate is much simpler and easier to trust than the entire computation. Certified computation has two main applications: as a software engineering discipline, it can be used to increase the reliability of our code; and as a framework for cooperative computation, it can be used whenever a code consumer executes an algorithm obtained from an untrusted agent and needs to be convinced that the generated results are correct. We propose DPLs (Denotational Proof Languages) as a uniform platform for certified computation. DPLs enforce a sharp separation between logic and control and offer versatile mechanisms for constructing certificates. We use Athena as a concrete DPL to illustrate our ideas, and we present two examples of certified computation, giving full working code in both cases.
AITR-2001-006 Author[s]: Aaron Mark Ucko Predicate Dispatching in the Common Lisp Object System May 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-006.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-006.pdf I have added support for predicate dispatching, a powerful generalization of other dispatching mechanisms, to the Common Lisp Object System (CLOS). To demonstrate its utility, I used predicate dispatching to enhance Weyl, a computer algebra system which doubles as a CLOS library. My result is Dispatching-Enhanced Weyl (DEW), a computer algebra system that I have demonstrated to be well suited for both users and programmers.
AIM-2001-006 CBCL-196 Author[s]: Javid Sadr and Pawan Sinha Exploring Object Perception with Random Image Structure Evolution March 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-006.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-006.pdf We have developed a technique called RISE (Random Image Structure Evolution), by which one may systematically sample continuous paths in a high-dimensional image space. A basic RISE sequence depicts the evolution of an object's image from a random field, along with the reverse sequence which depicts the transformation of this image back into randomness. The processing steps are designed to ensure that important low-level image attributes such as the frequency spectrum and luminance are held constant throughout a RISE sequence. Experiments based on the RISE paradigm can be used to address some key open issues in object perception. These include determining the neural substrates underlying object perception, the role of prior knowledge and expectation in object perception, and the developmental changes in object perception skills from infancy to adulthood.
AITR-2001-005 Author[s]: Jessica Banks Design and Control of an Anthropomorphic Robotic Finger with Multi-point Tactile Sensation May 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-005.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-005.pdf The goal of this research is to develop the prototype of a tactile sensing platform for anthropomorphic manipulation research. We investigate this problem through the fabrication and simple control of a planar 2-DOF robotic finger inspired by anatomic consistency, self-containment, and adaptability. The robot is equipped with a tactile sensor array based on optical transducer technology whereby localized changes in light intensity within an illuminated foam substrate correspond to the distribution and magnitude of forces applied to the sensor surface plane. The integration of tactile perception is a key component in realizing robotic systems which organically interact with the world. Such natural behavior is characterized by compliant performance that can initiate internal, and respond to external, force application in a dynamic environment. However, most of the current manipulators that support some form of haptic feedback either solely derive proprioceptive sensation or only limit tactile sensors to the mechanical fingertips. These constraints are due to the technological challenges involved in high resolution, multi-point tactile perception. In this work, however, we take the opposite approach, emphasizing the role of full-finger tactile feedback in the refinement of manual capabilities. To this end, we propose and implement a control framework for sensorimotor coordination analogous to infant-level grasping and fixturing reflexes. This thesis details the mechanisms used to achieve these sensory, actuation, and control objectives, along with the design philosophies and biological influences behind them. 
The results of behavioral experiments with a simple tactilely-modulated control scheme are also described. The hope is to integrate the modular finger into an engineered analog of the human hand with a complete haptic system.
AIM-2001-005 CBCL-195 Author[s]: Nicholas Tung Chan and Christian Shelton An Electronic Market-Maker April 17, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-005.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-005.pdf This paper presents an adaptive learning model for market-making under the reinforcement learning framework. Reinforcement learning is a learning technique in which agents aim to maximize the long-term accumulated rewards. No knowledge of the market environment, such as the order arrival or price process, is assumed. Instead, the agent learns from real-time market experience and develops explicit market-making strategies, achieving multiple objectives including the maximization of profits and the minimization of the bid-ask spread. The simulation results show initial success in bringing learning techniques to building market-making algorithms.
AITR-2001-004 Author[s]: Jason D. M. Rennie Improving Multi-class Text Classification with Naive Bayes September 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-004.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-004.pdf There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
AIM-2001-004 CBCL-193 Author[s]: Mariano Alvira and Ryan Rifkin An Empirical Comparison of SNoW and SVMs for Face Detection January 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-004.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-004.pdf Impressive claims have been made for the performance of the SNoW algorithm on face detection tasks by Yang et al. [7]. In particular, by looking at both their results and those of Heisele et al. [3], one could infer that the SNoW system performed substantially better than an SVM-based system, even when the SVM used a polynomial kernel and the SNoW system used a particularly simplistic 'primitive' linear representation. We evaluated the two approaches in a controlled experiment, looking directly at performance on a simple, fixed-sized test set, isolating out 'infrastructure' issues related to detecting faces at various scales in large images. We found that SNoW performed about as well as linear SVMs, and substantially worse than polynomial SVMs.
AITR-2001-003 CBCL-204 Author[s]: Christian Robert Shelton Importance Sampling for Reinforcement Learning with Multiple Objectives August 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-003.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-003.pdf This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and natural solutions based on our sampling method. The thesis concludes with example results from employing our algorithm to the domain of automated electronic market-making.
AIM-2001-003 Author[s]: Nicolas Meuleau, Leonid Peshkin and Kee-Eung Kim Exploration in Gradient-Based Reinforcement Learning April 3, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-003.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-003.pdf Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.
AITR-2001-002 Author[s]: Pedro F. Felzenszwalb Object Recognition with Pictorial Structures May 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-002.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-002.pdf This thesis presents a statistical framework for object recognition. The framework is motivated by the pictorial structure models introduced by Fischler and Elschlager nearly 30 years ago. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. The problem of detecting an object in an image and the problem of learning an object model using training examples are naturally formulated under a statistical approach. We present efficient algorithms to solve these problems in our framework. We demonstrate our techniques by training models to represent faces and human bodies. The models are then used to locate the corresponding objects in novel images.
AIM-2001-002 CBCL-194 Author[s]: Christian R. Shelton Policy Improvement for POMDPs Using Normalized Importance Sampling March 20, 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-002.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-002.pdf We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm showing an order of magnitude reduction in the number of trials required.
AITR-2001-001 Author[s]: Kimberle Koile The Architect's Collaborator: Toward Intelligent Tools for Conceptual Design January 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-001.ps ftp://publications.ai.mit.edu/ai-publications/2001/AITR-2001-001.pdf In early stages of architectural design, as in other design domains, the language used is often very abstract. In architectural design, for example, architects and their clients use experiential terms such as "private" or "open" to describe spaces. If we are to build programs that can help designers during this early-stage design, we must give those programs the capability to deal with concepts on the level of such abstractions. The work reported in this thesis sought to do that, focusing on two key questions: How are abstract terms such as "private" and "open" translated into physical form? How might one build a tool to assist designers with this process? The Architect's Collaborator (TAC) was built to explore these issues. It is a design assistant that supports iterative design refinement, and that represents and reasons about how experiential qualities are manifested in physical form. Given a starting design and a set of design goals, TAC explores the space of possible designs in search of solutions that satisfy the goals. It employs a strategy we've called dependency-directed redesign: it evaluates a design with respect to a set of goals, then uses an explanation of the evaluation to guide proposal and refinement of repair suggestions; it then carries out the repair suggestions to create new designs. A series of experiments was run to study TAC's behavior. Issues of control structure, goal set size, goal order, and modification operator capabilities were explored. In addition, TAC's use as a design assistant was studied in an experiment using a house in the process of being redesigned. TAC's use as an analysis tool was studied in an experiment using Frank Lloyd Wright's Prairie houses.
AIM-2001-001 Author[s]: T. Darrell, D. Demirdjian, N. Checka and P. Felzenszwalb Plan-view Trajectory Estimation with Dense Stereo Background Models February 2001 ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-001.ps ftp://publications.ai.mit.edu/ai-publications/2001/AIM-2001-001.pdf In a known environment, objects may be tracked in multiple views using a set of background models. Stereo-based models can be illumination-invariant, but often have undefined values which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a 'late-segmentation' strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.
AIM-1697 CBCL-192 Author[s]: Thomas Serre, Bernd Heisele, Sayan Mukherjee and Tomaso Poggio Feature Selection for Face Detection September 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1697.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1697.pdf We present a new method to select features for a face detection system using Support Vector Machines (SVMs). In the first step we reduce the dimensionality of the input space by projecting the data into a subset of eigenvectors. The dimension of the subset is determined by a classification criterion based on minimizing a bound on the expected error probability of an SVM. In the second step we select features from the SVM feature space by removing those that have low contributions to the decision function of the SVM.
AIM-1695 CBCL-190 Author[s]: Maximilian Riesenhuber and Tomaso Poggio Computational Models of Object Recognition in Cortex: A Review August 7, 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1695.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1695.pdf Understanding how biological visual systems perform object recognition is one of the ultimate goals in computational neuroscience. Among the biological models of recognition the main distinctions are between feedforward and feedback and between object-centered and view-centered. From a computational viewpoint the different recognition tasks - for instance categorization and identification - are very similar, representing different trade-offs between specificity and invariance. Thus the different tasks do not strictly require different classes of models. The focus of the review is on feedforward, view-based models that are supported by psychophysical and physiological data.
AIM-1688 CBCL-188 Author[s]: Chikahito Nakajima, Massimiliano Pontil, Bernd Heisele and Tomaso Poggio People Recognition in Image Sequences by Supervised Learning June 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1688.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1688.pdf We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of multiclass strategies based on SVMs are explored and compared to k-Nearest Neighbors classifiers (kNNs). The system works in real time and shows high performance rates for people recognition throughout one day.
AIM-1687 CBCL-187 Author[s]: Bernd Heisele, Tomaso Poggio and Massimiliano Pontil Face Detection in Still Gray Images May 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1687.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1687.pdf We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern by a single SVM classifier. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifiers. On the first level, component classifiers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifier checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face.
AIM-1682 CBCL-185 Author[s]: Maximilian Riesenhuber and Tomaso Poggio The Individual is Nothing, the Class Everything: Psychophysics and Modeling of Recognition in Object Classes May 1, 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1682.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1682.pdf Most psychophysical studies of object recognition have focussed on the recognition and representation of individual objects that subjects had previously been explicitly trained on. Correspondingly, modeling studies have often employed a 'grandmother'-type representation where the objects to be recognized were represented by individual units. However, objects in the natural world are commonly members of a class containing a number of visually similar objects, such as faces, for which physiology studies have provided support for a representation based on a sparse population code, which permits generalization from the learned exemplars to novel objects of that class. In this paper, we present results from psychophysical and modeling studies intended to investigate object recognition in natural ('continuous') object classes. In two experiments, subjects were trained to perform subordinate level discrimination in a continuous object class - images of computer-rendered cars - created using a 3D morphing system. By comparing the recognition performance of trained and untrained subjects we could estimate the effects of viewpoint-specific training and infer properties of the object class-specific representation learned as a result of training. We then compared the experimental findings to simulations, building on our recently presented HMAX model of object recognition in cortex, to investigate the computational properties of a population-based object class representation as outlined above.
We find experimental evidence, supported by modeling results, that training builds a viewpoint- and class-specific representation that supplements a pre-existing representation with lower shape discriminability but possibly greater viewpoint invariance.
AIM-1696 CBCL-191 Author[s]: Vinay Kumar and Tomaso Poggio Learning-Based Approach to Estimation of Morphable Model Parameters September 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1696.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1696.pdf We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
AITR-1685 CBCL-186 Author[s]: Constantine P. Papageorgiou A Trainable System for Object Detection in Images and Video Sequences May 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1685.ps ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1685.pdf This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways -- 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. 
We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance.
AIM-1681 CBCL-184 Author[s]: Theodoros Evgeniou and Massimiliano Pontil A Note on the Generalization Performance of Kernel Classifiers with Margin May 1, 2000 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1681.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1681.pdf We present distribution independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers belong to this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.
AIM-1679 CBCL-183 Author[s]: Maximilian Riesenhuber and Tomaso Poggio A Note on Object Class Representation and Categorical Perception December 17, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1679.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1679.pdf We present a novel scheme ("Categorical Basis Functions", CBF) for object class representation in the brain and contrast it to the "Chorus of Prototypes" scheme recently proposed by Edelman. The power and flexibility of CBF is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in particular the finding by Bulthoff et al. (1998) of categorization of faces by gender without corresponding Categorical Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are suggested to further test CBF.
AITR-1675 Author[s]: J. Kenneth Salisbury, Jr. and Mandayam A. Srinivasan (editors) Proceedings of the Fourth PHANTOM Users Group Workshop November 4, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1675.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1675.pdf This report contains the proceedings of the Fourth Phantom Users Group Workshop: 17 papers presented October 9-12, 1999 at MIT Endicott House in Dedham, Massachusetts. The workshop included sessions on Tools for Programmers, Dynamic Environments, Perception and Cognition, Haptic Connections, Collision Detection / Collision Response, Medical and Seismic Applications, and Haptics Going Mainstream. The proceedings include papers that cover a variety of subjects in computer haptics including rendering, contact determination, development libraries, and applications in medicine, path planning, data interaction and training.
AITR-1674 Author[s]: J.P. Mellor Automatically Recovering Geometry and Texture from Large Sets of Calibrated Images October 22, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1674.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1674.pdf Three-dimensional models which contain both geometry and texture have numerous applications such as urban planning, physical simulation, and virtual environments. A major focus of computer vision (and recently graphics) research is the automatic recovery of three-dimensional models from two-dimensional images. After many years of research this goal is yet to be achieved. Most practical modeling systems require substantial human input and unlike automatic systems are not scalable. This thesis presents a novel method for automatically recovering dense surface patches using large sets (1000's) of calibrated images taken from arbitrary positions within the scene. Physical instruments, such as Global Positioning System (GPS), inertial sensors, and inclinometers, are used to estimate the position and orientation of each image. Essentially, the problem is to find corresponding points in each of the images. Once a correspondence has been established, calculating its three-dimensional position is simply a matter of geometry. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplify the correspondence problem. The initial stage of the algorithm is completely local and scales linearly with the number of images. Subsequent stages are global in nature, exploit geometric constraints, and scale quadratically with the complexity of the underlying scene. We describe techniques for: 1) detecting and localizing surface patches; 2) refining camera calibration estimates and rejecting false positive surfels; and 3) grouping surface patches into surfaces and growing the surface along a two-dimensional manifold.
We also discuss a method for producing high quality, textured three-dimensional models from these surfaces. Some of the most important characteristics of this approach are that it: 1) uses and refines noisy calibration estimates; 2) compensates for large variations in illumination; 3) tolerates significant soft occlusion (e.g. tree branches); and 4) associates, at a fundamental level, an estimated normal (i.e. no frontal-planar assumption) and texture with each surface patch.
AIM-1673 CBCL-180 Author[s]: Constantine P. Papageorgiou and Tomaso Poggio A Trainable Object Detection System: Car Detection in Static Images October 13, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1673.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1673.pdf This paper describes a general, trainable architecture for object detection that has previously been applied to face and people detection with a new application to car detection in static images. Our technique is a learning based approach that uses a set of labeled training data from which an implicit model of an object class -- here, cars -- is learned. Instead of pixel representations that may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of-sample images and show an ROC curve that highlights the performance of our system.
AIM-1672 CBCL-179 Author[s]: Vinay P. Kumar and Tomaso Poggio Learning-Based Approach to Real Time Tracking and Analysis of Faces September 23, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1672.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1672.pdf This paper describes a trainable system capable of tracking faces and facial features like eyes and nostrils and estimating basic mouth features such as degrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stage of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.
AIM-1670 Author[s]: Mark M. Millonas and Erik M. Rauch Trans-membrane Signal Transduction and Biochemical Turing Pattern Formation September 28, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1670.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1670.pdf The Turing mechanism for the production of a broken spatial symmetry in an initially homogeneous system of reacting and diffusing substances has attracted much interest as a potential model for certain aspects of morphogenesis such as pre-patterning in the embryo, and has also served as a model for self-organization in more generic systems. The two features necessary for the formation of Turing patterns are short-range autocatalysis and long-range inhibition which usually only occur when the diffusion rate of the inhibitor is significantly greater than that of the activator. This observation has sometimes been used to cast doubt on the applicability of the Turing mechanism to cellular patterning since many messenger molecules that diffuse between cells do so at more-or-less similar rates. Here we show that stationary, symmetry-breaking Turing patterns can form in physiologically realistic systems even when the extracellular diffusion coefficients are equal; the kinetic properties of the 'receiver' and 'transmitter' proteins responsible for signal transduction will be primary factors governing this process.
AIM-1669 Author[s]: Kinh Tieu and Paul Viola Boosting Image Database Retrieval September 10, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1669.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1669.pdf We present an approach for image database retrieval using a very large number of highly-selective features and simple on-line learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes. We propose a mechanism for generating a large number of complex features which capture some aspects of this causal structure. Boosting is used to learn simple and efficient classifiers in this complex feature space. Finally we will describe a practical implementation of our retrieval system on a database of 3000 images.
AITR-1668 Author[s]: Tommi Jaakkola, Marina Meila and Tony Jebara Maximum Entropy Discrimination December 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1668.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1668.pdf We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.
AIM-1666 Author[s]: Radhika Nagpal Organizing a Global Coordinate System from Local Information on an Amorphous Computer August 29, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1666.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1666.pdf This paper demonstrates that it is possible to generate a reasonably accurate coordinate system on randomly distributed processors, using only local information and local communication. By coordinate system we mean that each element assigns itself a logical coordinate that maps to its global physical location, starting with no a priori knowledge of position or orientation. The algorithm presented is inspired by biological systems that use chemical gradients to determine the position of cells. Extensive analysis and simulation results are presented. Two key results are: there is a critical minimum average neighborhood size of 15 for good accuracy and there is a fundamental limit on the resolution of any coordinate system determined strictly from local communication. We also demonstrate that using this algorithm, random distributions of processors produce significantly better accuracy than regular processor grids - such as those used by cellular automata. This has implications for discrete models of biology as well as for building smart sensor arrays.
AIM-1665 Author[s]: Harold Abelson, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman and Ron Weiss Amorphous Computing August 29, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1665.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1665.pdf Amorphous computing is the development of organizational principles and programming languages for obtaining coherent behaviors from the cooperation of myriads of unreliable parts that are interconnected in unknown, irregular, and time-varying ways. The impetus for amorphous computing comes from developments in microfabrication and fundamental biology, each of which is the basis of a kernel technology that makes it possible to build or grow huge numbers of almost-identical information-processing units at almost no cost. This paper sets out a research agenda for realizing the potential of amorphous computing and surveys some initial progress, both in programming and in fabrication. We describe some approaches to programming amorphous systems, which are inspired by metaphors from biology and physics. We also present the basic ideas of cellular computing, an approach to constructing digital-logic circuits within living cells by representing logic levels by concentrations of DNA-binding proteins.
AIM-1664 CBCL-178 Author[s]: Anuj Mohan Object Detection in Images by Components August 11, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1664.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1664.pdf In this paper we present a component-based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component-based approach is twofold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when not all components of a person are found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the component-based approach and the ACC data classification structure.
AIM-1663 Author[s]: Rajesh Kasturirangan Multiple Scales in Small-World Networks August 11, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1663.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1663.pdf Small-world architectures may be implicated in a range of phenomena from networks of neurons in the cerebral cortex to social networks and propagation of viruses. Small-world networks are interpolations of regular and random networks that retain the advantages of both regular and random networks by being highly clustered like regular networks and having small average path length between nodes, like random networks. While most of the recent attention on small-world networks has focused on the effect of introducing disorder/randomness into a regular network, we show that the fundamental mechanism behind the small-world phenomenon is not disorder/randomness, but the presence of connections of many different length scales. Consequently, in order to explain the small-world phenomenon, we introduce the concept of multiple scale networks and then state the multiple length scale hypothesis. We show that small-world behavior in randomly rewired networks is a consequence of features common to all multiple scale networks. To support the multiple length scale hypothesis, novel network architectures are introduced that need not be a result of random rewiring of a regular network. In each case it is shown that whenever the network exhibits small-world behavior, it also has connections of diverse length scales. We also show that the distribution of the length scales of the new connections is significantly more important than whether the new connections are long range, medium range or short range.
AIM-1662 Author[s]: Liana M. Lorigo, Olivier Faugeras, W.E.L. Grimson, Renaud Keriven, Ron Kikinis, Carl-Fredrik Westin Co-dimension 2 Geodesic Active Contours for MRA Segmentation August 11, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1662.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1662.pdf Automatic and semi-automatic magnetic resonance angiography (MRA) segmentation techniques can potentially save radiologists large amounts of time required for manual segmentation and can facilitate further data analysis. The proposed MRA segmentation method uses a mathematical modeling technique which is well-suited to the complicated curve-like structure of blood vessels. We define the segmentation task as an energy minimization over all 3D curves and use a level set method to search for a solution. Our approach is an extension of previous level set segmentation techniques to higher co-dimension.
AIM-1661 CBCL-177 Author[s]: Ryan Rifkin, Massimiliano Pontil and Alessandro Verri A Note on Support Vector Machines Degeneracy August 11, 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1661.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1661.pdf When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${\bf w} = {\bf 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays.
AIM-1658 CBCL-173 Author[s]: Tony Ezzat and Tomaso Poggio Visual Speech Synthesis by Morphing Visemes May 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1658.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1658.pdf We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
AIM-1657 Author[s]: Hany Farid Detecting Digital Forgeries Using Bispectral Analysis December 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1657.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1657.pdf With the rapid increase in low-cost and sophisticated digital technology the need for techniques to authenticate digital material will become more urgent. In this paper we address the problem of authenticating digital signals assuming no explicit prior knowledge of the original. The basic approach that we take is to assume that in the frequency domain a "natural" signal has weak higher-order statistical correlations. We then show that "un-natural" correlations are introduced if this signal is passed through a non-linearity (which would almost surely occur in the creation of a forgery). Techniques from polyspectral analysis are then used to detect the presence of these correlations. We review the basics of polyspectral analysis, show how and why these tools can be used in detecting forgeries and show their effectiveness in analyzing human speech.
AIM-1656 CBCL-172 Author[s]: Theodoros Evgeniou and Massimiliano Pontil On the V(subscript gamma) Dimension for Regression in Reproducing Kernel Hilbert Spaces May 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1656.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1656.pdf This paper presents a computation of the $V_\gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $\epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the $V_\gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_\epsilon$ or general $L_p$ loss functions. This paper presents a novel proof of this result also for the case that a bias is added to the functions in the RKHS.
AIM-1655 Author[s]: Gideon P. Stein, Raquel Romano and Lily Lee Monitoring Activities from Multiple Video Streams: Establishing a Common Coordinate Frame April 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1655.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1655.pdf Passive monitoring of large sites typically requires coordination between multiple cameras, which in turn requires methods for automatically relating events between distributed cameras. This paper tackles the problem of self-calibration of multiple cameras which are very far apart, using feature correspondences to determine the camera geometry. The key problem is finding such correspondences. Since the camera geometry and photometric characteristics vary considerably between images, one cannot use brightness and/or proximity constraints. Instead we apply planar geometric constraints to moving objects in the scene in order to align the scene's ground plane across multiple views. We do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. Once we have recovered the homography which aligns the planar structure in the scene, we can compute from the homography matrix the 3D position of the plane and the relative camera positions. This in turn enables us to recover a homography matrix which maps the images to an overhead view. We demonstrate this technique in two settings: a controlled lab setting where we test the effects of errors in internal camera calibration, and an uncontrolled, outdoor setting in which the full procedure is applied to external camera calibration and ground plane recovery. In spite of noise in the internal camera parameters and image data, the system successfully recovers both planar structure and relative camera positions in both settings.
AIM-1654 CBCL-171 Author[s]: Theodoros Evgeniou, Massimiliano Pontil and Tomaso Poggio A Unified Framework for Regularization Networks and Support Vector Machines March 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1654.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1654.pdf Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples -- in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.
AIM-1653 CBCL-170 Author[s]: Sayan Mukherjee and Vladimir Vapnik Multivariate Density Estimation: An SVM Approach April 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1653.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1653.pdf We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, Gaussians and Laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well as or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.
AIM-1652 Author[s]: Marina Meila An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High Dimensional Sparse Data January 1999 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1652.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1652.pdf Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables and that the graph of these dependencies is a spanning tree). The original algorithm is quadratic in the dimension of the domain, and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed ups of up to 2-3 orders of magnitude in the experiments.
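Illustrative sketch (not taken from the memo): the original, unaccelerated Chow and Liu procedure referred to in the abstract above can be outlined as computing the empirical mutual information of every pair of variables and then keeping a maximum-weight spanning tree over those pairs. The function names `mutual_information` and `chow_liu_tree`, the row-of-tuples data layout, and the use of Kruskal's algorithm with union-find are assumptions made for this example; the acCL speed-ups for sparse data are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_tree(data):
    """Return the edges of a maximum mutual-information spanning tree."""
    n_vars = len(data[0])
    # All pairs are scored: this is the step that is quadratic in the
    # number of variables and linear in the number of data points.
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Variables 0 and 1 are identical; variable 2 is independent of both.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_tree(samples)
```

In this toy data set the strongly dependent pair (0, 1) is always selected as a tree edge; the remaining edge attaches variable 2 with zero mutual information.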
AIM-1651 CBCL-168 Author[s]: Massimiliano Pontil, Sayan Mukherjee and Federico Girosi On the Noise Model of Support Vector Machine Regression October 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1651.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1651.pdf Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
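Illustrative sketch (not taken from the memo): the two loss functions contrasted in the abstract above can be written in a few lines, using the standard definition of the epsilon-insensitive loss as max(0, |r| - epsilon) for a residual r. The function names and the sample residuals are assumptions chosen for this example.

```python
def quadratic_loss(residual):
    """Usual squared-error loss on a single residual."""
    return residual ** 2


def epsilon_insensitive_loss(residual, epsilon):
    """Standard epsilon-insensitive loss: zero inside the tube
    |residual| <= epsilon, growing linearly outside it."""
    return max(0.0, abs(residual) - epsilon)


# Residuals inside the tube cost nothing; larger ones are penalized
# linearly rather than quadratically.
small = epsilon_insensitive_loss(0.05, epsilon=0.1)
large = epsilon_insensitive_loss(-0.5, epsilon=0.1)
```

The linear growth outside the tube is what makes this loss resemble the "robust" losses mentioned in the abstract: a large residual is penalized far less than under the quadratic loss.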
AITR-1650 CBCL-167 Author[s]: Christian R. Shelton Three-Dimensional Correspondence December 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1650.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1650.pdf This paper describes the problem of three-dimensional object correspondence and presents an algorithm for matching two three-dimensional colored surfaces using polygon reduction and the minimization of an energy function. At the core of this algorithm is a novel data-dependent multi-resolution pyramid for polygonal surfaces. The algorithm is general to correspondence between any two manifolds of the same dimension embedded in a higher dimensional space. Results demonstrating correspondences between various objects are presented and a method for incorporating user input is also detailed.
AIM-1649 CBCL-166 Author[s]: Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou From Regression to Classification in Support Vector Machines November 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1649.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1649.pdf We study the relation between support vector machines (SVMs) for regression (SVMR) and SVM for classification (SVMC). We show that for a given SVMC solution there exists an SVMR solution which is equivalent for a certain choice of the parameters. In particular our result is that for $\epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter $C_c$ are equal to $(1-\epsilon)^{-1}$ times the optimal hyperplane and threshold for SVMR with regularization parameter $C_r = (1-\epsilon)C_c$. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.
AIM-1648 CBCL-165 Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris Estimating Dependency Structure as a Hidden Variable September 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1648.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1648.pdf This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.
AIM-1647 Author[s]: Hany Farid and Edward H. Adelson Separating Reflections from Images Using Independent Components Analysis September 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1647.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1647.pdf The image of an object can vary dramatically depending on lighting, specularities/reflections and shadows. It is often advantageous to separate these incidental variations from the intrinsic aspects of an image. Along these lines this paper describes a method for photographing objects behind glass and digitally removing the reflections off the glass leaving the image of the objects behind the glass intact. We describe the details of this method which employs simple optical techniques and independent components analysis (ICA) and show its efficacy with several examples.
AIM-1646 CBCL-164 Author[s]: Nicholas Chan, Blake LeBaron, Andrew Lo and Tomaso Poggio Information Dissemination and Aggregation in Asset Markets with Simple Intelligent Traders September 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1646.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1646.pdf Various studies of asset markets have shown that traders are capable of learning and transmitting information through prices in many situations. In this paper we replace human traders with intelligent software agents in a series of simulated markets. Using these simple learning agents, we are able to replicate several features of the experiments with human subjects, regarding (1) dissemination of information from informed to uninformed traders, and (2) aggregation of information spread over different traders.
AITR-1645 Author[s]: Deborah A. Wallach A Hierarchical Cache Coherent Protocol September 1992 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1645.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1645.pdf As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson[ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large.
Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.
AIM-1642 Author[s]: Parry Husbands, Charles Lee Isbell, Jr. and Alan Edelman Interactive Supercomputing with MIT Matlab July 28, 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1642.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1642.pdf This paper describes MITMatlab, a system that enables users of supercomputers or networked PCs to work on large data sets within Matlab transparently. MITMatlab is based on the Parallel Problems Server (PPServer), a standalone 'linear algebra server' that provides a mechanism for running distributed memory algorithms on large data sets. The PPServer and MITMatlab enable high-performance interactive supercomputing. With such a tool, researchers can now use Matlab as more than a prototyping tool for experimenting with small problems. Instead, MITMatlab makes it possible to visualize and operate interactively on large data sets. This has implications not only in supercomputing, but for Artificial Intelligence applications such as Machine Learning, Information Retrieval and Image Processing.
AIM-1640 CBCL-163 Author[s]: Zhaoping Li Pre-Attentive Segmentation in the Primary Visual Cortex June 30, 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1640.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1640.pdf Stimuli outside classical receptive fields have been shown to exert significant influence over the activities of neurons in primary visual cortex. We propose that contextual influences are used for pre-attentive visual segmentation, in a new framework called segmentation without classification. This means that segmentation of an image into regions occurs without classification of features within a region or comparison of features between regions. This segmentation framework is simpler than previous computational approaches, making it implementable by V1 mechanisms, though higher level visual mechanisms are needed to refine its output. However, it easily handles a class of segmentation problems that are tricky in conventional methods. The cortex computes global region boundaries by detecting the breakdown of homogeneity or translation invariance in the input, using local intra-cortical interactions mediated by the horizontal connections. The difference between contextual influences near and far from region boundaries makes neural activities near region boundaries higher than elsewhere, making boundaries more salient for perceptual pop-out. This proposal is implemented in a biologically based model of V1, and demonstrated using examples of texture segmentation and figure-ground segregation. The model performs segmentation in exactly the same neural circuit that solves the dual problem of the enhancement of contours, as is suggested by experimental observations. Its behavior is compared with psychophysical and physiological data on segmentation, contour enhancement, and contextual influences.
We discuss the implications of segmentation without classification and the predictions of our V1 model, and relate it to other phenomena such as asymmetry in visual search.
AITR-1639 Author[s]: Oded Maron Learning from Ambiguity December 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1639.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1639.pdf There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple-Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple-instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful examples of applying Diverse Density techniques.
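Illustrative sketch (not taken from the thesis): the bag-labeling rule stated in the abstract above amounts to a logical OR over the hidden instance labels. The function name `bag_label` and the example bags are assumptions for this example; the Diverse Density measure itself is not shown.

```python
def bag_label(instance_labels):
    """Label a bag from its (hidden) instance labels: positive if at
    least one instance is positive, negative if all are negative."""
    return any(instance_labels)


negative_bag = [False, False, False]
positive_bag = [False, True, False]  # which instance is positive is unknown to the learner

labels = [bag_label(negative_bag), bag_label(positive_bag)]
```

The learner only ever sees the bag-level result, which is why each positive bag is ambiguous: any one of its instances could be responsible for the label.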
AIM-1638 Author[s]: Oded Maron and Tomas Lozano-Perez Visible Decomposition: Real-Time Path Planning in Large Planar Environments June, 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1638.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1638.pdf We describe a method called Visible Decomposition for computing collision-free paths in real time through a planar environment with a large number of obstacles. This method divides space into local visibility graphs, ensuring that all operations are local. The search time is kept low since the number of regions is proved to be small. We analyze the computational demands of the algorithm and the quality of the paths it produces. In addition, we show test results on a large simulation testbed.
AIM-1637 Author[s]: Gina-Anne Levow Corpus-Based Techniques for Word Sense Disambiguation May 27, 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1637.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1637.pdf The need for robust and easily extensible systems for word sense disambiguation coupled with successes in training systems for a variety of tasks using large on-line corpora has led to extensive research into corpus-based statistical approaches to this problem. Promising results have been achieved by vector space representations of context, clustering combined with a semantic knowledge base, and decision lists based on collocational relations. We evaluate these techniques with respect to three important criteria: how their definition of context affects their ability to incorporate different types of disambiguating information, how they define similarity among senses, and how easily they can generalize to new senses. The strengths and weaknesses of these systems provide guidance for future systems which must capture and model a variety of disambiguating information, both syntactic and semantic.
AIM-1636 Author[s]: Charles Isbell and Paul Viola Restructuring Sparse High Dimensional Data for Effective Retrieval May 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1636.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1636.pdf The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector--a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets, and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.
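Illustrative sketch (not taken from the memo): the classical baseline described in the abstract above, relevance as the dot product of word-count vectors, can be written with sparse counts. The function names, the whitespace tokenization, and the sample texts are assumptions for this example; the memo's proposed cluster-based representation is not shown.

```python
from collections import Counter


def word_counts(text):
    """Sparse word-count vector, using naive lowercase whitespace tokens."""
    return Counter(text.lower().split())


def relevance(document, query):
    """Dot product of the document and query word-count vectors."""
    doc_vec, query_vec = word_counts(document), word_counts(query)
    # Only words present in the query can contribute to the dot product;
    # a Counter returns 0 for words missing from the document.
    return sum(count * doc_vec[word] for word, count in query_vec.items())


score_match = relevance("the cat sat on the mat", "the cat")
score_miss = relevance("dogs bark loudly", "the cat")
```

The second score illustrates the difficulty the abstract points to: a document about a related topic scores zero whenever it shares no literal words with the query.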
AIM-1635 CBCL-162 Author[s]: Constantine P. Papageorgiou, Federico Girosi and Tomaso Poggio Sparse Correlation Kernel Analysis and Reconstruction May 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1635.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1635.pdf This paper presents a new paradigm for signal reconstruction and superresolution, Correlation Kernel Analysis (CKA), that is based on the selection of a sparse set of bases from a large dictionary of class-specific basis functions. The basis functions that we use are the correlation functions of the class of signals we are analyzing. To choose the appropriate features from this large dictionary, we use Support Vector Machine (SVM) regression and compare this to traditional Principal Component Analysis (PCA) for the tasks of signal reconstruction, superresolution, and compression. The testbed we use in this paper is a set of images of pedestrians. This paper also presents results of experiments in which we use a dictionary of multiscale basis functions and then use Basis Pursuit De-Noising to obtain a sparse, multiscale approximation of a signal. The results are analyzed and we conclude that 1) when used with a sparse representation technique, the correlation function is an effective kernel for image reconstruction and superresolution, 2) for image compression, PCA and SVM have different tradeoffs, depending on the particular metric that is used to evaluate the results, 3) in sparse representation techniques, L_1 is not a good proxy for the true measure of sparsity, L_0, and 4) the L_epsilon norm may be a better error metric for image reconstruction and compression than the L_2 norm, though the exact psychophysical metric should take into account high order structure in images.
AITR-1634 Author[s]: Anita M. Flynn Piezoelectric Ultrasonic Micromotors June 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1634.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1634.pdf This report describes development of micro-fabricated piezoelectric ultrasonic motors and bulk-ceramic piezoelectric ultrasonic motors. Ultrasonic motors offer the advantage of low speed, high torque operation without the need for gears. They can be made compact and lightweight and provide a holding torque in the absence of applied power, due to the traveling wave frictional coupling mechanism between the rotor and the stator. This report covers modeling, simulation, fabrication and testing of ultrasonic motors. Design of experiments methods were also utilized to find optimal motor parameters. A suite of 8 mm diameter x 3 mm tall motors were machined for these studies and maximum stall torques as large as 10^(-3) Nm, maximum no-load speeds of 1710 rpm and peak power outputs of 27 mW were realized. Additionally, this report describes the implementation of a microfabricated ultrasonic motor using thin-film lead zirconate titanate. In a joint project with the Pennsylvania State University Materials Research Laboratory and MIT Lincoln Laboratory, 2 mm and 5 mm diameter stator structures were fabricated on 1 micron thick silicon nitride membranes. Small glass lenses placed down on top spun at 100-300 rpm with 4 V excitation at 90 kHz. The large power densities and stall torques of these piezoelectric ultrasonic motors offer tremendous promise for integrated machines: complete intelligent, electro-mechanical autonomous systems mass-produced in a single fabrication process.
AIM-1633 Author[s]: Kenneth Yip and Gerald Jay Sussman Sparse Representations for Fast, One-Shot Learning November 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1633.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1633.pdf Humans rapidly and reliably learn many kinds of regularities and generalizations. We propose a novel model of fast learning that exploits the properties of sparse representations and the constraints imposed by a plausible hardware mechanism. To demonstrate our approach we describe a computational model of acquisition in the domain of morphophonology. We encapsulate phonological information as bidirectional boolean constraint relations operating on the classical linguistic representations of speech sounds in terms of distinctive features. The performance model is described as a hardware mechanism that incrementally enforces the constraints. Phonological behavior arises from the action of this mechanism. Constraints are induced from a corpus of common English nouns and verbs. The induction algorithm compiles the corpus into increasingly sophisticated constraints. The algorithm yields one-shot learning from a few examples. Our model has been implemented as a computer program. The program exhibits phonological behavior similar to that of young children. As a bonus the constraints that are acquired can be interpreted as classical linguistic rules.
AIM-1632 CBCL-161 Author[s]: Tomaso Poggio and Federico Girosi Notes on PCA, Regularization, Sparsity and Support Vector Machines May 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1632.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1632.pdf We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik's feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exists a dual representation of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique, and this operation is equivalent to sparsifying a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit De-Noising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.
AIM-1631 Author[s]: Kevin K. Lin Coordinate-Independent Computations on Differential Equations March 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1631.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1631.pdf This project investigates the computational representation of differentiable manifolds, with the primary goal of solving partial differential equations using multiple coordinate systems on general n-dimensional spaces. In the process, this abstraction is used to perform accurate integrations of ordinary differential equations using multiple coordinate systems. In the case of linear partial differential equations, however, unexpected difficulties arise even with the simplest equations.
AIM-1630 Author[s]: Thomas Marill Recovery of Three-Dimensional Objects from Single Perspective Images March 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1630.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1630.pdf Any three-dimensional wire-frame object constructed out of parallelograms can be recovered from a single perspective two-dimensional image. A procedure for performing the recovery is given.
AIM-1629 CBCL-160 Author[s]: Maximilian Riesenhuber and Tomaso Poggio Modeling Invariances in Inferotemporal Cell Tuning March 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1629.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1629.pdf In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth. Training monkeys with novel, paperclip-like objects, Logothetis et al. could investigate whether these invariance properties are due to experience with exhaustively many transformed instances of an object or if there are mechanisms that allow the cells to show response invariance also to previously unseen instances of that object. They found object-selective cells in anterior IT which exhibited limited invariance to various transformations after training with single object views. While previous models accounted for the tuning of the cells for rotations in depth and for their selectivity to a specific object relative to a population of distractor objects, the model described here attempts to explain in a biologically plausible way the additional properties of translation and size invariance. Using the same stimuli as in the experiment, we find that model IT neurons exhibit invariance properties which closely parallel those of real neurons. Simulations show that the model is capable of unsupervised learning of view-tuned neurons. The model also allows to make experimentally testable predictions regarding novel stimulus transformations and combinations of stimuli.
AIM-1628 Author[s]: Brian Scassellati A Binocular, Foveated Active Vision System March 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1628.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1628.pdf This report documents the design and implementation of a binocular, foveated active vision system as part of the Cog project at the MIT Artificial Intelligence Laboratory. The active vision system features a three degree of freedom mechanical platform that supports four color cameras, a motion control system, and a parallel network of digital signal processors for image processing. To demonstrate the capabilities of the system, we present results from four sample visual-motor tasks.
AITR-1627 Author[s]: Alan Bawden Implementing Distributed Systems Using Linear Naming March 1993 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1627.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1627.pdf Linear graph reduction is a simple computational model in which the cost of naming things is explicitly represented. The key idea is the notion of "linearity". A name is linear if it is only used once, so with linear naming you cannot create more than one outstanding reference to an entity. As a result, linear naming is cheap to support and easy to reason about. Programs can be translated into the linear graph reduction model such that linear names in the program are implemented directly as linear names in the model. Nonlinear names are supported by constructing them out of linear names. The translation thus exposes those places where the program uses names in expensive, nonlinear ways. Two applications demonstrate the utility of using linear graph reduction. First, in the area of distributed computing, linear naming makes it easy to support cheap cross-network references and highly portable data structures. Linear naming also facilitates demand-driven migration of tasks and data around the network without requiring explicit guidance from the programmer. Second, linear graph reduction reveals a new characterization of the phenomenon of state. Systems in which state appears are those which depend on certain global system properties. State is not a localizable phenomenon, which suggests that our usual object-oriented metaphor for state is flawed.
AIM-1626 Author[s]: Radhika Nagpal and Daniel Coore An Algorithm for Group Formation and Maximal Independent Set in an Amorphous Computer February 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1626.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1626.pdf Amorphous computing is the study of programming ultra-scale computing environments of smart sensors and actuators. The individual elements are identical, asynchronous, randomly placed, embedded and communicate locally via wireless broadcast. Aggregating the processors into groups is a useful paradigm for programming an amorphous computer because groups can be used for specialization, increased robustness, and efficient resource allocation. This paper presents a new algorithm, called the clubs algorithm, for efficiently aggregating processors into groups in an amorphous computer, in time proportional to the local density of processors. The clubs algorithm is well-suited to the unique characteristics of an amorphous computer. In addition, the algorithm derives two properties from the physical embedding of the amorphous computer: an upper bound on the number of groups formed and a constant upper bound on the density of groups. The clubs algorithm can also be extended to find the maximal independent set (MIS) and $\Delta + 1$ vertex coloring in an amorphous computer in $O(\log N)$ rounds, where $N$ is the total number of elements and $\Delta$ is the maximum degree.
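The abstract above does not spell out the steps of the clubs algorithm, so the following is only a generic sketch of leader-based grouping that yields a maximal independent set. It is not the memo's algorithm. It assumes a centralized, synchronous simulation in which each node draws a random delay and the node with the smallest delay in its neighborhood becomes a leader and recruits its neighbors. The names `form_groups`, `ring`, and `max_delay` are hypothetical.

```python
import random

def form_groups(neighbors, max_delay=100, seed=0):
    """Group nodes around locally elected leaders (illustrative sketch only).

    neighbors: dict mapping node id -> set of adjacent node ids (symmetric).
    Returns a dict mapping every node to its leader; leaders map to themselves.
    """
    rng = random.Random(seed)
    # Each node draws a random countdown; ties are broken by node id, which
    # stands in for whatever collision handling a real radio network needs.
    delay = {v: rng.randrange(max_delay) for v in neighbors}
    leader_of = {}
    for v in sorted(neighbors, key=lambda v: (delay[v], v)):
        if v in leader_of:
            continue  # already recruited by a leader that fired earlier
        leader_of[v] = v  # countdown expired first: v declares itself leader
        for u in neighbors[v]:
            leader_of.setdefault(u, v)  # recruit neighbors not yet in a group
    return leader_of

def ring(n):
    """Hypothetical test topology: n nodes arranged in a cycle."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

groups = form_groups(ring(10))
leaders = {v for v, leader in groups.items() if v == leader}
print(sorted(leaders))
```

In this sketch no two leaders are adjacent and every non-leader has a leader among its neighbors, so the leader set is a maximal independent set of the communication graph.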
AIM-1625 CBCL-159 Author[s]: Thomas Hofmann and Jan Puzicha Statistical Models for Co-occurrence Data February 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1625.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1625.pdf Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
AIM-1624 CBCL-158 Author[s]: Yair Weiss and Edward H. Adelson Slow and Smooth: A Bayesian Theory for the Combination of Local Motion Signals in Human Vision February 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1624.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1624.pdf In order to estimate the motion of an object, the visual system needs to combine multiple local measurements, each of which carries some degree of ambiguity. We present a model of motion perception whereby measurements from different image regions are combined according to a Bayesian estimator: the estimated motion maximizes the posterior probability assuming a prior favoring slow and smooth velocities. In reviewing a large number of previously published phenomena we find that the Bayesian estimator predicts a wide range of psychophysical results. This suggests that the seemingly complex set of illusions arises from a single computational strategy that is optimal under reasonable assumptions.
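To make the idea of a posterior-maximizing estimate with a "slow" prior concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the memo: a single rigid 2D velocity, Gaussian noise on each local measurement of the velocity component along a known normal direction, a zero-mean isotropic Gaussian prior on velocity, and no smoothness term. The function name and the parameters `sigma` and `sigma_p` are hypothetical.

```python
def map_velocity(constraints, sigma=1.0, sigma_p=1.0):
    """MAP estimate of a 2D velocity v under a zero-mean Gaussian prior.

    constraints: list of ((nx, ny), c), each meaning n . v ~ c with noise
    standard deviation sigma. sigma_p is the prior standard deviation;
    smaller values favor slower velocities.
    Solves (sum n n^T / sigma^2 + I / sigma_p^2) v = sum n c / sigma^2.
    """
    w = 1.0 / sigma ** 2
    a11 = a22 = 1.0 / sigma_p ** 2  # prior contributes I / sigma_p^2
    a12 = 0.0
    b1 = b2 = 0.0
    for (nx, ny), c in constraints:
        a11 += w * nx * nx
        a12 += w * nx * ny
        a22 += w * ny * ny
        b1 += w * nx * c
        b2 += w * ny * c
    det = a11 * a22 - a12 * a12
    # Closed-form inverse of the symmetric 2x2 system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# One ambiguous measurement: only the x-component is observed.
print(map_velocity([((1.0, 0.0), 1.0)]))
```

With a single measurement the estimate lies along the measured normal direction and is shrunk toward zero, which is the qualitative effect of a prior favoring slow velocities. With several differently oriented measurements and a weak prior, the estimate approaches their common solution.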
AIM-1621 Author[s]: Gideon P. Stein and Amnon Shashua Direct Estimation of Motion and Extended Scene Structure from a Moving Stereo Rig December 1998 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1621.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1621.pdf We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that given two stereo pairs one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal). Correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence to form an extended scene reconstruction and we show results from a real image sequence. The motion computation is a linear least squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less so one does not have to explicitly apply a confidence measure.
AIM-1620 CBCL-157 Author[s]: Gideon P. Stein and Amnon Shashua On Degeneracy of Linear Reconstruction from Three Views: Linear Line Complex and Applications December 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1620.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1620.pdf This paper investigates the linear degeneracies of projective structure estimation from point and line features across three views. We show that the rank of the linear system of equations for recovering the trilinear tensor of three views reduces to 23 (instead of 26) in the case when the scene is a Linear Line Complex (set of lines in space intersecting at a common line) and is 21 when the scene is planar. The LLC situation is only linearly degenerate, and we show that one can obtain a unique solution when the admissibility constraints of the tensor are accounted for. The line configuration described by an LLC, rather than being some obscure case, is in fact quite typical. It includes, as a particular example, the case of a camera moving down a hallway in an office environment or down an urban street. Furthermore, an LLC situation may occur as an artifact such as in direct estimation from spatio-temporal derivatives of image brightness. Therefore, an investigation into degeneracies and their remedy is important also in practice.
AIM-1619 CBCL-156 Author[s]: Theodoros Evgeniou and Tomaso Poggio Sparse Representations of Multiple Signals September 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1619.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1619.pdf We discuss the problem of finding sparse representations of a class of signals. We formalize the problem and prove it is NP-complete both in the case of a single signal and that of multiple ones. Next we develop a simple approximation method to the problem and we show experimental results using artificially generated signals. Furthermore, we use our approximation method to find sparse representations of classes of real signals, specifically of images of pedestrians. We discuss the relation between our formulation of the sparsity problem and the problem of finding representations of objects that are compact and appropriate for detection and classification.
AIM-1618 Author[s]: Dan Halperin and Christian R. Shelton A Perturbation Scheme for Spherical Arrangements with Application to Molecular Modeling December 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1618.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1618.pdf We describe a software package for computing and manipulating the subdivision of a sphere by a collection of (not necessarily great) circles and for computing the boundary surface of the union of spheres. We present problems that arise in the implementation of the software and the solutions that we have found for them. At the core of the paper is a novel perturbation scheme to overcome degeneracies and precision problems in computing spherical arrangements while using floating point arithmetic. The scheme is relatively simple, it balances between the efficiency of computation and the magnitude of the perturbation, and it performs well in practice. In one O(n) time pass through the data, it perturbs the inputs necessary to insure no potential degeneracies and then passes the perturbed inputs on to the geometric algorithm. We report and discuss experimental results. Our package is a major component in a larger package aimed to support geometric queries on molecular models; it is currently employed by chemists working in "rational drug design." The spherical subdivisions are used to construct a geometric model of a molecule where each sphere represents an atom. We also give an overview of the molecular modeling package and detail additional features and implementation issues.
AITR-1617 Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan Proceedings of the Second PHANToM User's Group Workshop December 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1617.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1617.pdf On October 19-22, 1997 the Second PHANToM Users Group Workshop was held at the MIT Endicott House in Dedham, Massachusetts. Designed as a forum for sharing results and insights, the workshop was attended by more than 60 participants from 7 countries. These proceedings report on workshop presentations in diverse areas including rigid and compliant rendering, tool kits, development environments, techniques for scientific data visualization, multi-modal issues and a programming tutorial.
AIM-1616 CBCL-155 Author[s]: Yair Weiss Belief Propagation and Revision in Networks with Loops November 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1616.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1616.pdf Local belief propagation rules of the sort proposed by Pearl (1988) are guaranteed to converge to the optimal beliefs for singly connected networks. Recently, a number of researchers have empirically demonstrated good performance of these same algorithms on networks with loops, but a theoretical understanding of this performance has yet to be achieved. Here we lay the foundation for an understanding of belief propagation in networks with loops. For networks with a single loop, we derive an analytical relationship between the steady state beliefs in the loopy network and the true posterior probability. Using this relationship we show a category of networks for which the MAP estimate obtained by belief update and by belief revision can be proven to be optimal (although the beliefs will be incorrect). We show how nodes can use local information in the messages they receive in order to correct the steady state beliefs. Furthermore we prove that for all networks with a single loop, the MAP estimate obtained by belief revision at convergence is guaranteed to give the globally optimal sequence of states. The result is independent of the length of the cycle and the size of the state space. For networks with multiple loops, we introduce the concept of a "balanced network" and show simulati.
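The statement that local propagation rules give the correct beliefs on singly connected networks can be checked on the simplest such network, a chain. The sketch below is an illustration under assumed conventions, not code from the memo: it runs sum-product message passing on a chain of discrete variables with unary potentials `phi` and pairwise potentials `psi` (both hypothetical names and values) and compares the result with brute-force enumeration.

```python
from itertools import product

def chain_marginals_bp(phi, psi):
    """Sum-product belief propagation on a chain.

    phi[i][x]: unary potential of variable i in state x.
    psi[i][x][y]: pairwise potential between variable i (state x)
                  and variable i + 1 (state y).
    Returns normalized single-variable marginals.
    """
    n, k = len(phi), len(phi[0])
    fwd = [[1.0] * k for _ in range(n)]  # messages arriving from the left
    bwd = [[1.0] * k for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        for y in range(k):
            fwd[i][y] = sum(fwd[i - 1][x] * phi[i - 1][x] * psi[i - 1][x][y]
                            for x in range(k))
    for i in range(n - 2, -1, -1):
        for x in range(k):
            bwd[i][x] = sum(psi[i][x][y] * phi[i + 1][y] * bwd[i + 1][y]
                            for y in range(k))
    beliefs = []
    for i in range(n):
        b = [fwd[i][x] * phi[i][x] * bwd[i][x] for x in range(k)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs

def chain_marginals_brute(phi, psi):
    """Exact marginals by enumerating every joint state (small chains only)."""
    n, k = len(phi), len(phi[0])
    marg = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        p = 1.0
        for i, s in enumerate(states):
            p *= phi[i][s]
        for i in range(n - 1):
            p *= psi[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            marg[i][s] += p
    return [[v / sum(row) for v in row] for row in marg]

# Hypothetical potentials for a four-variable binary chain.
phi = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
psi = [[[0.9, 0.1], [0.2, 0.8]]] * 3
bp = chain_marginals_bp(phi, psi)
exact = chain_marginals_brute(phi, psi)
```

On a chain the two results agree up to floating-point error. On a network with loops the same message updates would have to be iterated, and the memo's analysis concerns how the resulting steady-state beliefs relate to the true posterior.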
AIM-1615 CBCL-154 Author[s]: Shimon Edelman and Sharon Duvdevani-Bar Visual Recognition and Categorization on the Basis of Similarities to Multiple Class Prototypes September 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1615.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1615.pdf To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.
AIM-1614 Author[s]: Daniel Coore, Radhika Nagpal and Ron Weiss Paradigms for Structure in an Amorphous Computer October 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1614.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1614.pdf Recent developments in microfabrication and nanotechnology will enable the inexpensive manufacturing of massive numbers of tiny computing elements with sensors and actuators. New programming paradigms are required for obtaining organized and coherent behavior from the cooperation of large numbers of unreliable processing elements that are interconnected in unknown, irregular, and possibly time-varying ways. Amorphous computing is the study of developing and programming such ultrascale computing environments. This paper presents an approach to programming an amorphous computer by spontaneously organizing an unstructured collection of processing elements into cooperative groups and hierarchies. This paper introduces a structure called an AC Hierarchy, which logically organizes processors into groups at different levels of granularity. The AC hierarchy simplifies programming of an amorphous computer through new language abstractions, facilitates the design of efficient and robust algorithms, and simplifies the analysis of their performance. Several example applications are presented that greatly benefit from the AC hierarchy. This paper introduces three algorithms for constructing multiple levels of the hierarchy from an unstructured collection of processors.
AIM-1613 CBCL-153 Author[s]: Zhaoping Li Visual Segmentation without Classification in a Model of the Primary Visual Cortex August 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1613.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1613.pdf Stimuli outside classical receptive fields significantly influence the neurons' activities in primary visual cortex. We propose that such contextual influences are used to segment regions by detecting the breakdown of homogeneity or translation invariance in the input, thus computing global region boundaries using local interactions. This is implemented in a biologically based model of V1, and demonstrated in examples of texture segmentation and figure-ground segregation. By contrast with traditional approaches, segmentation occurs without classification or comparison of features within or between regions and is performed by exactly the same neural circuit responsible for the dual problem of the grouping and enhancement of contours.
AIM-1612 CBCL-152 Author[s]: Massimiliano Pontil and Alessandro Verri Properties of Support Vector Machines August 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1612.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1612.pdf Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.
AIM-1611 CBCL-151 Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris Estimating Dependency Structure as a Hidden Variable June 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1611.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1611.pdf This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.
AIM-1610 CBCL-150 Author[s]: Marcus Dill and Shimon Edelman Translation Invariance in Object Recognition, and Its Relation to Other Visual Transformations June 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1610.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1610.pdf Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.
AIM-1608 CBCL-148 Author[s]: Gad Geiger and Jerome Y. Lettvin A View on Dyslexia June 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1608.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1608.pdf We describe here, briefly, a perceptual non-reading measure which reliably distinguishes between dyslexic persons and ordinary readers. More importantly, we describe a regimen of practice with which dyslexics learn a new perceptual strategy for reading. Two controlled experiments on dyslexic children demonstrate the regimen's efficiency.
AIM-1606 CBCL-147 Author[s]: Federico Girosi An Equivalence Between Sparse Approximation and Support Vector Machines May 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1606.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1606.pdf In the first part of this paper we show a similarity between the Structural Risk Minimization principle (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in Chen, Donoho and Saunders (1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.
AIM-1605 CBCL-146 Author[s]: Marina Meila and Michael I. Jordan Triangulation by Continuous Embedding March 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1605.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1605.pdf When triangulating a belief network we aim to obtain a junction tree of minimum state space. Searching for the optimal triangulation can be cast as a search over all the permutations of the network's variables. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continuous nonlinear optimization task we hope to obtain a good triangulation with respect to the aforementioned cost. In this paper we introduce an upper bound to the total junction tree weight as the cost function. The appropriateness of this choice is discussed and explored by simulations. Then we present two ways of embedding the new objective function into continuous domains and show that they perform well compared to the best known heuristic.
AIM-1604 Author[s]: William J. Dally, Leonard McMillan, Gary Bishop and Henry Fuchs The Delta Tree: An Object-Centered Approach to Image-Based Rendering May 2, 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1604.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1604.pdf This paper introduces the delta tree, a data structure that represents an object using a set of reference images. It also describes an algorithm for generating arbitrary reprojections of an object by traversing its delta tree. Delta trees are an efficient representation in terms of both storage and rendering performance. Each node of a delta tree stores an image taken from a point on a sampling sphere that encloses the object. Each image is compressed by discarding pixels that can be reconstructed by warping its ancestor's images to the node's viewpoint. The partial image stored at each node is divided into blocks and represented in the frequency domain. The rendering process generates an image at an arbitrary viewpoint by traversing the delta tree from a root node to one or more of its leaves. A subdivision algorithm selects only the required blocks from the nodes along the path. For each block, only the frequency components necessary to reconstruct the final image at an appropriate sampling density are used. This frequency selection mechanism handles both antialiasing and level-of-detail within a single framework. A complex scene is initially rendered by compositing images generated by traversing the delta trees of its components. Once the reference views of a scene have been rendered in this manner, the entire scene can be reprojected to an arbitrary viewpoint by traversing its own delta tree. Our approach is limited to generating views of an object from outside the object's convex hull. In practice we work around this problem by subdividing objects to render views from within the convex hull.
AIM-1603 CBCL-145 Author[s]: Shai Avidan, Theodoros Evgeniou, Amnon Shashua and Tomaso Poggio Image-Based View Synthesis January 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1603.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1603.pdf We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is the ability to synthesize images whose viewing position is significantly far away from the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multi-dimensional interpolation function that synthesizes the non-rigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating the ability of representing an object by a relatively small number of model images --- for the purpose of cheap and fast viewers that can run on standard hardware.
AIM-1602 CBCL-144 Author[s]: Edgar Osuna, Robert Freund and Federico Girosi Support Vector Machines: Training and Applications March 1997 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1602.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1602.pdf The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, its relationship with SRM, and its geometrical insight, are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory, and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVM's over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterative values and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. 
As an application of SVM's, we present preliminary results we obtained applying SVM to the problem of detecting frontal human faces in real images.
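For orientation, the quadratic program referred to in the AIM-1602 abstract is usually written in the following standard dual form. This is a textbook statement rather than a quotation from the memo; the symbols $\ell$ (number of data points), $\alpha_i$ (one variable per data point), $y_i \in \{-1, +1\}$ (labels), $K$ (kernel) and $C$ (upper bound of the box constraint) are notation assumed here.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{\ell} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{subject to}\quad & \sum_{i=1}^{\ell} \alpha_i y_i = 0
  && \text{(linear constraint)} \\
& 0 \le \alpha_i \le C, \quad i = 1, \dots, \ell
  && \text{(box constraints)}
\end{aligned}
```

The quadratic term involves the dense $\ell \times \ell$ matrix with entries $y_i y_j K(x_i, x_j)$, which is why storage grows with the square of the number of data points.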
AIM-1600 CBCL-143 Author[s]: Thomas Vetter, Michael J. Jones and Tomaso Poggio A Bootstrapping Algorithm for Learning Linear Models of Object Classes
ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1600.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1600.pdf Flexible models of object classes, based on linear combinations of prototypical images, are capable of matching novel images of the same class and have been shown to be a powerful tool to solve several fundamental vision tasks such as recognition, synthesis and correspondence. The key problem in creating a specific flexible model is the computation of pixelwise correspondence between the prototypes, a task done until now in a semiautomatic way. In this paper we describe an algorithm that automatically bootstraps the correspondence between the prototypes. The algorithm - which can be used for 2D images as well as for 3D models - is shown to synthesize successfully a flexible model of frontal face images and a flexible model of handwritten digits.
AIM-1599 CBCL-142 Author[s]: B. Schoelkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio and V. Vapnik Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1599.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1599.pdf The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold such as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by $k$-means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well-founded, but also superior in a practical application.
AIM-1598 CBCL-141 Author[s]: Joerg C. Lemm Prior Information and Generalized Questions
ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1598.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1598.pdf In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint). This paper 1.) studies aspects on which these two categories usually differ, like their relevance for generalization and their role in the loss function, 2.) presents a unifying formalism, where both types of information are identified with answers to generalized questions, 3.) shows what kind of generalized information is necessary to enable learning, 4.) aims to put usual training data and prior information on a more equal footing by discussing possibilities and variants of measurement and control for generalized questions, including the examples of smoothness and symmetries, 5.) briefly reviews the measurement of linguistic concepts based on fuzzy priors, and principles to combine preprocessors, 6.) uses a Bayesian decision theoretic framework, contrasting parallel and inverse decision problems, 7.) proposes, for problems with non-approximation aspects, a Bayesian two step approximation consisting of posterior maximization and a subsequent risk minimization, 8.) analyses empirical risk minimization under the aspect of nonlocal information, 9.) compares the Bayesian two step approximation with empirical risk minimization, including their interpretations of Occam's razor, 10.) formulates examples of stationarity conditions for the maximum posterior approximation with nonlocal and nonconvex priors, leading to inhomogeneous nonlinear equations, similar for example to equations in scattering theory in physics. In summary, this paper focuses on the dependencies between answers to different questions. 
Because not training examples alone but such dependencies enable generalization, it emphasizes the need of their empirical measurement and control and of a more explicit treatment in theory.
AITR-1596 Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan (editors) The Proceedings of the First PHANToM User's Group Workshop December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1596.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1596.pdf These proceedings summarize the results of the First PHANToM User's Group Workshop held September 27-30, 1996 at MIT. The goal of the workshop was to bring together a group of active users of the PHANToM Haptic Interface to discuss the scientific and engineering challenges involved in bringing haptics into widespread use, and to explore the future possibilities of this exciting technology. With over 50 attendees and 25 presentations the workshop provided the first large forum for users of a common haptic interface to share results and engage in collaborative discussions. Short papers from the presenters are contained herein and address the following topics: Research Effort Overviews, Displays and Effects, Applications in Teleoperation and Training, Tools for Simulated Worlds, and Data Visualization.
AIM-1595 Author[s]: Gideon P. Stein Lens Distortion Calibration Using Point Correspondences December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1595.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1595.pdf This paper describes a new method for lens distortion calibration using only point correspondences in multiple views, without the need to know either the 3D location of the points or the camera locations. The standard lens distortion model is a model of the deviations of a real camera from the ideal pinhole or projective camera model. Given multiple views of a set of corresponding points taken by ideal pinhole cameras there exist epipolar and trilinear constraints among pairs and triplets of these views. In practice, due to noise in the feature detection and due to lens distortion these constraints do not hold exactly and we get some error. The calibration is a search for the lens distortion parameters that minimize this error. Using simulation and experimental results with real images we explore the properties of this method. We describe the use of this method with the standard lens distortion model, radial and decentering, but it could also be used with any other parametric distortion models. Finally we demonstrate that lens distortion calibration improves the accuracy of 3D reconstruction.
AIM-1594 Author[s]: Gideon P. Stein and Amnon Shashua Direct Methods for Estimation of Structure and Motion from Three Views December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1594.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1594.pdf We describe a new direct method for estimating structure and motion from image intensities of multiple views. We extend the direct methods of Horn and Weldon to three views. Adding the third view enables us to solve for motion, and compute a dense depth map of the scene, directly from image spatio-temporal derivatives in a linear manner without first having to find point correspondences or compute optical flow. We describe the advantages and limitations of this method which are then verified through simulation and experiments with real images.
AIM-1593 Author[s]: J.P. Mellor, Seth Teller and Tomas Lozano-Perez Dense Depth Maps from Epipolar Images November 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1593.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1593.pdf Recovering three-dimensional information from two-dimensional images is the fundamental goal of stereo techniques. The problem of recovering depth (three-dimensional information) from a set of images is essentially the correspondence problem: Given a point in one image, find the corresponding point in each of the other images. Finding potential correspondences usually involves matching some image property. If the images are from nearby positions, they will vary only slightly, simplifying the matching process. Once a correspondence is known, solving for the depth is simply a matter of geometry. Real images are composed of noisy, discrete samples, therefore the calculated depth will contain error. This error is a function of the baseline or distance between the images. Longer baselines result in more precise depths. This leads to a conflict: short baselines simplify the matching process, but produce imprecise results; long baselines produce precise results, but complicate the matching process. In this paper, we present a method for generating dense depth maps from large sets (1000's) of images taken from arbitrary positions. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplify the correspondence problem, removing nearly all ambiguity. The algorithm presented is completely local and for each pixel generates an evidence versus depth and surface normal distribution. In many cases, the distribution contains a clear and distinct global maximum. The location of this peak determines the depth and its shape can be used to estimate the error. The distribution can also be used to perform a maximum likelihood fit of models directly to the images. 
We anticipate that the ability to perform maximum likelihood estimation from purely local calculations will prove extremely useful in constructing three dimensional models from large sets of images.
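The baseline trade-off described in the AIM-1593 abstract can be illustrated with a small numeric sketch. It assumes the simplest textbook case, a rectified two-camera pinhole pair, which is not the arbitrary-position setting the memo addresses; the function names and the parameters `focal_px`, `baseline_m` and `disparity_error_px` are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pinhole stereo pair (assumed model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd.

    Obtained by differentiating Z = f * B / d with respect to d.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 500.0           # assumed focal length in pixels
    depth_m = 5.0              # assumed true depth in metres
    disparity_error_px = 0.5   # assumed matching error in pixels
    for baseline_m in (0.1, 1.0):
        err = depth_error(focal_px, baseline_m, depth_m, disparity_error_px)
        print(f"baseline {baseline_m} m -> depth error about {err:.3f} m")
```

Under these assumptions the depth error is inversely proportional to the baseline, which is the sense in which longer baselines give more precise depths.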
AIM-1592 CBCL-140 Author[s]: Theodoros Evgeniou Image Based Rendering Using Algebraic Techniques November 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1592.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1592.pdf This paper presents an image-based rendering system using algebraic relations between different views of an object. The system uses pictures of an object taken from known positions. Given three such images it can generate "virtual" ones as the object would look from any position near the ones that the two input images were taken from. The extrapolation from the example images can be up to about 60 degrees of rotation. The system is based on the trilinear constraints that bind any three views of an object. As a side result, we propose two new methods for camera calibration. We developed and used one of them. We implemented the system and tested it on real images of objects and faces. We also show experimentally that even when only two images taken from unknown positions are given, the system can be used to render the object from other viewpoints as long as we have a good estimate of the internal parameters of the camera used and we are able to find good correspondence between the example images. In addition, we present the relation between these algebraic constraints and a factorization method for shape and motion estimation. As a result we propose a method for motion estimation in the special case of orthographic projection.
AIM-1591 Author[s]: Paul Viola Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects November 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1591.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1591.pdf We have developed a new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features. This framework can be used to both derive an object recognition algorithm and an algorithm for learning the features themselves. The overall approach, called complex feature recognition or CFR, is unique for several reasons: it is broadly applicable to a wide range of object types, it makes constructing object models easy, it is capable of identifying either the class or the identity of an object, and it is computationally efficient, requiring time proportional to the size of the image. Instead of a single simple feature such as an edge, CFR uses a large set of complex features that are learned from experience with model objects. The response of a single complex feature contains much more class information than does a single edge. This significantly reduces the number of possible correspondences between the model and the image. In addition, CFR takes advantage of a type of image processing called 'oriented energy'. Oriented energy is used to efficiently pre-process the image to eliminate some of the difficulties associated with changes in lighting and pose.
AITR-1590 Author[s]: Andrew A. Berlin Towards Intelligent Structures: Active Control of Buckling May 1994 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1590.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1590.pdf The buckling of compressively-loaded members is one of the most important factors limiting the overall strength and stability of a structure. I have developed novel techniques for using active control to wiggle a structural element in such a way that buckling is prevented. I present the results of analysis, simulation, and experimentation to show that buckling can be prevented through computer-controlled adjustment of dynamical behavior. I have constructed a small-scale railroad-style truss bridge that contains compressive members that actively resist buckling through the use of piezo-electric actuators. I have also constructed a prototype actively controlled column in which the control forces are applied by tendons, as well as a composite steel column that incorporates piezo-ceramic actuators that are used to counteract buckling. Active control of buckling allows this composite column to support 5.6 times more load than would otherwise be possible. These techniques promise to lead to intelligent physical structures that are both stronger and lighter than would otherwise be possible.
AIM-1589 Author[s]: Andrew Justin Blumberg General Purpose Parallel Computation on a DNA Substrate December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1589.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1589.pdf In this paper I describe and extend a new DNA computing paradigm introduced in Blumberg for building massively parallel machines in the DNA-computing models described by Adleman, Cai et al., and Liu et al. Employing only DNA operations which have been reported as successfully performed, I present an implementation of a Connection Machine, a SIMD (single-instruction multiple-data) parallel computer, as an illustration of how to apply this approach to building computers in this domain (and as an implicit demonstration of PRAM equivalence). This is followed with a description of how to implement a MIMD (multiple-instruction multiple-data) parallel machine. The implementations described herein differ most from existing models in that they employ explicit communication between processing elements (and hence strands of DNA).
AIM-1588 Author[s]: Andrew Justin Blumberg Parallel Function Application on a DNA Substrate December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1588.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1588.pdf In this paper I present a new model that employs a biological (specifically DNA-based) substrate for performing computation. Specifically, I describe strategies for performing parallel function application in the DNA-computing models described by Adleman, Cai et al., and Liu et al. Employing only DNA operations which can presently be performed, I discuss some direct algorithms for computing a variety of useful mathematical functions on DNA, culminating in an algorithm for minimizing an arbitrary continuous function. In addition, computing genetic algorithms on a DNA substrate is briefly discussed.
AITR-1587 Author[s]: Partha Niyogi The Informational Complexity of Learning from Examples September 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1587.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1587.pdf This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory and certain unifying perspectives emerge.
AITR-1586 Author[s]: Andre DeHon Reconfigurable Architectures for General-Purpose Computing September 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1586.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1586.pdf General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine-grained computing device which adds small, on-chip instruction memories to FPGAs, is one such design point. For typical logic applications and finite-state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. 
To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.
AITR-1585 Author[s]: Miguel Hall Prototype of a Configurable Web-Based Assessment System June 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1585.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1585.pdf The MIT Prototype Educational Assessment System provides subjects and courses at MIT with the ability to perform online assessment. The system includes policies to handle harassment and electronic "flaming" while protecting privacy. Within these frameworks, individual courses and subjects can make their own policy decisions about such matters as when assessments can occur, who can submit assessments, and how anonymous assessments are. By allowing assessment to take place continually and allowing both students and staff to participate, the system can provide a forum for the online discussion of subjects. Even in the case of scheduled assessments, the system can provide advantages over end-of-term assessment, since the scheduled assessments can occur several times during the semester, allowing subjects to identify and adjust those areas that could use improvement. Subjects can also develop customized questionnaires, perhaps in response to previous assessments, to suit their needs.
AIM-1584 Author[s]: Ujjaval Y. Desai, Marcelo M. Mizuki, Ichiro Masaki and Berthold K.P. Horn Edge and Mean Based Image Compression November 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1584.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1584.pdf In this paper, we present a static image compression algorithm for very low bit rate applications. The algorithm reduces spatial redundancy present in images by extracting and encoding edge and mean information. Since the human visual system is highly sensitive to edges, an edge-based compression scheme can produce intelligible images at high compression ratios. We present good quality results for facial as well as textured, 256 x 256 color images at 0.1 to 0.3 bpp. The algorithm described in this paper was designed for high performance, keeping hardware implementation issues in mind. In the next phase of the project, which is currently underway, this algorithm will be implemented in hardware, and new edge-based color image sequence compression algorithms will be developed to achieve compression ratios of over 100, i.e., less than 0.12 bpp from 12 bpp. Potential applications include low power, portable video telephones.
AIM-1583 CBCL-139 Author[s]: Michael J. Jones and Tomaso Poggio Model-Based Matching by Linear Combinations of Prototypes December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1583.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1583.pdf We describe a method for modeling object classes (such as faces) using 2D example images and an algorithm for matching a model to a novel image. The object class models are "learned" from example images that we call prototypes. In addition to the images, the pixelwise correspondences between a reference prototype and each of the other prototypes must also be provided. Thus a model consists of a linear combination of prototypical shapes and textures. A stochastic gradient descent algorithm is used to match a model to a novel image by minimizing the error between the model and the novel image. Example models are shown as well as example matches to novel images. The robustness of the matching algorithm is also evaluated. The technique can be used for a number of applications including the computation of correspondence between novel images of a certain known class, object recognition, image synthesis and image compression.
AITR-1582 Author[s]: Ann L. Torres Virtual Model Control of a Hexapod Walking Robot December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1582.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1582.pdf Since robots are typically designed with an individual actuator at each joint, the control of these systems is often difficult and non-intuitive. This thesis explains a more intuitive control scheme called Virtual Model Control. This thesis also demonstrates the simplicity and ease of this control method by using it to control a simulated walking hexapod. Virtual Model Control uses imagined mechanical components to create virtual forces, which are applied through the joint torques of real actuators. This method produces a straightforward means of controlling joint torques to produce a desired robot behavior. Due to the intuitive nature of this control scheme, the design of a virtual model controller is similar to the design of a controller with basic mechanical components. The ease of this control scheme facilitates the use of a high level control system which can be used above the low level virtual model controllers to modulate the parameters of the imaginary mechanical components. In order to apply Virtual Model Control to parallel mechanisms, a solution to the force distribution problem is required. This thesis uses an extension of Gardner's Partitioned Force Control method which allows for the specification of constrained degrees of freedom. This virtual model control technique was applied to a simulated hexapod robot. Although the hexapod is a highly non-linear, parallel mechanism, the virtual models allowed textbook control solutions to be used while the robot was walking. Using a simple linear control law, the robot walked while simultaneously balancing a pendulum and tracking an object.
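The core idea in the AITR-1582 abstract, a virtual component producing a force that is realized through joint torques, can be sketched on a much simpler mechanism than a hexapod. The example below assumes a planar two-link serial arm and a single virtual spring-damper pulling the end point toward a target, with the force mapped to torques through the Jacobian transpose. The arm, the gains and all function names are illustrative assumptions, not the thesis's controller, and the force distribution problem for parallel mechanisms is not addressed.

```python
import math


def forward_kinematics(q1, q2, l1, l2):
    """End-point position of a planar two-link arm (assumed mechanism)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)


def jacobian(q1, q2, l1, l2):
    """2x2 Jacobian mapping joint velocities to end-point velocity."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def virtual_spring_damper_force(pos, vel, target, stiffness, damping):
    """Force of an imagined spring-damper between the end point and a target."""
    fx = stiffness * (target[0] - pos[0]) - damping * vel[0]
    fy = stiffness * (target[1] - pos[1]) - damping * vel[1]
    return (fx, fy)


def joint_torques(q1, q2, l1, l2, force):
    """Joint torques that realize the virtual force: tau = J^T * F."""
    jac = jacobian(q1, q2, l1, l2)
    fx, fy = force
    tau1 = jac[0][0] * fx + jac[1][0] * fy
    tau2 = jac[0][1] * fx + jac[1][1] * fy
    return (tau1, tau2)


if __name__ == "__main__":
    q1, q2, l1, l2 = 0.0, math.pi / 2, 1.0, 1.0
    pos = forward_kinematics(q1, q2, l1, l2)
    force = virtual_spring_damper_force(pos, (0.0, 0.0), (1.0, 1.5), 100.0, 5.0)
    print(joint_torques(q1, q2, l1, l2, force))
```

A higher-level controller of the kind the abstract mentions would, in this picture, adjust `stiffness`, `damping` or `target` rather than command torques directly.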
AITR-1581 Author[s]: Jerry E. Pratt Virtual Model Control of a Biped Walking Robot December 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1581.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1581.pdf The transformation from high level task specification to low level motion control is a fundamental issue in sensorimotor control in animals and robots. This thesis develops a control scheme called virtual model control which addresses this issue. Virtual model control is a motion control language which uses simulations of imagined mechanical components to create forces, which are applied through joint torques, thereby creating the illusion that the components are connected to the robot. Due to the intuitive nature of this technique, designing a virtual model controller requires the same skills as designing the mechanism itself. A high level control system can be cascaded with the low level virtual model controller to modulate the parameters of the virtual mechanisms. Discrete commands from the high level controller would then result in fluid motion. An extension of Gardner's Partitioned Actuator Set Control method is developed. This method allows for the specification of constraints on the generalized forces which each serial path of a parallel mechanism can apply. Virtual model control has been applied to a bipedal walking robot. A simple algorithm utilizing a simple set of virtual components has successfully compelled the robot to walk eight consecutive steps.
AIM-1580 CBCL-138 Author[s]: Bruno A. Olshausen Learning Linear, Sparse, Factorial Codes December 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1580.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1580.pdf In previous work (Olshausen & Field 1996), an algorithm was described for learning linear sparse codes which, when trained on natural images, produces a set of basis functions that are spatially localized, oriented, and bandpass (i.e., wavelet-like). This note shows how the algorithm may be interpreted within a maximum-likelihood framework. Several useful insights emerge from this connection: it makes explicit the relation to statistical independence (i.e., factorial coding), it shows a formal relationship to the algorithm of Bell and Sejnowski (1995), and it suggests how to adapt parameters that were previously fixed.
AITR-1579 Author[s]: Brian A. LaMacchia Internet Fish August 1, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1579.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1579.pdf I have invented "Internet Fish," a novel class of resource-discovery tools designed to help users extract useful information from the Internet. Internet Fish (IFish) are semi-autonomous, persistent information brokers; users deploy individual IFish to gather and refine information related to a particular topic. An IFish will initiate research, continue to discover new sources of information, and keep tabs on new developments in that topic. As part of the information-gathering process the user interacts with his IFish to find out what it has learned, answer questions it has posed, and make suggestions for guidance. Internet Fish differ from other Internet resource discovery systems in that they are persistent, personal and dynamic. As part of the information-gathering process IFish conduct extended, long-term conversations with users as they explore. They incorporate deep structural knowledge of the organization and services of the net, and are also capable of on-the-fly reconfiguration, modification and expansion. Human users may dynamically change an IFish in response to changes in the environment, or an IFish may initiate such changes itself. An IFish maintains internal state, including models of its own structure, behavior, information environment and its user; these models permit an IFish to perform meta-level reasoning about its own structure. To facilitate rapid assembly of particular IFish I have created the Internet Fish Construction Kit. This system provides enabling technology for the entire class of Internet Fish tools; it facilitates both creation of new IFish as well as additions of new capabilities to existing ones. 
The Construction Kit includes a collection of encapsulated heuristic knowledge modules that may be combined in mix-and-match fashion to create a particular IFish; interfaces to new services written with the Construction Kit may be immediately added to "live" IFish. Using the Construction Kit I have created a demonstration IFish specialized for finding World-Wide Web documents related to a given group of documents. This "Finder" IFish includes heuristics that describe how to interact with the Web in general, explain how to take advantage of various public indexes and classification schemes, and provide a method for discovering similarity relationships among documents.
AITR-1577 Author[s]: Ignacio Sean McQuirk An Analog VLSI Chip for Estimating the Focus of Expansion August 21, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1577.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1577.pdf For applications involving the control of moving vehicles, the recovery of relative motion between a camera and its environment is of high utility. This thesis describes the design and testing of a real-time analog VLSI chip which estimates the focus of expansion (FOE) from measured time-varying images. Our approach assumes a camera moving through a fixed world with translational velocity; the FOE is the projection of the translation vector onto the image plane. This location is the point towards which the camera is moving, and from which other points appear to expand outward. By way of the camera imaging parameters, the location of the FOE gives the direction of 3-D translation. The algorithm we use for estimating the FOE minimizes the sum of squares of the differences at every pixel between the observed time variation of brightness and the predicted variation given the assumed position of the FOE. This minimization is not straightforward, because the relationship between the brightness derivatives depends on the unknown distance to the surface being imaged. However, image points where brightness is instantaneously constant play a critical role. Ideally, the FOE would be at the intersection of the tangents to the iso-brightness contours at these "stationary" points. In practice, brightness derivatives are hard to estimate accurately given that the image is quite noisy. Reliable results can nevertheless be obtained if the image contains many stationary points and the point is found that minimizes the sum of squares of the perpendicular distances from the tangents at the stationary points.
The FOE chip calculates the gradient of this least-squares minimization sum, and the estimation is performed by closing a feedback loop around it. The chip has been implemented using an embedded CCD imager for image acquisition and a row-parallel processing scheme. A 64 x 64 version was fabricated in a 2 µm CCD/BiCMOS process through MOSIS with a design goal of 200 mW of on-chip power, a top frame rate of 1000 frames/second, and a basic accuracy of 5%. A complete experimental system which estimates the FOE in real time using real motion and image scenes is demonstrated.
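The least-squares criterion in the AITR-1577 abstract (the point minimizing the sum of squared perpendicular distances to a set of tangent lines) can be sketched in software. This is only an illustration under stated assumptions: the function name and the representation of each tangent as a point plus a direction are hypothetical, and the code solves the 2x2 normal equations in closed form rather than using the chip's gradient feedback loop.

```python
import math

def least_squares_intersection(points, directions):
    """Return the 2D point minimizing the sum of squared perpendicular
    distances to a set of lines. Line i passes through points[i] and runs
    along directions[i]. (Hypothetical helper, not from the thesis.)"""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(points, directions):
        norm = math.hypot(dx, dy)
        # Unit normal to the line; distance from x to the line is |n . (x - p)|.
        nx, ny = -dy / norm, dx / norm
        c = nx * px + ny * py
        # Accumulate the normal equations (sum n n^T) x = sum n (n . p).
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique minimizer")
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y

# Three noise-free lines that all pass through (3, 4).
estimate = least_squares_intersection(
    [(0.0, 4.0), (3.0, 0.0), (1.0, 2.0)],
    [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
)
```

With noisy tangents the lines no longer meet at one point, and the same solve returns the compromise point that the abstract describes.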
AIM-1576 Author[s]: Olin Shivers Supporting Dynamic Languages on the Java Virtual Machine April 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1576.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1576.pdf In this note, I propose two extensions to the Java virtual machine (or VM) to allow dynamic languages such as Dylan, Scheme and Smalltalk to be efficiently implemented on the VM. These extensions do not affect the performance of pure Java programs on the machine. The first extension allows for efficient encoding of dynamic data; the second allows for efficient encoding of language-specific computational elements.
AIM-1575 Author[s]: Kenneth Yip and Gerald Jay Sussman A Computational Model for the Acquisition and Use of Phonological Knowledge March 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1575.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1575.pdf Does knowledge of language consist of symbolic rules? How do children learn and use their linguistic knowledge? To elucidate these questions, we present a computational model that acquires phonological knowledge from a corpus of common English nouns and verbs. In our model the phonological knowledge is encapsulated as boolean constraints operating on classical linguistic representations of speech sounds in terms of distinctive features. The learning algorithm compiles a corpus of words into increasingly sophisticated constraints. The algorithm is incremental, greedy, and fast. It yields one-shot learning of phonological constraints from a few examples. Our system exhibits behavior similar to that of young children learning phonological knowledge. As a bonus the constraints can be interpreted as classical linguistic rules. The computational model can be implemented by a surprisingly simple hardware mechanism. Our mechanism also sheds light on a fundamental AI question: How are signals related to symbols?
AITR-1574 Author[s]: David Beymer Pose-Invariant Face Recognition Using Real and Virtual Views March 28, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1574.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1574.pdf The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.
AITR-1573 Author[s]: Thomas F. Stahovich SketchIT: A Sketch Interpretation Tool for Conceptual Mechanical Design March 13, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1573.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1573.pdf We describe a program called SketchIT capable of producing multiple families of designs from a single sketch. The program is given a rough sketch (drawn using line segments for part faces and icons for springs and kinematic joints) and a description of the desired behavior. The sketch is "rough" in the sense that taken literally, it may not work. From this single, perhaps flawed sketch and the behavior description, the program produces an entire family of working designs. The program also produces design variants, each of which is itself a family of designs. SketchIT represents each family of designs with a "behavior ensuring parametric model" (BEP-Model), a parametric model augmented with a set of constraints that ensure the geometry provides the desired behavior. The construction of the BEP-Model from the sketch and behavior description is the primary task and source of difficulty in this undertaking. SketchIT begins by abstracting the sketch to produce a qualitative configuration space (qc-space) which it then uses as its primary representation of behavior. SketchIT modifies this initial qc-space until qualitative simulation verifies that it produces the desired behavior. SketchIT's task is then to find geometries that implement this qc-space. It does this using a library of qc-space fragments. Each fragment is a piece of parametric geometry with a set of constraints that ensure the geometry implements a specific kind of boundary (qcs-curve) in qc-space. SketchIT assembles the fragments to produce the BEP-Model. SketchIT produces design variants by mapping the qc-space to multiple implementations, and by transforming rotating parts to translating parts and vice versa.
AITR-1572 Author[s]: Kah-Kay Sung Learning and Example Selection for Object and Pattern Detection March 13, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1572.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1572.pdf This thesis presents a learning-based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an active learning formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.
AIM-1571 Author[s]: Tommi S. Jaakkola and Michael I. Jordan Computing Upper and Lower Bounds on Likelihoods in Intractable Networks March 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1571.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1571.pdf We present techniques for computing upper and lower bounds on the likelihoods of partial instantiations of variables in sigmoid and noisy-OR networks. The bounds determine confidence intervals for the desired likelihoods and become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the obtained bounds by numerical experiments.
AIM-1570 Author[s]: Lawrence K. Saul, Tommi Jaakkola and Michael I. Jordan Mean Field Theory for Sigmoid Belief Networks August 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1570.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1570.pdf We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition -- the classification of handwritten digits.
AITR-1569 Author[s]: Deniz Yuret From Genetic Algorithms to Efficient Organization May 1994 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1569.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1569.pdf The work described in this thesis began as an inquiry into the nature and use of optimization programs based on "genetic algorithms." That inquiry led, eventually, to three powerful heuristics that are broadly applicable in gradient-ascent programs: First, remember the locations of local maxima and restart the optimization program at a place distant from previously located local maxima. Second, adjust the size of probing steps to suit the local nature of the terrain, shrinking when probes do poorly and growing when probes do well. And third, keep track of the directions of recent successes, so as to probe preferentially in the direction of most rapid ascent. These algorithms lie at the core of a novel optimization program that illustrates the power to be had from deploying them together. The efficacy of this program is demonstrated on several test problems selected from a variety of fields, including De Jong's famous test-problem suite, the traveling salesman problem, the problem of coordinate registration for image guided surgery, the energy minimization problem for determining the shape of organic molecules, and the problem of assessing the structure of sedimentary deposits using seismic data.
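The second heuristic in the AITR-1569 abstract (shrink the probing step when probes do poorly, grow it when they do well) can be illustrated with a small sketch. The function name, the axis-aligned probing scheme, and the grow/shrink factors below are assumptions made for this example; they are not taken from the thesis, and the restart and direction-memory heuristics are not shown.

```python
def adaptive_step_ascent(f, start, step=1.0, grow=2.0, shrink=0.5,
                         min_step=1e-8, max_iter=10000):
    """Maximize f by probing +/- step along each axis.
    The step grows after a successful round and shrinks after a failed one.
    (Hypothetical illustration of the step-size heuristic only.)"""
    x = list(start)
    best = f(x)
    for _ in range(max_iter):
        if step < min_step:
            break  # probes are too small to matter; stop
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                candidate = list(x)
                candidate[i] += sign * step
                value = f(candidate)
                if value > best:
                    # Accept the probe and move on to the next axis.
                    x, best, improved = candidate, value, True
                    break
        step = step * grow if improved else step * shrink
    return x, best

# A smooth test function with its maximum (value 0) at (1, -2).
def bowl(p):
    return -(p[0] - 1.0) ** 2 - (p[1] + 2.0) ** 2

peak, peak_value = adaptive_step_ascent(bowl, [0.0, 0.0])
```

On a single smooth peak like this one the step size alone is enough; on multimodal terrain the abstract's first heuristic (restarting far from known local maxima) would also be needed.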
AIM-1568 Author[s]: Philip N. Sabes and Michael I. Jordan Reinforcement Learning by Probability Matching January 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1568.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1568.pdf We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment's reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to train mixture of experts networks, an architecture for which other reinforcement learning rules fail to converge reliably on even simple problems. This architecture is particularly well suited for our algorithm as it can compute arbitrarily complex functions yet calculation of the output probability is simple.
AIM-1567 Author[s]: Marina Meila and Michael I. Jordan Learning Fine Motion by Markov Mixtures of Experts November 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1567.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1567.pdf Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the state of contact. The current state of contact is viewed as the hidden state variable of a discrete HMM. The control dependent transition probabilities between states are modeled as parametrized functions of the measurement. We show that their parameters can be estimated from measurements concurrently with the estimation of the parameters of the movement in each state of contact. The learning algorithm is a variant of the EM procedure. The E step is computed exactly; solving the M step exactly would require solving a set of coupled nonlinear algebraic equations in the parameters. Instead, gradient ascent is used to produce an increase in likelihood.
AITR-1566 Author[s]: Tina Kapur Segmentation of Brain Tissue from Magnetic Resonance Images January 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1566.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1566.pdf Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies in temporal change detection of morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the Computer Vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation in a way that the resultant method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 20 brain scans each with 256x256x124 voxels and validate those against segmentations generated by neuroanatomy experts.
AIM-1565 CBCL-132 Author[s]: Padhraic Smyth, David Heckerman and Michael Jordan Probabilistic Independence Networks for Hidden Markov Probability Models March 13, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1565.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1565.pdf Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
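For readers unfamiliar with the HMM algorithms named in the AIM-1565 abstract, the forward pass of the forward-backward procedure can be sketched as below. This is the standard textbook recursion for a discrete HMM, not the general PIN inference algorithm of the memo; the function name and the toy parameters are assumptions chosen for illustration.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of an observation sequence under a discrete HMM.
    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    (Standard forward recursion; names are illustrative.)"""
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
    return sum(alpha)

# Toy two-state, two-symbol model (made-up numbers).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

single = forward_likelihood(init, trans, emit, [0])
# Likelihoods of all length-2 sequences should sum to one.
total = sum(
    forward_likelihood(init, trans, emit, [a, b])
    for a in (0, 1) for b in (0, 1)
)
```

The recursion sums over predecessor states at each step; replacing that sum with a maximum gives the Viterbi recursion, which is one way to see why both fit a common message-passing framework.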
AIM-1564 Author[s]: Jonathan A. Rees A Security Kernel Based on the Lambda-Calculus March 13, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1564.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1564.pdf Cooperation between independent agents depends upon establishing a degree of security. Each of the cooperating agents needs assurance that the cooperation will not endanger resources of value to that agent. In a computer system, a computational mechanism can assure safe cooperation among the system's users by mediating resource access according to desired security policy. Such a mechanism, which is called a security kernel, lies at the heart of many operating systems and programming environments. The report describes Scheme 48, a programming environment whose design is guided by established principles of operating system security. Scheme 48's security kernel is small, consisting of the call-by-value $\lambda$-calculus with a few simple extensions to support abstract data types, object mutation, and access to hardware resources. Each agent (user or subsystem) has a separate evaluation environment that holds objects representing privileges granted to that agent. Because environments ultimately determine availability of object references, protection and sharing can be controlled largely by the way in which environments are constructed. I will describe experience with Scheme 48 that shows how it serves as a robust and flexible experimental platform. Two successful applications of Scheme 48 are the programming environment for the Cornell mobile robots, where Scheme 48 runs with no (other) operating system support; and a secure multi-user environment that runs on workstations.
AITR-1563 Author[s]: John Bryant Morrell Parallel Coupled Micro-Macro Actuators January 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1563.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1563.pdf This thesis presents a new actuator system consisting of a micro-actuator and a macro-actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56 Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040 mNm, Minimum Torque of 1.3 mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.
AIM-1562 CBCL-131 Author[s]: Michael I. Jordan and Christopher M. Bishop Neural Networks March 13, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1562.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1562.pdf We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
AIM-1561 CBCL-130 Author[s]: Zoubin Ghahramani and Michael I. Jordan Factorial Hidden Markov Models February 9, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1561.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1561.pdf We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation--Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.
AIM-1560 CBCL-129 Author[s]: Tommi S. Jaakkola, Lawrence K. Saul and Michael I. Jordan Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks February 9, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1560.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1560.pdf Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e., the EM algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains. The complementary networks can be used for continuous density estimation as well.
AIM-1559 CBCL-128 Author[s]: Michael J. Jones, Tomaso Poggio Model-Based Matching of Line Drawings by Linear Combinations of Prototypes January 18, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1559.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1559.pdf We describe a technique for finding pixelwise correspondences between two images by using models of objects of the same class to guide the search. The object models are 'learned' from example images (also called prototypes) of an object class. The models consist of a linear combination of prototypes. The flow fields giving pixelwise correspondences between a base prototype and each of the other prototypes must be given. A novel image of an object of the same class is matched to a model by minimizing an error between the novel image and the current guess for the closest model image. Currently, the algorithm applies to line drawings of objects. An extension to real grey level images is discussed.
AIM-1558 CBCL-129 Author[s]: Carl de Marcken The Unsupervised Acquisition of a Lexicon from Continuous Speech January 18, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1558.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1558.pdf We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
AIM-1556 CBCL-127 Author[s]: David C. Somers, Emanuel V. Todorov, Athanassios G. Siapas and Mriganka Sur Vector-Based Integration of Local and Long-Range Information in Visual Cortex January 18, 1996 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1556.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1556.pdf Integration of inputs by cortical neurons provides the basis for the complex information processing performed in the cerebral cortex. Here, we propose a new analytic framework for understanding integration within cortical neuronal receptive fields. Based on the synaptic organization of cortex, we argue that neuronal integration is a systems-level process better studied in terms of local cortical circuitry than at the level of single neurons, and we present a method for constructing self-contained modules which capture (nonlinear) local circuit interactions. In this framework, receptive field elements naturally have dual (rather than the traditional unitary) influence since they drive both excitatory and inhibitory cortical neurons. This vector-based analysis, in contrast to scalar approaches, greatly simplifies integration by permitting linear summation of inputs from both "classical" and "extraclassical" receptive field regions. We illustrate this by explaining two complex visual cortical phenomena, which are incompatible with scalar notions of neuronal integration.
AIM-1555 Author[s]: Thomas Marill The Three-Dimensional Interpretation of a Class of Simple Line-Drawings October 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1555.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1555.pdf We provide a theory of the three-dimensional interpretation of a class of line-drawings called p-images, which are interpreted by the human vision system as parallelepipeds ("boxes"). Despite their simplicity, p-images raise a number of interesting vision questions: * Why are p-images seen as three-dimensional objects? Why not just as flat images? * What are the dimensions and pose of the perceived objects? * Why are some p-images interpreted as rectangular boxes, while others are seen as skewed, even though there is no obvious distinction between the images? * When p-images are rotated in three dimensions, why are the image-sequences perceived as distorting objects, even though structure-from-motion would predict that rigid objects would be seen? * Why are some three-dimensional parallelepipeds seen as radically different when viewed from different viewpoints? We show that these and related questions can be answered with the help of a single mathematical result and an associated perceptual principle. An interesting special case arises when there are right angles in the p-image. This case represents a singularity in the equations and is mystifying from the vision point of view. It would seem that (at least in this case) the vision system does not follow the ordinary rules of geometry but operates in accordance with other (and as yet unknown) principles.
AIM-1554 Author[s]: D.A. Leopold, J.C. Fitzgibbons and N.K. Logothetis The Role of Attention in Binocular Rivalry as Revealed Through Optokinetic Nystagmus November 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1554.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1554.pdf When stimuli presented to the two eyes differ considerably, stable binocular fusion fails, and the subjective percept alternates between the two monocular images, a phenomenon known as binocular rivalry. The influence of attention over this perceptual switching has long been studied, and although there is evidence that attention can affect the alternation rate, its role in the overall dynamics of the rivalry process remains unclear. The present study investigated the relationship between the attention paid to the rivalry stimulus, and the dynamics of the perceptual alternations. Specifically, the temporal course of binocular rivalry was studied as the subjects performed difficult nonvisual and visual concurrent tasks, directing their attention away from the rivalry stimulus. Periods of complete perceptual dominance were compared for the attended condition, where the subjects reported perceptual changes, and the unattended condition, where one of the simultaneous tasks was performed. During both the attended and unattended conditions, phases of rivalry dominance were obtained by analyzing the subject's optokinetic nystagmus recorded by an electrooculogram, where the polarity of the nystagmus served as an objective indicator of the perceived direction of motion. In all cases, the presence of a difficult concurrent task had little or no effect on the statistics of the alternations, as judged by two classic tests of rivalry, although the overall alternation rate showed a small but significant increase with the concurrent task.
It is concluded that the statistical patterns of rivalry alternations are not governed by attentional shifts or decision-making on the part of the subject.
AIM-1553 Author[s]: N.K. Logothetis and D.A. Leopold On the Physiology of Bistable Percepts November 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1553.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1553.pdf Binocular rivalry refers to the alternating perceptions experienced when two dissimilar patterns are stereoscopically viewed. To study the neural mechanism that underlies such competitive interactions, single cells were recorded in the visual areas V1, V2, and V4, while monkeys reported the perceived orientation of rivaling sinusoidal grating patterns. A number of neurons in all areas showed alternating periods of excitation and inhibition that correlated with the perceptual dominance and suppression of the cell's preferred orientation. The remaining population of cells were not influenced by whether or not the optimal stimulus orientation was perceptually suppressed. Response modulation during rivalry was not correlated with cell attributes such as monocularity, binocularity, or disparity tuning. These results suggest that the awareness of a visual pattern during binocular rivalry arises through interactions between neurons at different levels of visual pathways, and that the site of suppression is unlikely to correspond to a particular visual area, as often hypothesized on the basis of psychophysical observations. The cell-types of modulating neurons and their overwhelming preponderance in higher rather than in early visual areas also suggest -- together with earlier psychophysical evidence -- the possibility of a common mechanism underlying rivalry as well as other bistable percepts, such as those experienced with ambiguous figures.
AIM-1552 Author[s]: David A. Cohn Minimizing Statistical Bias with Queries September 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1552.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1552.pdf I describe an exploration criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple kinematics problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.
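The learner used in the experiments above, locally-weighted regression, can be sketched in a few lines. This is only an illustration of the learner, not of the bias-minimizing exploration criterion itself; the Gaussian kernel, the `bandwidth` value, and the function name `lwr_predict` are assumptions made for the example.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Locally weighted linear regression prediction at `query` (1-D inputs)."""
    # Gaussian kernel weights centred on the query point (assumed kernel).
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted least-squares slope of the local linear fit.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)
```

On exactly linear data the local fit reproduces the line, so any error seen on curved data comes from the bias and variance of the local model.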
AIM-1551 Author[s]: Jacob Katzenelson and Aharon Unikovski A Network Charge-Oriented MOS Transistor Model August 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1551.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1551.pdf The MOS transistor physical model as described in [3] is presented here as a network model. The goal is to obtain an accurate model, suitable for simulation, free from certain problems reported in the literature [13], and conceptually as simple as possible. To achieve this goal the original model had to be extended and modified. The paper presents the derivation of the network model from physical equations, including the corrections which are required for simulation and which compensate for simplifications introduced in the original physical model. Our intrinsic MOS model consists of three nonlinear voltage-controlled capacitors and a dependent current source. The charges of the capacitors and the current of the current source are functions of the voltages $V_{gs}$, $V_{bs}$, and $V_{ds}$. The complete model consists of the intrinsic model plus the parasitics. The apparent simplicity of the model is a result of hiding information in the characteristics of the nonlinear components. The resulting network model has been checked by simulation and analysis. It is shown that the network model is suitable for simulation: it is defined for any value of the voltages; the functions involved are continuous and satisfy Lipschitz conditions with no jumps at region boundaries; derivatives have been computed symbolically and are available for use by the Newton-Raphson method. The model's functions can be measured from the terminals. It is also shown that small channel effects can be included in the model. Higher frequency effects can be modeled by using a network consisting of several sections of the basic lumped model.
Future plans include a detailed comparison of the network model with models such as SPICE level 3 and a comparison of the multi-section higher frequency model with experiments.
AIM-1550 Author[s]: T.D. Alter and Ronen Basri Extracting Salient Curves from Images: An Analysis of the Saliency Network August 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1550.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1550.pdf The Saliency Network proposed by Shashua and Ullman is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image. Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. We show that with proper sampling the complexity of the network becomes cubic in the size of the network. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.
AIM-1549 Author[s]: Roberto Brunelli and Tomaso Poggio Template Matching: Matched Spatial Filters and Beyond October 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1549.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1549.pdf Template matching by means of cross-correlation is common practice in pattern recognition. However, its sensitivity to deformations of the pattern and the broad and unsharp peaks it produces are significant drawbacks. This paper reviews some results on how these shortcomings can be removed. Several techniques (Matched Spatial Filters, Synthetic Discriminant Functions, Principal Components Projections and Reconstruction Residuals) are reviewed and compared on a common task: locating eyes in a database of faces. New variants are also proposed and compared: least squares Discriminant Functions and the combined use of projections on eigenfunctions and the corresponding reconstruction residuals. Finally, approximation networks are introduced in an attempt to improve filter design by the introduction of nonlinearity.
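As a point of reference for the baseline discussed above, template matching by normalized cross-correlation can be sketched as follows. The example is one-dimensional for brevity (an image version slides a 2D window instead), and the function names `ncc` and `best_match` are hypothetical; it does not implement any of the improved filters the memo reviews.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(template)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    # A constant patch or template has no defined correlation; score it 0.
    return num / den if den > 0 else 0.0

def best_match(signal, template):
    """Return (offset, score) of the window of `signal` that best matches `template`."""
    n = len(template)
    scores = [ncc(signal[i:i + n], template) for i in range(len(signal) - n + 1)]
    offset = max(range(len(scores)), key=scores.__getitem__)
    return offset, scores[offset]
```

Because each window is mean-centred and normalized, the score is insensitive to brightness offset and contrast scaling, but not to deformations of the pattern, which is the drawback the memo addresses.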
AITR-1548 Author[s]: Paul A. Viola Alignment by Maximization of Mutual Information March 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1548.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1548.pdf A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation. Finally, we will describe a number of additional real-world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).
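The idea of aligning by maximizing mutual information can be illustrated with a small sketch. It estimates mutual information from a plain joint histogram of discrete intensities and searches exhaustively over circular shifts of a 1D signal; it is not the EMMA estimator or the stochastic-approximation search described in the report, and the names `mutual_information` and `best_shift` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram of co-occurring values
    pa = Counter(a)              # marginal histogram of a
    pb = Counter(b)              # marginal histogram of b
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

def best_shift(model, image, max_shift):
    """Return the left circular shift of `image` that maximizes MI with `model`."""
    def shifted(s):
        return image[s:] + image[:s]
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: mutual_information(model, shifted(s)))
```

In the usage below the image is a contrast-reversed, shifted copy of the model, so plain correlation would be misleading, while mutual information still peaks at the correct shift:

```python
model = [0, 0, 1, 2, 3, 0, 0, 0]
image = [9, 9, 9, 9, 7, 5, 3, 9]   # intensities remapped and rotated right by 2
shift = best_shift(model, image, 3)
```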
AIM-1547 Author[s]: Michael R. Blair, Natalya Cohen, David M. LaMacchia and Brian K. Zuzga MIT SchMUSE: Class-Based Remote Delegation in a Capricious Distributed Environment February 1993 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1547.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1547.pdf MIT SchMUSE (pronounced "shmooz") is a concurrent, distributed, delegation-based object-oriented interactive environment with persistent storage. It is designed to run in a "capricious" network environment, where servers can migrate from site to site and can regularly become unavailable. Our design introduces a new form of unique identifiers called "globally unique tickets" that provide globally unique time/space stamps for objects and classes without being location specific. Object location is achieved by a distributed hierarchical lazy lookup mechanism that we call "realm resolution." We also introduce a novel mechanism called "message deferral" for enhanced reliability in the face of remote delegation. We conclude with a comparison to related work and a projection of future work on MIT SchMUSE.
AITR-1546 Author[s]: Yoky Matsuoka Embodiment and Manipulation Learning Process for a Humanoid Hand May 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1546.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1546.pdf Babies are born with simple manipulation capabilities such as reflexes to perceived stimuli. Initial discoveries by babies are accidental until they become coordinated and curious enough to actively investigate their surroundings. This thesis explores the development of such primitive learning systems using an embodied light-weight hand with three fingers and a thumb. It is self-contained, having four motors and 36 exteroceptor and proprioceptor sensors controlled by an on-palm microcontroller. Primitive manipulation is learned from sensory inputs using competitive learning, the back-propagation algorithm, and reinforcement learning strategies. This hand will be used for a humanoid being developed at the MIT Artificial Intelligence Laboratory.
AITR-1545 Author[s]: James A. Stuart Fiske Thread Scheduling Mechanisms for Multiple-Context Parallel Processors June 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1545.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1545.pdf Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. 
The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
AITR-1544 Author[s]: J.P. Mellor Enhanced Reality Visualization in a Surgical Environment January 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1544.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1544.pdf Enhanced reality visualization is the process of enhancing an image by adding to it information which is not present in the original image. A wide variety of information can be added to an image ranging from hidden lines or surfaces to textual or iconic data about a particular part of the image. Enhanced reality visualization is particularly well suited to neurosurgery. By rendering brain structures which are not visible, at the correct location in an image of a patient's head, the surgeon is essentially provided with X-ray vision. He can visualize the spatial relationship between brain structures before he performs a craniotomy and during the surgery he can see what's under the next layer before he cuts through. Given a video image of the patient and a three dimensional model of the patient's brain the problem enhanced reality visualization faces is to render the model from the correct viewpoint and overlay it on the original image. The relationship between the coordinate frames of the patient, the patient's internal anatomy scans and the image plane of the camera observing the patient must be established. This problem is closely related to the camera calibration problem. This report presents a new approach to finding this relationship and develops a system for performing enhanced reality visualization in a surgical environment. Immediately prior to surgery a few circular fiducials are placed near the surgical site. An initial registration of video and internal data is performed using a laser scanner. 
Following this, our method is fully automatic, runs in nearly real-time, is accurate to within a pixel, allows both patient and camera motion, automatically corrects for changes to the internal camera parameters (focal length, focus, aperture, etc.) and requires only a single image.
AITR-1543 Author[s]: Brian Scott Eberman Contact Sensing: A Sequential Decision Approach to Sensing Manipulation Contact May 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1543.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1543.pdf This paper describes a new statistical, model-based approach to building a contact state observer. The observer uses measurements of the contact force and position, and prior information about the task encoded in a graph, to determine the current location of the robot in the task configuration space. Each node represents what the measurements will look like in a small region of configuration space by storing a predictive, statistical, measurement model. This approach assumes that the measurements are statistically block independent conditioned on knowledge of the model, which is a fairly good model of the actual process. Arcs in the graph represent possible transitions between models. Beam Viterbi search is used to match measurement history against possible paths through the model graph in order to estimate the most likely path for the robot. The resulting approach provides a new decision process that can be used as an observer for event driven manipulation programming. The decision procedure is significantly more robust than simple threshold decisions because the measurement history is used to make decisions. The approach can be used to enhance the capabilities of autonomous assembly machines and in quality control applications.
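The path-estimation step described above can be sketched as a generic beam Viterbi search over a small graph of measurement models. This is a minimal illustration under assumed inputs, not the report's observer: the function name `beam_viterbi`, the dictionary layout of the log-probability tables, and the two-state "free"/"contact" example are all hypothetical.

```python
import math

def beam_viterbi(obs, states, log_start, log_trans, log_emit, beam=None):
    """Most likely state path for `obs`; keeps only the `beam` best states per step if set."""
    # scores maps each surviving state to (log-probability, best path ending there).
    scores = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        if beam is not None:
            # Prune to the `beam` highest-scoring states before extending paths.
            kept = sorted(scores, key=lambda s: scores[s][0], reverse=True)[:beam]
            scores = {s: scores[s] for s in kept}
        new_scores = {}
        for s in states:
            # Best surviving predecessor that has an arc into s.
            cands = [(scores[p][0] + log_trans[p][s], p)
                     for p in scores if s in log_trans[p]]
            if not cands:
                continue
            best, prev = max(cands)
            new_scores[s] = (best + log_emit[s][o], scores[prev][1] + [s])
        scores = new_scores
    return max(scores.values())[1]
```

A usage example with made-up probabilities, where a run of low force readings followed by high readings is explained as a transition from free motion to contact:

```python
lg = math.log
states = ["free", "contact"]
log_start = {"free": lg(0.9), "contact": lg(0.1)}
log_trans = {"free": {"free": lg(0.8), "contact": lg(0.2)},
             "contact": {"free": lg(0.1), "contact": lg(0.9)}}
log_emit = {"free": {"low": lg(0.9), "high": lg(0.1)},
            "contact": {"low": lg(0.2), "high": lg(0.8)}}
path = beam_viterbi(["low", "low", "high", "high"], states,
                    log_start, log_trans, log_emit, beam=1)
```

Because the decision uses the whole measurement history, a single noisy reading does not by itself flip the estimated state, which is the robustness advantage over simple thresholds noted in the abstract.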
AIM-1542 Author[s]: D. McAllester, P. Van Hentenryck and T. Kapur Three Cuts for Accelerated Interval Propagation May 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1542.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1542.pdf This paper addresses the problem of nonlinear multivariate root finding. In an earlier paper we described a system called Newton which finds roots of systems of nonlinear equations using refinements of interval methods. The refinements are inspired by AI constraint propagation techniques. Newton is competitive with continuation methods on most benchmarks and can handle a variety of cases that are infeasible for continuation methods. This paper presents three "cuts" which we believe capture the essential theoretical ideas behind the success of Newton. This paper describes the cuts in a concise and abstract manner which, we believe, makes the theoretical content of our work more apparent. Any implementation will need to adopt some heuristic control mechanism. Heuristic control of the cuts is only briefly discussed here.
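For readers unfamiliar with interval methods, the basic univariate interval Newton step that such systems build on can be sketched as below. This is not one of the three cuts from the memo. The choice of f(x) = x*x - 2, the function name, and the omission of outward rounding (which a rigorous implementation needs) are assumptions made to keep the example short.

```python
def interval_newton_sqrt2(lo, hi, tol=1e-12, max_iter=50):
    """Narrow [lo, hi] (with 0 < lo <= hi) around the root of f(x) = x*x - 2."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        f_mid = mid * mid - 2.0
        # The derivative f'(x) = 2x ranges over [2*lo, 2*hi] on the interval.
        d_lo, d_hi = 2.0 * lo, 2.0 * hi
        # Newton operator N = mid - f(mid) / [d_lo, d_hi]; the quotient of a
        # number by a positive interval is bounded by its two endpoint quotients.
        q1, q2 = f_mid / d_lo, f_mid / d_hi
        n_lo, n_hi = mid - max(q1, q2), mid - min(q1, q2)
        # Any root in [lo, hi] also lies in N, so intersect the two intervals.
        lo, hi = max(lo, n_lo), min(hi, n_hi)
        if lo > hi:
            raise ValueError("no root in the given interval")
    return lo, hi
```

Starting from [1, 2], each step shrinks the interval roughly quadratically while keeping the root bracketed (up to floating-point rounding, which this sketch ignores).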
AITR-1541 Author[s]: Elmer S. Hung Parameter Estimation in Chaotic Systems April 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1541.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1541.pdf This report examines how to estimate the parameters of a chaotic system given noisy observations of the state behavior of the system. Investigating parameter estimation for chaotic systems is interesting because of possible applications for high-precision measurement and for use in other signal processing, communication, and control applications involving chaotic systems. In this report, we examine theoretical issues regarding parameter estimation in chaotic systems and develop an efficient algorithm to perform parameter estimation. We discover two properties that are helpful for performing parameter estimation on non-structurally stable systems. First, it turns out that most data in a time series of state observations contribute very little information about the underlying parameters of a system, while a few sections of data may be extraordinarily sensitive to parameter changes. Second, for one-parameter families of systems, we demonstrate that there is often a preferred direction in parameter space governing how easily trajectories of one system can "shadow" trajectories of nearby systems. This asymmetry of shadowing behavior in parameter space is proved for certain families of maps of the interval. Numerical evidence indicates that similar results may be true for a wide variety of other systems. Using the two properties cited above, we devise an algorithm for performing parameter estimation. Standard parameter estimation techniques such as the extended Kalman filter perform poorly on chaotic systems because of divergence problems. The proposed algorithm achieves accuracies several orders of magnitude better than the Kalman filter and has good convergence properties for large data sets.
AIM-1537 Author[s]: David Beymer Vectorizing Face Images by Interpreting Shape and Texture Computations September 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1537.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1537.pdf The correspondence problem in computer vision is basically a matching task between two or more sets of features. In this paper, we introduce a vectorized image representation, which is a feature-based representation where correspondence has been established with respect to a reference image. This representation has two components: (1) shape, or (x, y) feature locations, and (2) texture, defined as the image grey levels mapped onto the standard reference image. This paper explores an automatic technique for "vectorizing" face images. Our face vectorizer alternates back and forth between computation steps for shape and texture, and a key idea is to structure the two computations so that each one uses the output of the other. A hierarchical coarse-to-fine implementation is discussed, and applications are presented to the problems of facial feature detection and registration of two arbitrary faces.
AIM-1536 Author[s]: David Beymer and Tomaso Poggio Face Recognition from One Example View September 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1536.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1536.pdf If we are provided a face database with only one example view per person, is it possible to recognize new views of them under a variety of different poses, especially views rotated in depth from the original example view? We investigate using prior knowledge about faces plus each single example view to generate virtual views of each person, or views of the face as seen from different poses. Prior knowledge of faces is represented in an example-based way, using 2D views of a prototype face seen rotating in depth. The synthesized virtual views are evaluated as example views in a view-based approach to pose-invariant face recognition. They are shown to improve the recognition rate over the scenario where only the single real view is used.
AIM-1535 Author[s]: Panayotis Skordos and Gerald Jay Sussman Comparison Between Subsonic Flow Simulation and Physical Measurements of Flue Pipes April 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1535.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1535.pdf Direct simulations of wind musical instruments using the compressible Navier Stokes equations have recently become possible through the use of parallel computing and through developments in numerical methods. As a first demonstration, the flow of air and the generation of musical tones inside a soprano recorder are simulated numerically. In addition, physical measurements are made of the acoustic signal generated by the recorder at different blowing speeds. The comparison between simulated and physically measured behavior is encouraging and points towards ways of improving the simulations.
AIM-1534 Author[s]: Panayotis A. Skordos Aeroacoustics on Non-Dedicated Workstations April 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1534.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1534.pdf The simulation of subsonic aeroacoustic problems such as the flow-generated sound of wind instruments is well suited for parallel computing on a cluster of non-dedicated workstations. Simulations are demonstrated which employ 20 non-dedicated Hewlett-Packard workstations (HP9000/715), and achieve performance on this problem comparable to that of a 64-node CM-5 dedicated supercomputer with vector units. The success of the present approach depends on the low communication requirements of the problem (low communication to computation ratio) which arise from the coarse-grain decomposition of the problem and the use of local-interaction methods. Many important problems may be suitable for this type of parallel computing including computer vision, circuit simulation, and other subsonic flow problems.
AIM-1533 Author[s]: N.K. Logothetis, J. Pauls and T. Poggio Spatial Reference Frames for Object Recognition: Tuning for Rotations in Depth March 1995 ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1533.ps ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1533.pdf The inferior temporal cortex (IT) of monkeys is thought to play an essential role in visual object recognition. Inferotemporal neurons are known to respond to complex visual stimuli, including patterns like faces, hands, or other body parts. What is the role of such neurons in object recognition? The present study examines this question in combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize novel visual 3D objects. A population of neurons in IT were found to respond selectively to such objects that the monkeys had recently learned to recognize. A large majority of these cells discharged maximally for one view of the object, while their response fell off gradually as the object was rotated away from the neuron's preferred view. Most neurons exhibited orientation-dependent responses also during view-plane rotations. Some neurons were found tuned around two views of the same object, while a very small number of cells responded in a view-invariant manner. For five different objects that were extensively used during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. No selective responses were ever encountered for views that the animal systematically failed to recognize. The results of our experiments suggest that neurons in this area can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the id |