CSAIL Digital Archive - Artificial Intelligence Laboratory Series
AIM1600 CBCL143 Author[s]: Thomas Vetter, Michael J. Jones and Tomaso Poggio A Bootstrapping Algorithm for Learning Linear Models of Object Classes
ftp://publications.ai.mit.edu/aipublications/15001999/AIM1600.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1600.pdf Flexible models of object classes, based on linear combinations of prototypical images, are capable of matching novel images of the same class and have been shown to be a powerful tool for solving several fundamental vision tasks such as recognition, synthesis and correspondence. The key problem in creating a specific flexible model is the computation of pixelwise correspondence between the prototypes, a task done until now in a semi-automatic way. In this paper we describe an algorithm that automatically bootstraps the correspondence between the prototypes. The algorithm, which can be used for 2D images as well as for 3D models, is shown to successfully synthesize a flexible model of frontal face images and a flexible model of handwritten digits.
AIM1602 CBCL144 Author[s]: Edgar Osuna, Robert Freund and Federico Girosi Support Vector Machines: Training and Applications March 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1602.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1602.pdf The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand, the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory and cannot be solved using standard nonlinear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of subproblems and the evaluation of optimality conditions, which are used both to generate improved iterates and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm, using a second-order variant of the Reduced Gradient Method as the solver of the subproblems.
As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
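The scale of the training QP is easiest to appreciate on a toy example. Below is a minimal sketch that trains a linear soft-margin SVM by subgradient descent on the hinge loss; it is a generic stand-in for SVM training, not the memo's decomposition solver, and all names and parameter values are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize lam/2*||w||^2 + mean hinge loss by subgradient descent.
    A toy stand-in for SVM training, not the decomposition QP of the memo."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                     # margin violators
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

On a linearly separable toy set, the learned hyperplane separates the two classes; the dense n-by-n QP the memo decomposes never has to be formed here.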
AIM1603 CBCL145 Author[s]: Shai Avidan, Theodoros Evgeniou, Amnon Shashua and Tomaso Poggio Image-Based View Synthesis January 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1603.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1603.pdf We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is the ability to synthesize images whose viewing position is significantly far away from the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multidimensional interpolation function that synthesizes the nonrigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating the ability to represent an object by a relatively small number of model images, for the purpose of cheap and fast viewers that can run on standard hardware.
AIM1604 Author[s]: William J. Dally, Leonard McMillan, Gary Bishop and Henry Fuchs The Delta Tree: An Object-Centered Approach to Image-Based Rendering May 2, 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1604.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1604.pdf This paper introduces the delta tree, a data structure that represents an object using a set of reference images. It also describes an algorithm for generating arbitrary reprojections of an object by traversing its delta tree. Delta trees are an efficient representation in terms of both storage and rendering performance. Each node of a delta tree stores an image taken from a point on a sampling sphere that encloses the object. Each image is compressed by discarding pixels that can be reconstructed by warping its ancestors' images to the node's viewpoint. The partial image stored at each node is divided into blocks and represented in the frequency domain. The rendering process generates an image at an arbitrary viewpoint by traversing the delta tree from a root node to one or more of its leaves. A subdivision algorithm selects only the required blocks from the nodes along the path. For each block, only the frequency components necessary to reconstruct the final image at an appropriate sampling density are used. This frequency selection mechanism handles both antialiasing and level-of-detail within a single framework. A complex scene is initially rendered by compositing images generated by traversing the delta trees of its components. Once the reference views of a scene have been rendered in this manner, the entire scene can be reprojected to an arbitrary viewpoint by traversing its own delta tree. Our approach is limited to generating views of an object from outside the object's convex hull. In practice we work around this problem by subdividing objects to render views from within the convex hull.
AIM1605 CBCL146 Author[s]: Marina Meila and Michael I. Jordan Triangulation by Continuous Embedding March 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1605.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1605.pdf When triangulating a belief network we aim to obtain a junction tree of minimum state space. Searching for the optimal triangulation can be cast as a search over all the permutations of the network's variables. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continuous nonlinear optimization task we hope to obtain a good triangulation with respect to the aforementioned cost. In this paper we introduce an upper bound to the total junction tree weight as the cost function. The appropriateness of this choice is discussed and explored by simulations. Then we present two ways of embedding the new objective function into continuous domains and show that they perform well compared to the best known heuristic.
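The baseline the memo compares against is a greedy elimination-ordering heuristic. Below is a minimal sketch of one such heuristic (min-weight elimination with fill-in), assuming a graph given as adjacency sets and a table of variable cardinalities; the function and example graph are illustrative, not code from the memo.

```python
import itertools

def greedy_triangulation_weight(adj, card):
    """Greedy min-weight elimination: repeatedly eliminate the variable
    whose induced clique has the smallest state space, adding fill-in
    edges among its neighbours. Returns the elimination order and the
    total clique weight (a junction-tree-weight-style cost).
    adj: var -> set of neighbours; card: var -> number of states."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order, total = [], 0

    def clique_weight(v):
        w = card[v]
        for u in adj[v]:
            w *= card[u]
        return w

    while adj:
        v = min(adj, key=clique_weight)
        total += clique_weight(v)
        nbrs = adj.pop(v)
        for a, b in itertools.combinations(nbrs, 2):   # fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
        order.append(v)
    return order, total
```

Searching over orderings like this one is exactly the discrete permutation search that the memo relaxes into a continuous optimization.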
AIM1606 CBCL147 Author[s]: Federico Girosi An Equivalence Between Sparse Approximation and Support Vector Machines May 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1606.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1606.pdf In the first part of this paper we show a similarity between the principle of Structural Risk Minimization (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in (Chen, Donoho and Saunders, 1995) and (Olshausen and Field, 1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.
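The Basis Pursuit De-Noising side of the equivalence minimizes a quadratic data term plus an L1 penalty. A minimal sketch using iterative soft-thresholding (ISTA), one standard way to minimize that objective; the memo's result concerns the equivalence of the solutions, not this particular solver.

```python
import numpy as np

def ista(D, y, lam, steps=500):
    """Iterative soft-thresholding for
        min_x 0.5*||y - D x||^2 + lam*||x||_1,
    the Basis Pursuit De-Noising objective."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ x - y)            # gradient of the quadratic term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x
```

With an orthonormal dictionary the minimizer is the soft-thresholded projection of the data, which makes the L1 penalty's shrink-and-kill behaviour easy to check by hand.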
AIM1608 CBCL148 Author[s]: Gad Geiger and Jerome Y. Lettvin A View on Dyslexia June 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1608.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1608.pdf We describe here, briefly, a perceptual, non-reading measure which reliably distinguishes between dyslexic persons and ordinary readers. More importantly, we describe a regimen of practice with which dyslexics learn a new perceptual strategy for reading. Two controlled experiments on dyslexic children demonstrate the regimen's efficacy.
AIM1610 CBCL150 Author[s]: Marcus Dill and Shimon Edelman Translation Invariance in Object Recognition, and Its Relation to Other Visual Transformations June 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1610.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1610.pdf Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we investigated whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of the direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards "same" responses if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards "different" responses. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5), small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.
AIM1611 CBCL151 Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris Estimating Dependency Structure as a Hidden Variable June 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1611.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1611.pdf This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.
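The per-component M-step behind such tree models is the classical Chow-Liu construction: pairwise mutual information followed by a maximum-weight spanning tree. A minimal sketch for binary data, using Kruskal's algorithm for the tree; this is an illustration of the building block, not the memo's implementation.

```python
import numpy as np
from itertools import combinations

def chow_liu_edges(X):
    """Chow-Liu tree for binary data X (rows = samples, columns = variables):
    empirical mutual information between every pair of columns, then a
    maximum-weight spanning tree via Kruskal with union-find."""
    n, d = X.shape

    def mi(i, j):
        m = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p = np.mean((X[:, i] == a) & (X[:, j] == b))
                pi, pj = np.mean(X[:, i] == a), np.mean(X[:, j] == b)
                if p > 0:
                    m += p * np.log(p / (pi * pj))
        return m

    edges = sorted(combinations(range(d), 2), key=lambda e: -mi(*e))
    parent = list(range(d))

    def find(v):                           # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree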
AIM1612 CBCL152 Author[s]: Massimiliano Pontil and Alessandro Verri Properties of Support Vector Machines August 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1612.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1612.pdf Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.
AIM1613 CBCL153 Author[s]: Zhaoping Li Visual Segmentation without Classification in a Model of the Primary Visual Cortex August 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1613.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1613.pdf Stimuli outside classical receptive fields significantly influence the neurons' activities in primary visual cortex. We propose that such contextual influences are used to segment regions by detecting the breakdown of homogeneity or translation invariance in the input, thus computing global region boundaries using local interactions. This is implemented in a biologically based model of V1, and demonstrated in examples of texture segmentation and figure-ground segregation. By contrast with traditional approaches, segmentation occurs without classification or comparison of features within or between regions and is performed by exactly the same neural circuit responsible for the dual problem of the grouping and enhancement of contours.
AIM1614 Author[s]: Daniel Coore, Radhika Nagpal and Ron Weiss Paradigms for Structure in an Amorphous Computer October 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1614.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1614.pdf Recent developments in microfabrication and nanotechnology will enable the inexpensive manufacturing of massive numbers of tiny computing elements with sensors and actuators. New programming paradigms are required for obtaining organized and coherent behavior from the cooperation of large numbers of unreliable processing elements that are interconnected in unknown, irregular, and possibly time-varying ways. Amorphous computing is the study of developing and programming such ultrascale computing environments. This paper presents an approach to programming an amorphous computer by spontaneously organizing an unstructured collection of processing elements into cooperative groups and hierarchies. This paper introduces a structure called an AC Hierarchy, which logically organizes processors into groups at different levels of granularity. The AC hierarchy simplifies programming of an amorphous computer through new language abstractions, facilitates the design of efficient and robust algorithms, and simplifies the analysis of their performance. Several example applications are presented that greatly benefit from the AC hierarchy. This paper introduces three algorithms for constructing multiple levels of the hierarchy from an unstructured collection of processors.
AIM1615 CBCL154 Author[s]: Shimon Edelman and Sharon Duvdevani-Bar Visual Recognition and Categorization on the Basis of Similarities to Multiple Class Prototypes September 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1615.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1615.pdf To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.
AIM1616 CBCL155 Author[s]: Yair Weiss Belief Propagation and Revision in Networks with Loops November 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1616.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1616.pdf Local belief propagation rules of the sort proposed by Pearl (1988) are guaranteed to converge to the optimal beliefs for singly connected networks. Recently, a number of researchers have empirically demonstrated good performance of these same algorithms on networks with loops, but a theoretical understanding of this performance has yet to be achieved. Here we lay the foundation for an understanding of belief propagation in networks with loops. For networks with a single loop, we derive an analytical relationship between the steady-state beliefs in the loopy network and the true posterior probability. Using this relationship we show a category of networks for which the MAP estimate obtained by belief update and by belief revision can be proven to be optimal (although the beliefs will be incorrect). We show how nodes can use local information in the messages they receive in order to correct the steady-state beliefs. Furthermore, we prove that for all networks with a single loop, the MAP estimate obtained by belief revision at convergence is guaranteed to give the globally optimal sequence of states. The result is independent of the length of the cycle and the size of the state space. For networks with multiple loops, we introduce the concept of a "balanced network" and show simulation results.
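On a singly connected network the sum-product messages are exact, which a three-node chain makes easy to verify by brute force. A minimal sketch with arbitrarily chosen potentials (the memo's analysis concerns what changes when a loop is added):

```python
import numpy as np
from itertools import product

# Binary chain 0 - 1 - 2: phi are local evidence terms,
# psi[i] couples node i and node i+1. Values chosen arbitrarily.
phi = [np.array([0.7, 0.3]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
psi = [np.array([[1.0, 0.5], [0.5, 1.0]]), np.array([[1.2, 0.4], [0.4, 1.2]])]

def bp_marginal_node1():
    """Sum-product belief at the middle node: local evidence times the
    two incoming messages (exact here, since the chain has no loop)."""
    m0to1 = psi[0].T @ phi[0]          # sum over x0
    m2to1 = psi[1] @ phi[2]            # sum over x2
    b = phi[1] * m0to1 * m2to1
    return b / b.sum()

def brute_marginal_node1():
    """Exact marginal by summing the joint over all 2^3 configurations."""
    p = np.zeros(2)
    for x0, x1, x2 in product((0, 1), repeat=3):
        p[x1] += (phi[0][x0] * phi[1][x1] * phi[2][x2]
                  * psi[0][x0, x1] * psi[1][x1, x2])
    return p / p.sum()
```

For a loopy network the two computations would disagree; characterizing that disagreement for a single loop is exactly what the memo does analytically.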
AITR1617 Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan Proceedings of the Second PHANToM Users Group Workshop December 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1617.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1617.pdf On October 19-22, 1997 the Second PHANToM Users Group Workshop was held at the MIT Endicott House in Dedham, Massachusetts. Designed as a forum for sharing results and insights, the workshop was attended by more than 60 participants from 7 countries. These proceedings report on workshop presentations in diverse areas including rigid and compliant rendering, tool kits, development environments, techniques for scientific data visualization, multimodal issues and a programming tutorial.
AIM1618 Author[s]: Dan Halperin and Christian R. Shelton A Perturbation Scheme for Spherical Arrangements with Application to Molecular Modeling December 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1618.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1618.pdf We describe a software package for computing and manipulating the subdivision of a sphere by a collection of (not necessarily great) circles and for computing the boundary surface of the union of spheres. We present problems that arise in the implementation of the software and the solutions that we have found for them. At the core of the paper is a novel perturbation scheme to overcome degeneracies and precision problems in computing spherical arrangements while using floating point arithmetic. The scheme is relatively simple, it balances the efficiency of computation against the magnitude of the perturbation, and it performs well in practice. In one O(n) time pass through the data, it perturbs the inputs as necessary to rule out potential degeneracies and then passes the perturbed inputs on to the geometric algorithm. We report and discuss experimental results. Our package is a major component in a larger package aimed at supporting geometric queries on molecular models; it is currently employed by chemists working in "rational drug design." The spherical subdivisions are used to construct a geometric model of a molecule where each sphere represents an atom. We also give an overview of the molecular modeling package and detail additional features and implementation issues.
AIM1619 CBCL156 Author[s]: Theodoros Evgeniou and Tomaso Poggio Sparse Representations of Multiple Signals September 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1619.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1619.pdf We discuss the problem of finding sparse representations of a class of signals. We formalize the problem and prove it is NP-complete both in the case of a single signal and in that of multiple ones. Next we develop a simple approximation method for the problem and show experimental results using artificially generated signals. Furthermore, we use our approximation method to find sparse representations of classes of real signals, specifically of images of pedestrians. We discuss the relation between our formulation of the sparsity problem and the problem of finding representations of objects that are compact and appropriate for detection and classification.
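Since the exact problem is NP-complete, greedy schemes are the usual recourse. Below is a minimal sketch of matching pursuit, a generic greedy sparse-approximation method used here only as a stand-in; the memo's own approximation method is not specified in the abstract.

```python
import numpy as np

def matching_pursuit(D, y, n_atoms):
    """Greedy sparse approximation: repeatedly pick the dictionary
    column most correlated with the current residual and subtract its
    contribution. D: dictionary (columns = atoms), y: signal."""
    cols = D / np.linalg.norm(D, axis=0)   # unit-norm atoms
    r = y.astype(float).copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        k = np.argmax(np.abs(cols.T @ r))  # best-matching atom
        a = cols[:, k] @ r
        coef[k] += a
        r -= a * cols[:, k]
    return coef, r
```

With an orthonormal dictionary the greedy choice is exact, which makes the behaviour easy to verify; the interesting (and hard) regime is an overcomplete dictionary shared across a class of signals.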
AIM1620 CBCL157 Author[s]: Gideon P. Stein and Amnon Shashua On Degeneracy of Linear Reconstruction from Three Views: Linear Line Complex and Applications December 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1620.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1620.pdf This paper investigates the linear degeneracies of projective structure estimation from point and line features across three views. We show that the rank of the linear system of equations for recovering the trilinear tensor of three views reduces to 23 (instead of 26) when the scene is a Linear Line Complex (a set of lines in space intersecting a common line) and to 21 when the scene is planar. The LLC situation is only linearly degenerate, and we show that one can obtain a unique solution when the admissibility constraints of the tensor are accounted for. The line configuration described by an LLC, rather than being some obscure case, is in fact quite typical. It includes, as a particular example, the case of a camera moving down a hallway in an office environment or down an urban street. Furthermore, an LLC situation may occur as an artifact, for example in direct estimation from spatiotemporal derivatives of image brightness. Therefore, an investigation into degeneracies and their remedy is also important in practice.
AIM1621 Author[s]: Gideon P. Stein and Amnon Shashua Direct Estimation of Motion and Extended Scene Structure from a Moving Stereo Rig December 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1621.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1621.pdf We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that given two stereo pairs one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal). Correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence to form an extended scene reconstruction, and we show results from a real image sequence. The motion computation is a linear least-squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less, so one does not have to explicitly apply a confidence measure.
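The core computation can be illustrated in its simplest form: least squares on the brightness-constancy equation, where low-gradient pixels contribute little and so are implicitly down-weighted. This toy version estimates a single 2D image translation, not the full rig motion of the memo.

```python
import numpy as np

def direct_translation(Ix, Iy, It):
    """Least-squares global translation (u, v) from image derivatives
    via the brightness-constancy constraint Ix*u + Iy*v + It = 0 at
    every pixel. Pixels with small gradients barely influence the normal
    equations, which is the implicit confidence weighting."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    uv, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return uv
```

On synthetic derivatives consistent with a known translation, the estimate is recovered exactly; with noise, the solution degrades gracefully in exactly the gradient-weighted way described above.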
AIM1624 CBCL158 Author[s]: Yair Weiss and Edward H. Adelson Slow and Smooth: A Bayesian Theory for the Combination of Local Motion Signals in Human Vision February 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1624.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1624.pdf In order to estimate the motion of an object, the visual system needs to combine multiple local measurements, each of which carries some degree of ambiguity. We present a model of motion perception whereby measurements from different image regions are combined according to a Bayesian estimator: the estimated motion maximizes the posterior probability assuming a prior favoring slow and smooth velocities. In reviewing a large number of previously published phenomena we find that the Bayesian estimator predicts a wide range of psychophysical results. This suggests that the seemingly complex set of illusions arises from a single computational strategy that is optimal under reasonable assumptions.
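With Gaussian measurement noise and a zero-mean Gaussian ("slow") prior, the MAP velocity has a closed form: ridge regression on the normal-direction constraints. A minimal 2D sketch with only the slowness term, omitting the smoothness term of the full model; all parameter names are illustrative.

```python
import numpy as np

def map_velocity(normals, speeds, sigma2=1.0, lam=0.1):
    """MAP 2D velocity under the aperture problem: each local measurement
    constrains only the normal component, n_i . v = c_i, observed with
    Gaussian noise (variance sigma2); the prior favors slow velocities
    (strength lam). The posterior is Gaussian, so the MAP estimate is
        v* = (N^T N / sigma2 + lam I)^(-1) N^T c / sigma2."""
    N = np.asarray(normals, float)
    c = np.asarray(speeds, float)
    A = N.T @ N / sigma2 + lam * np.eye(2)
    return np.linalg.solve(A, N.T @ c / sigma2)
```

The prior biases the estimate toward the origin, so the recovered velocity is slightly slower than the constraints alone would imply; this shrinkage is the mechanism behind many of the illusions the memo reviews.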
AIM1625 CBCL159 Author[s]: Thomas Hofmann and Jan Puzicha Statistical Models for Co-occurrence Data February 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1625.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1625.pdf Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
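The basic aspect model in this family explains a count matrix as a finite mixture over shared latent aspects, fit by EM. A bare-bones sketch, without the annealing or improved EM variants the memo develops; an illustration under stated assumptions, not the authors' code.

```python
import numpy as np

def aspect_model_em(N, n_aspects, iters=50, seed=0):
    """EM for a simple aspect model P(x, y) = sum_a P(a) P(x|a) P(y|a),
    fit to a co-occurrence count matrix N."""
    rng = np.random.default_rng(seed)
    X, Y = N.shape
    pa = np.full(n_aspects, 1.0 / n_aspects)
    px = rng.dirichlet(np.ones(X), n_aspects)   # P(x|a), one row per aspect
    py = rng.dirichlet(np.ones(Y), n_aspects)   # P(y|a)
    for _ in range(iters):
        # E-step: responsibilities P(a | x, y) for every cell
        joint = pa[:, None, None] * px[:, :, None] * py[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: re-estimate parameters from expected counts
        ec = post * N[None, :, :]
        pa = ec.sum(axis=(1, 2)); pa = pa / pa.sum()
        px = ec.sum(axis=2); px = px / px.sum(axis=1, keepdims=True)
        py = ec.sum(axis=1); py = py / py.sum(axis=1, keepdims=True)
    return pa, px, py
```

Because the aspects are shared, the fitted model assigns non-zero probability to unobserved pairs whose row and column behave like observed ones, which is exactly the smoothing-over-sparseness role described above.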
AIM1626 Author[s]: Radhika Nagpal and Daniel Coore An Algorithm for Group Formation and Maximal Independent Set in an Amorphous Computer February 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1626.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1626.pdf Amorphous computing is the study of programming ultrascale computing environments of smart sensors and actuators. The individual elements are identical, asynchronous, randomly placed, embedded and communicate locally via wireless broadcast. Aggregating the processors into groups is a useful paradigm for programming an amorphous computer because groups can be used for specialization, increased robustness, and efficient resource allocation. This paper presents a new algorithm, called the clubs algorithm, for efficiently aggregating processors into groups in an amorphous computer, in time proportional to the local density of processors. The clubs algorithm is well-suited to the unique characteristics of an amorphous computer. In addition, the algorithm derives two properties from the physical embedding of the amorphous computer: an upper bound on the number of groups formed and a constant upper bound on the density of groups. The clubs algorithm can also be extended to find the maximal independent set (MIS) and a (Delta + 1) vertex coloring in an amorphous computer in O(log N) rounds, where N is the total number of elements and Delta is the maximum degree.
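The flavour of such local, broadcast-style group formation can be sketched with a Luby-style randomized maximal-independent-set round structure: each round, every live node draws a random value and joins the MIS if it beats all live neighbours. This illustrates the style of the MIS extension, not the clubs algorithm itself.

```python
import random

def randomized_mis(adj, seed=0):
    """Luby-style randomized MIS. adj maps node -> set of neighbours.
    Each round, nodes whose random draw beats every live neighbour join
    the MIS; they and their neighbours then drop out. With distinct
    draws at least the global maximum wins each round, so this
    terminates, and the result is independent and maximal."""
    rng = random.Random(seed)
    live = set(adj)
    mis = set()
    while live:
        draw = {v: rng.random() for v in live}
        winners = {v for v in live
                   if all(draw[v] > draw[u] for u in adj[v] if u in live)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & live
        live -= removed
    return mis
```

Only local comparisons are used, matching the broadcast-only communication model of an amorphous computer, though the synchronous rounds here are a simplification.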
AITR1627 Author[s]: Alan Bawden Implementing Distributed Systems Using Linear Naming March 1993 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1627.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1627.pdf Linear graph reduction is a simple computational model in which the cost of naming things is explicitly represented. The key idea is the notion of "linearity". A name is linear if it is only used once, so with linear naming you cannot create more than one outstanding reference to an entity. As a result, linear naming is cheap to support and easy to reason about. Programs can be translated into the linear graph reduction model such that linear names in the program are implemented directly as linear names in the model. Nonlinear names are supported by constructing them out of linear names. The translation thus exposes those places where the program uses names in expensive, nonlinear ways. Two applications demonstrate the utility of using linear graph reduction: First, in the area of distributed computing, linear naming makes it easy to support cheap cross-network references and highly portable data structures. Linear naming also facilitates demand-driven migration of tasks and data around the network without requiring explicit guidance from the programmer. Second, linear graph reduction reveals a new characterization of the phenomenon of state. Systems in which state appears are those which depend on certain global system properties. State is not a localizable phenomenon, which suggests that our usual object-oriented metaphor for state is flawed.
AIM1628 Author[s]: Brian Scassellati A Binocular, Foveated Active Vision System March 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1628.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1628.pdf This report documents the design and implementation of a binocular, foveated active vision system as part of the Cog project at the MIT Artificial Intelligence Laboratory. The active vision system features a three degree of freedom mechanical platform that supports four color cameras, a motion control system, and a parallel network of digital signal processors for image processing. To demonstrate the capabilities of the system, we present results from four sample visual-motor tasks.
AIM1629 CBCL160 Author[s]: Maximilian Riesenhuber and Tomaso Poggio Modeling Invariances in Inferotemporal Cell Tuning March 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1629.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1629.pdf In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth. Training monkeys with novel, paperclip-like objects, Logothetis et al. could investigate whether these invariance properties are due to experience with exhaustively many transformed instances of an object or if there are mechanisms that allow the cells to show response invariance also to previously unseen instances of that object. They found object-selective cells in anterior IT which exhibited limited invariance to various transformations after training with single object views. While previous models accounted for the tuning of the cells for rotations in depth and for their selectivity to a specific object relative to a population of distractor objects, the model described here attempts to explain in a biologically plausible way the additional properties of translation and size invariance. Using the same stimuli as in the experiment, we find that model IT neurons exhibit invariance properties which closely parallel those of real neurons. Simulations show that the model is capable of unsupervised learning of view-tuned neurons. The model also allows us to make experimentally testable predictions regarding novel stimulus transformations and combinations of stimuli.
AIM1630 Author[s]: Thomas Marill Recovery of Three-Dimensional Objects from Single Perspective Images March 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1630.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1630.pdf Any three-dimensional wire-frame object constructed out of parallelograms can be recovered from a single perspective two-dimensional image. A procedure for performing the recovery is given.
AIM1631 Author[s]: Kevin K. Lin Coordinate-Independent Computations on Differential Equations March 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1631.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1631.pdf This project investigates the computational representation of differentiable manifolds, with the primary goal of solving partial differential equations using multiple coordinate systems on general n-dimensional spaces. In the process, this abstraction is used to perform accurate integrations of ordinary differential equations using multiple coordinate systems. In the case of linear partial differential equations, however, unexpected difficulties arise even with the simplest equations.
AIM1632 CBCL161 Author[s]: Tomaso Poggio and Federico Girosi Notes on PCA, Regularization, Sparsity and Support Vector Machines May 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1632.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1632.pdf We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik's feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exists a dual representation of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique, and this operation is equivalent to sparsifying a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit DeNoising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.
AIM1633 Author[s]: Kenneth Yip and Gerald Jay Sussman Sparse Representations for Fast, One-Shot Learning November 1997 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1633.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1633.pdf Humans rapidly and reliably learn many kinds of regularities and generalizations. We propose a novel model of fast learning that exploits the properties of sparse representations and the constraints imposed by a plausible hardware mechanism. To demonstrate our approach we describe a computational model of acquisition in the domain of morphophonology. We encapsulate phonological information as bidirectional boolean constraint relations operating on the classical linguistic representations of speech sounds in terms of distinctive features. The performance model is described as a hardware mechanism that incrementally enforces the constraints. Phonological behavior arises from the action of this mechanism. Constraints are induced from a corpus of common English nouns and verbs. The induction algorithm compiles the corpus into increasingly sophisticated constraints. The algorithm yields one-shot learning from a few examples. Our model has been implemented as a computer program. The program exhibits phonological behavior similar to that of young children. As a bonus the constraints that are acquired can be interpreted as classical linguistic rules.
AITR1634 Author[s]: Anita M. Flynn Piezoelectric Ultrasonic Micromotors June 1995 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1634.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1634.pdf This report describes development of microfabricated piezoelectric ultrasonic motors and bulk-ceramic piezoelectric ultrasonic motors. Ultrasonic motors offer the advantage of low speed, high torque operation without the need for gears. They can be made compact and lightweight and provide a holding torque in the absence of applied power, due to the traveling wave frictional coupling mechanism between the rotor and the stator. This report covers modeling, simulation, fabrication and testing of ultrasonic motors. Design of experiments methods were also utilized to find optimal motor parameters. A suite of 8 mm diameter x 3 mm tall motors was machined for these studies and maximum stall torques as large as 10^(-3) Nm, maximum no-load speeds of 1710 rpm and peak power outputs of 27 mW were realized. Additionally, this report describes the implementation of a microfabricated ultrasonic motor using thin-film lead zirconate titanate. In a joint project with the Pennsylvania State University Materials Research Laboratory and MIT Lincoln Laboratory, 2 mm and 5 mm diameter stator structures were fabricated on 1 micron thick silicon nitride membranes. Small glass lenses placed down on top spun at 100-300 rpm with 4 V excitation at 90 kHz. The large power densities and stall torques of these piezoelectric ultrasonic motors offer tremendous promise for integrated machines: complete intelligent, electromechanical autonomous systems mass-produced in a single fabrication process.
AIM1635 CBCL162 Author[s]: Constantine P. Papageorgiou, Federico Girosi and Tomaso Poggio Sparse Correlation Kernel Analysis and Reconstruction May 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1635.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1635.pdf This paper presents a new paradigm for signal reconstruction and super-resolution, Correlation Kernel Analysis (CKA), that is based on the selection of a sparse set of bases from a large dictionary of class-specific basis functions. The basis functions that we use are the correlation functions of the class of signals we are analyzing. To choose the appropriate features from this large dictionary, we use Support Vector Machine (SVM) regression and compare this to traditional Principal Component Analysis (PCA) for the tasks of signal reconstruction, super-resolution, and compression. The testbed we use in this paper is a set of images of pedestrians. This paper also presents results of experiments in which we use a dictionary of multiscale basis functions and then use Basis Pursuit DeNoising to obtain a sparse, multiscale approximation of a signal. The results are analyzed and we conclude that 1) when used with a sparse representation technique, the correlation function is an effective kernel for image reconstruction and super-resolution, 2) for image compression, PCA and SVM have different tradeoffs, depending on the particular metric that is used to evaluate the results, 3) in sparse representation techniques, L_1 is not a good proxy for the true measure of sparsity, L_0, and 4) the L_epsilon norm may be a better error metric for image reconstruction and compression than the L_2 norm, though the exact psychophysical metric should take into account higher-order structure in images.
AIM1636 Author[s]: Charles Isbell and Paul Viola Restructuring Sparse High Dimensional Data for Effective Retrieval May 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1636.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1636.pdf The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector, a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly related words. Documents and queries are represented by their distance to these sets, and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.
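The classical baseline this abstract contrasts with, word-count vectors scored by a dot product, can be sketched in a few lines. The vocabulary, documents, and query below are invented toy data, not anything from the paper:

```python
# Classical vector-space relevance: represent documents and queries as
# word-count vectors over a fixed vocabulary; relevance is their dot
# product (a count-weighted measure of shared terms).
from collections import Counter

def count_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

def relevance(doc_vec, query_vec):
    return sum(d * q for d, q in zip(doc_vec, query_vec))

vocab = ["text", "retrieval", "vector", "image"]
docs = ["text retrieval with vector models", "image processing"]
query = "vector text retrieval"

qv = count_vector(query, vocab)
scores = [relevance(count_vector(d, vocab), qv) for d in docs]
# The first document shares three vocabulary terms with the query,
# the second shares none.
```

This also illustrates the difficulty the abstract names: a document that discusses the query's topic in different words scores zero, which is what dimensionality reduction and the proposed word-cluster representation aim to fix.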
AIM1637 Author[s]: Gina-Anne Levow Corpus-Based Techniques for Word Sense Disambiguation May 27, 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1637.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1637.pdf The need for robust and easily extensible systems for word sense disambiguation, coupled with successes in training systems for a variety of tasks using large online corpora, has led to extensive research into corpus-based statistical approaches to this problem. Promising results have been achieved by vector space representations of context, clustering combined with a semantic knowledge base, and decision lists based on collocational relations. We evaluate these techniques with respect to three important criteria: how their definition of context affects their ability to incorporate different types of disambiguating information, how they define similarity among senses, and how easily they can generalize to new senses. The strengths and weaknesses of these systems provide guidance for future systems which must capture and model a variety of disambiguating information, both syntactic and semantic.
AIM1638 Author[s]: Oded Maron and Tomas Lozano-Perez Visible Decomposition: Real-Time Path Planning in Large Planar Environments June 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1638.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1638.pdf We describe a method called Visible Decomposition for computing collision-free paths in real time through a planar environment with a large number of obstacles. This method divides space into local visibility graphs, ensuring that all operations are local. The search time is kept low since the number of regions is proved to be small. We analyze the computational demands of the algorithm and the quality of the paths it produces. In addition, we show test results on a large simulation testbed.
AITR1639 Author[s]: Oded Maron Learning from Ambiguity December 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1639.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1639.pdf There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful examples of applying Diverse Density techniques.
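The bag-labeling rule stated in this abstract is simple enough to state directly in code. The instance-level predicate below (a threshold on a scalar) is a made-up stand-in for the unknown instance concept, not anything from the thesis:

```python
# Multiple Instance labeling rule: a bag is positive iff at least one
# of its (individually unlabeled) instances is positive; it is negative
# only when every instance is negative.
def instance_positive(x, threshold=0.5):
    # Hypothetical instance-level concept: scalar exceeds a threshold.
    return x > threshold

def bag_label(bag):
    # any() implements "at least one positive instance".
    return any(instance_positive(x) for x in bag)

positive_bag = [0.1, 0.9, 0.2]   # one positive instance suffices
negative_bag = [0.1, 0.3, 0.2]   # every instance is negative
```

The ambiguity the thesis addresses is visible here: knowing `positive_bag` is positive does not tell the learner which of its instances made it so.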
AIM1640 CBCL163 Author[s]: Zhaoping Li Pre-Attentive Segmentation in the Primary Visual Cortex June 30, 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1640.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1640.pdf Stimuli outside classical receptive fields have been shown to exert significant influence over the activities of neurons in primary visual cortex. We propose that contextual influences are used for pre-attentive visual segmentation, in a new framework called segmentation without classification. This means that segmentation of an image into regions occurs without classification of features within a region or comparison of features between regions. This segmentation framework is simpler than previous computational approaches, making it implementable by V1 mechanisms, though higher-level visual mechanisms are needed to refine its output. However, it easily handles a class of segmentation problems that are tricky in conventional methods. The cortex computes global region boundaries by detecting the breakdown of homogeneity or translation invariance in the input, using local intracortical interactions mediated by the horizontal connections. The difference between contextual influences near and far from region boundaries makes neural activities near region boundaries higher than elsewhere, making boundaries more salient for perceptual pop-out. This proposal is implemented in a biologically based model of V1, and demonstrated using examples of texture segmentation and figure-ground segregation. The model performs segmentation in exactly the same neural circuit that solves the dual problem of the enhancement of contours, as is suggested by experimental observations. Its behavior is compared with psychophysical and physiological data on segmentation, contour enhancement, and contextual influences. We discuss the implications of segmentation without classification and the predictions of our V1 model, and relate it to other phenomena such as asymmetry in visual search.
AIM1642 Author[s]: Parry Husbands, Charles Lee Isbell, Jr. and Alan Edelman Interactive Supercomputing with MIT Matlab July 28, 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1642.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1642.pdf This paper describes MITMatlab, a system that enables users of supercomputers or networked PCs to work on large data sets within Matlab transparently. MITMatlab is based on the Parallel Problems Server (PPServer), a standalone 'linear algebra server' that provides a mechanism for running distributed memory algorithms on large data sets. The PPServer and MITMatlab enable high-performance interactive supercomputing. With such a tool, researchers can now use Matlab as more than a prototyping tool for experimenting with small problems. Instead, MITMatlab makes it possible to visualize and operate interactively on large data sets. This has implications not only in supercomputing, but for Artificial Intelligence applications such as Machine Learning, Information Retrieval and Image Processing.
AITR1645 Author[s]: Deborah A. Wallach A Hierarchical Cache Coherent Protocol September 1992 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1645.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1645.pdf As the number of processors in distributed memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k-processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.
AIM1646 CBCL164 Author[s]: Nicholas Chan, Blake LeBaron, Andrew Lo and Tomaso Poggio Information Dissemination and Aggregation in Asset Markets with Simple Intelligent Traders September 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1646.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1646.pdf Various studies of asset markets have shown that traders are capable of learning and transmitting information through prices in many situations. In this paper we replace human traders with intelligent software agents in a series of simulated markets. Using these simple learning agents, we are able to replicate several features of the experiments with human subjects, regarding (1) dissemination of information from informed to uninformed traders, and (2) aggregation of information spread over different traders.
AIM1647 Author[s]: Hany Farid and Edward H. Adelson Separating Reflections from Images Using Independent Components Analysis September 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1647.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1647.pdf The image of an object can vary dramatically depending on lighting, specularities/reflections and shadows. It is often advantageous to separate these incidental variations from the intrinsic aspects of an image. Along these lines this paper describes a method for photographing objects behind glass and digitally removing the reflections off the glass leaving the image of the objects behind the glass intact. We describe the details of this method which employs simple optical techniques and independent components analysis (ICA) and show its efficacy with several examples.
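The separation step this abstract relies on can be illustrated with a generic ICA sketch on toy 1-D signals. This is a minimal symmetric FastICA with a tanh contrast, a standard ICA algorithm rather than the paper's specific optical-plus-ICA procedure; the sources, mixing matrix, and iteration count are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Two independent non-Gaussian sources standing in for the "scene" and
# "reflection" layers (toy 1-D signals, not real images).
s1 = np.sign(np.sin(np.linspace(0, 40, n)))   # square wave
s2 = rng.uniform(-1, 1, n)                     # uniform noise
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.6], [0.4, 1.0]])         # unknown mixing (two "photographs")
X = A @ S

# Center and whiten the observed mixtures.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# Symmetric FastICA fixed-point iteration with g = tanh, g' = 1 - tanh^2.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / n - np.diag((1 - G ** 2).mean(axis=1)) @ W
    u, _, vt = np.linalg.svd(W_new)
    W = u @ vt                                 # symmetric decorrelation
Y = W @ Z   # estimated sources, recovered up to permutation and sign
```

Each row of `Y` should match one original source up to sign, which is the usual ICA indeterminacy and is harmless when the goal is simply to discard the reflection layer.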
AIM1648 CBCL165 Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris Estimating Dependency Structure as a Hidden Variable September 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1648.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1648.pdf This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.
AIM1649 CBCL166 Author[s]: Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou From Regression to Classification in Support Vector Machines November 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1649.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1649.pdf We study the relation between support vector machines (SVMs) for regression (SVMR) and SVMs for classification (SVMC). We show that for a given SVMC solution there exists an SVMR solution which is equivalent for a certain choice of the parameters. In particular, our result is that for $\epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter $C_c$ are equal to $(1-\epsilon)^{-1}$ times the optimal hyperplane and threshold for SVMR with regularization parameter $C_r = (1-\epsilon)C_c$. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.
AITR1650 CBCL167 Author[s]: Christian R. Shelton Three-Dimensional Correspondence December 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1650.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1650.pdf This paper describes the problem of three-dimensional object correspondence and presents an algorithm for matching two three-dimensional colored surfaces using polygon reduction and the minimization of an energy function. At the core of this algorithm is a novel data-dependent multiresolution pyramid for polygonal surfaces. The algorithm is general to correspondence between any two manifolds of the same dimension embedded in a higher dimensional space. Results demonstrating correspondences between various objects are presented and a method for incorporating user input is also detailed.
AIM1651 CBCL168 Author[s]: Massimiliano Pontil, Sayan Mukherjee and Federico Girosi On the Noise Model of Support Vector Machine Regression October 1998 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1651.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1651.pdf Support Vector Machine Regression (SVMR) is a regression technique recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean are stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify nonquadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
AIM1652 Author[s]: Marina Meila An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High Dimensional Sparse Data January 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1652.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1652.pdf Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree, i.e. a density model that assumes that there are only pairwise dependencies between variables and that the graph of these dependencies is a spanning tree. The original algorithm is quadratic in the dimension of the domain, and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed-ups of up to two to three orders of magnitude in the experiments.
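For reference, the original quadratic-time Chow and Liu procedure that the acCL algorithm accelerates can be sketched as follows: estimate all pairwise mutual informations from data, then take a maximum-weight spanning tree over them. This toy version for binary variables is a generic illustration of that baseline, not the accelerated algorithm:

```python
# Chow-Liu tree fitting (original, quadratic version): pairwise empirical
# mutual information as edge weights, then a maximum-weight spanning tree
# via Kruskal's algorithm with a union-find array.
import math
from itertools import combinations

def mutual_information(xs, ys):
    # Empirical MI between two binary variables given as parallel sequences.
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            pa = sum(1 for x in xs if x == a) / n
            pb = sum(1 for y in ys if y == b) / n
            if pab > 0:
                mi += pab * math.log(pab / (pa * pb))
    return mi

def chow_liu_tree(data):
    # data: list of samples, each a tuple of binary variable values.
    k = len(data[0])
    cols = list(zip(*data))
    weights = {(i, j): mutual_information(cols[i], cols[j])
               for i, j in combinations(range(k), 2)}
    parent = list(range(k))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    edges = []
    for (i, j), w in sorted(weights.items(), key=lambda e: -e[1]):
        ri, rj = find(i), find(j)
        if ri != rj:            # add heaviest edges that do not form a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Computing all the pairwise marginals is exactly the quadratic bottleneck the abstract describes; the acCL algorithm exploits data sparsity to avoid evaluating most of them.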
AIM1653 CBCL170 Author[s]: Sayan Mukherjee and Vladimir Vapnik Multivariate Density Estimation: An SVM Approach April 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1653.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1653.pdf We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, Gaussians and Laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.
AIM1654 CBCL171 Author[s]: Theodoros Evgeniou, Massimiliano Pontil and Tomaso Poggio A Unified Framework for Regularization Networks and Support Vector Machines March 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1654.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1654.pdf Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem, combining functional analysis and statistics.
AIM1655 Author[s]: Gideon P. Stein, Raquel Romano and Lily Lee Monitoring Activities from Multiple Video Streams: Establishing a Common Coordinate Frame April 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1655.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1655.pdf Passive monitoring of large sites typically requires coordination between multiple cameras, which in turn requires methods for automatically relating events between distributed cameras. This paper tackles the problem of self-calibration of multiple cameras which are very far apart, using feature correspondences to determine the camera geometry. The key problem is finding such correspondences. Since the camera geometry and photometric characteristics vary considerably between images, one cannot use brightness and/or proximity constraints. Instead we apply planar geometric constraints to moving objects in the scene in order to align the scene's ground plane across multiple views. We do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. Once we have recovered the homography which aligns the planar structure in the scene, we can compute from the homography matrix the 3D position of the plane and the relative camera positions. This in turn enables us to recover a homography matrix which maps the images to an overhead view. We demonstrate this technique in two settings: a controlled lab setting where we test the effects of errors in internal camera calibration, and an uncontrolled, outdoor setting in which the full procedure is applied to external camera calibration and ground plane recovery. In spite of noise in the internal camera parameters and image data, the system successfully recovers both planar structure and relative camera positions in both settings.
AIM1656 CBCL172 Author[s]: Theodoros Evgeniou and Massimiliano Pontil On the V(subscript gamma) Dimension for Regression in Reproducing Kernel Hilbert Spaces May 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1656.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1656.pdf This paper presents a computation of the $V_\gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $\epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the $V_\gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_\epsilon$ or general $L_p$ loss functions. This paper presents a novel proof of this result also for the case that a bias is added to the functions in the RKHS.
AIM1657 Author[s]: Hany Farid Detecting Digital Forgeries Using Bispectral Analysis December 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1657.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1657.pdf With the rapid increase in low-cost and sophisticated digital technology, the need for techniques to authenticate digital material will become more urgent. In this paper we address the problem of authenticating digital signals assuming no explicit prior knowledge of the original. The basic approach that we take is to assume that in the frequency domain a "natural" signal has weak higher-order statistical correlations. We then show that "unnatural" correlations are introduced if this signal is passed through a nonlinearity (which would almost surely occur in the creation of a forgery). Techniques from polyspectral analysis are then used to detect the presence of these correlations. We review the basics of polyspectral analysis, show how and why these tools can be used in detecting forgeries and show their effectiveness in analyzing human speech.
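The core idea, that a nonlinearity introduces phase-coupled frequency triples which the bispectrum detects, can be sketched on a toy two-tone signal. The frequency bins and the quadratic nonlinearity below are invented for illustration, and this evaluates a single bispectrum point rather than the paper's full polyspectral procedure:

```python
import numpy as np

n = 1024
t = np.arange(n)
k1, k2 = 37, 61                       # illustrative frequency bins
x = np.cos(2 * np.pi * k1 * t / n) + np.cos(2 * np.pi * k2 * t / n)
y = x + 0.3 * x ** 2                  # a nonlinearity, as in a forgery pipeline

def bispectrum_at(signal, f1, f2):
    # One point of the bispectrum: B(f1, f2) = X(f1) X(f2) X*(f1 + f2).
    X = np.fft.fft(signal)
    return X[f1] * X[f2] * np.conj(X[f1 + f2])

# The linear signal has no energy at bin k1 + k2, so B(k1, k2) is ~0.
b_linear = abs(bispectrum_at(x, k1, k2))
# The quadratic term creates a component at k1 + k2 whose phase is locked
# to the phases at k1 and k2, so B(k1, k2) becomes large.
b_nonlin = abs(bispectrum_at(y, k1, k2))
```

It is this phase coupling, invisible to the ordinary power spectrum, that the paper's bispectral tools are designed to expose.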
AIM1658 CBCL173 Author[s]: Tony Ezzat and Tomaso Poggio Visual Speech Synthesis by Morphing Visemes May 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1658.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1658.pdf We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
AIM1661 CBCL177 Author[s]: Ryan Rifkin, Massimiliano Pontil and Alessandro Verri A Note on Support Vector Machines Degeneracy August 11, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1661.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1661.pdf When training Support Vector Machines (SVMs) over nonseparable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${\bf w} = {\bf 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays.
AIM1662 Author[s]: Liana M. Lorigo, Olivier Faugeras, W.E.L. Grimson, Renaud Keriven, Ron Kikinis, Carl-Fredrik Westin Codimension 2 Geodesic Active Contours for MRA Segmentation August 11, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1662.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1662.pdf Automatic and semi-automatic magnetic resonance angiography (MRA) segmentation techniques can potentially save radiologists large amounts of time required for manual segmentation and can facilitate further data analysis. The proposed MRA segmentation method uses a mathematical modeling technique which is well-suited to the complicated curve-like structure of blood vessels. We define the segmentation task as an energy minimization over all 3D curves and use a level set method to search for a solution. Our approach is an extension of previous level set segmentation techniques to higher codimension.
AIM1663 Author[s]: Rajesh Kasturirangan Multiple Scales in Small-World Networks August 11, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1663.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1663.pdf Small-world architectures may be implicated in a range of phenomena from networks of neurons in the cerebral cortex to social networks and propagation of viruses. Small-world networks are interpolations of regular and random networks that retain the advantages of both: they are highly clustered like regular networks, yet have a small average path length between nodes, like random networks. While most of the recent attention on small-world networks has focused on the effect of introducing disorder/randomness into a regular network, we show that the fundamental mechanism behind the small-world phenomenon is not disorder/randomness, but the presence of connections of many different length scales. Consequently, in order to explain the small-world phenomenon, we introduce the concept of multiple scale networks and then state the multiple length scale hypothesis. We show that small-world behavior in randomly rewired networks is a consequence of features common to all multiple scale networks. To support the multiple length scale hypothesis, novel network architectures are introduced that need not be a result of random rewiring of a regular network. In each case it is shown that whenever the network exhibits small-world behavior, it also has connections of diverse length scales. We also show that the distribution of the length scales of the new connections is significantly more important than whether the new connections are long range, medium range or short range.
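The central claim above is easy to check numerically: adding a handful of connections of diverse length scales to a ring lattice sharply reduces the average path length. The particular lattice size and shortcut set below are illustrative assumptions, not taken from the memo.

```python
# Average shortest path length of a ring lattice (each node linked to its
# 2 nearest neighbors on each side), before and after adding a few fixed
# shortcuts of short, medium, and long length scales.
from collections import deque

def avg_path_length(n, edges):
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for s in range(n):                    # BFS from every source node
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += n - 1
    return total / pairs

n = 60
ring = ([(i, (i + 1) % n) for i in range(n)]
        + [(i, (i + 2) % n) for i in range(n)])
# Deterministic shortcuts spanning several length scales (8 to 30 hops),
# i.e. not produced by random rewiring:
shortcuts = [(0, 30), (10, 25), (40, 48), (5, 50), (20, 44)]

L_regular = avg_path_length(n, ring)
L_multiscale = avg_path_length(n, ring + shortcuts)
```

Five extra edges (under 3% of the lattice's edge count) already pull the average path length well below that of the regular ring.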
AIM1664 CBCL178 Author[s]: Anuj Mohan Object Detection in Images by Components August 11, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1664.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1664.pdf In this paper we present a component-based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component-based approach is twofold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the component-based approach and the ACC data classification structure.
AIM1665 Author[s]: Harold Abelson, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman and Ron Weiss Amorphous Computing August 29, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1665.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1665.pdf Amorphous computing is the development of organizational principles and programming languages for obtaining coherent behaviors from the cooperation of myriads of unreliable parts that are interconnected in unknown, irregular, and time-varying ways. The impetus for amorphous computing comes from developments in microfabrication and fundamental biology, each of which is the basis of a kernel technology that makes it possible to build or grow huge numbers of almost-identical information-processing units at almost no cost. This paper sets out a research agenda for realizing the potential of amorphous computing and surveys some initial progress, both in programming and in fabrication. We describe some approaches to programming amorphous systems, which are inspired by metaphors from biology and physics. We also present the basic ideas of cellular computing, an approach to constructing digital-logic circuits within living cells by representing logic levels by concentrations of DNA-binding proteins.
AIM1666 Author[s]: Radhika Nagpal Organizing a Global Coordinate System from Local Information on an Amorphous Computer August 29, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1666.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1666.pdf This paper demonstrates that it is possible to generate a reasonably accurate coordinate system on randomly distributed processors, using only local information and local communication. By coordinate system we mean that each element assigns itself a logical coordinate that maps to its global physical location, starting with no a priori knowledge of position or orientation. The algorithm presented is inspired by biological systems that use chemical gradients to determine the position of cells. Extensive analysis and simulation results are presented. Two key results are: there is a critical minimum average neighborhood size of 15 for good accuracy, and there is a fundamental limit on the resolution of any coordinate system determined strictly from local communication. We also demonstrate that using this algorithm, random distributions of processors produce significantly better accuracy than regular processor grids, such as those used by cellular automata. This has implications for discrete models of biology as well as for building smart sensor arrays.
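The gradient idea can be sketched in its simplest setting: each processor counts hops to a few anchor ("seed") processors by purely local flooding, and hop count times radio range bounds the physical distance. The one-dimensional layout, unit-disk connectivity, and two-anchor combination below are illustrative assumptions, not the memo's algorithm.

```python
# Hop-count gradients from two anchors on a line of processors, combined
# into a logical coordinate using only local communication (BFS flooding).
from collections import deque

def hop_counts(adj, seed):
    """Breadth-first flood from the seed: each node learns its hop count."""
    dist = {seed: 0}
    q = deque([seed])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Processors at integer positions 0..20 on a line; radio range r = 2
positions = list(range(21))
r = 2
adj = {i: [j for j in positions if j != i and abs(i - j) <= r]
       for i in positions}

h_left = hop_counts(adj, 0)       # gradient from anchor at x = 0
h_right = hop_counts(adj, 20)     # gradient from anchor at x = 20

# Each hop spans at most r units, so hop_count * r upper-bounds the distance
# to an anchor; averaging the two one-sided estimates cancels the slack:
estimate = {i: (h_left[i] * r + (20 - h_right[i] * r)) / 2 for i in positions}
```

On this toy line the two opposing gradients recover every position exactly; with randomly scattered processors the estimates carry noise, which is where the paper's analysis of neighborhood size and resolution limits comes in.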
AITR1668 Author[s]: Tommi Jaakkola, Marina Meila and Tony Jebara Maximum Entropy Discrimination December 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1668.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1668.pdf We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class conditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.
AIM1669 Author[s]: Kinh Tieu and Paul Viola Boosting Image Database Retrieval September 10, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1669.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1669.pdf We present an approach for image database retrieval using a very large number of highly selective features and simple online learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes. We propose a mechanism for generating a large number of complex features which capture some aspects of this causal structure. Boosting is used to learn simple and efficient classifiers in this complex feature space. Finally, we describe a practical implementation of our retrieval system on a database of 3000 images.
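Boosting over a pool of simple features, as used above, can be shown in miniature: AdaBoost repeatedly picks the decision stump with the lowest weighted error, then upweights the examples that stump misclassifies. The tiny one-dimensional dataset and stump pool are illustrative assumptions.

```python
# Minimal AdaBoost with decision stumps on a 1-D toy problem that no single
# threshold can solve; three boosting rounds separate it exactly.
import math

def stump(threshold, sign):
    return lambda x: sign if x > threshold else -sign

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                                   # list of (alpha, stump)
    candidates = [stump(t + 0.5, s) for t in range(-1, 5) for s in (+1, -1)]
    for _ in range(rounds):
        # select the stump with the lowest weighted error
        errs = [sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
                for h in candidates]
        e, h = min(zip(errs, candidates), key=lambda p: p[0])
        e = max(e, 1e-12)                           # guard against log(0)
        alpha = 0.5 * math.log((1 - e) / e)
        ensemble.append((alpha, h))
        # reweight: misclassified points gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

xs = [0, 1, 2, 3, 4]
ys = [-1, -1, 1, 1, -1]           # not separable by any single stump
f = adaboost(xs, ys, rounds=3)
```

The same loop scales to thousands of selective image features; each round then amounts to choosing one discriminative feature, which is what makes boosting attractive for retrieval.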
AIM1670 Author[s]: Mark M. Millonas and Erik M. Rauch Transmembrane Signal Transduction and Biochemical Turing Pattern Formation September 28, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1670.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1670.pdf The Turing mechanism for the production of a broken spatial symmetry in an initially homogeneous system of reacting and diffusing substances has attracted much interest as a potential model for certain aspects of morphogenesis such as pre-patterning in the embryo, and has also served as a model for self-organization in more generic systems. The two features necessary for the formation of Turing patterns are short-range autocatalysis and long-range inhibition, which usually only occur when the diffusion rate of the inhibitor is significantly greater than that of the activator. This observation has sometimes been used to cast doubt on the applicability of the Turing mechanism to cellular patterning since many messenger molecules that diffuse between cells do so at more-or-less similar rates. Here we show that stationary, symmetry-breaking Turing patterns can form in physiologically realistic systems even when the extracellular diffusion coefficients are equal; the kinetic properties of the 'receiver' and 'transmitter' proteins responsible for signal transduction will be primary factors governing this process.
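The classical "fast inhibitor" requirement that the abstract revisits falls out of a linear stability calculation: a steady state that is stable without diffusion becomes unstable at some spatial wavenumber once the inhibitor diffuses much faster than the activator. The Jacobian entries and diffusion coefficients below are illustrative numbers, not taken from the paper's model.

```python
# Dispersion relation for a two-species reaction-diffusion system: the
# largest eigenvalue real part of J - k^2 * diag(Du, Dv) as a function of k^2.
import math

# Linearized kinetics J = [[a, b], [c, d]] (u = activator, v = inhibitor):
# stable without diffusion (tr < 0, det > 0), activator self-enhancing (a > 0)
a, b, c, d = 2.0, -4.0, 2.0, -3.0
Du, Dv = 1.0, 40.0                # activator diffuses slowly, inhibitor fast

def growth_rate(k2):
    """Largest real part of the eigenvalues of J - k^2 * diag(Du, Dv)."""
    tr = (a - Du * k2) + (d - Dv * k2)
    det = (a - Du * k2) * (d - Dv * k2) - b * c
    disc = tr * tr - 4 * det
    if disc >= 0:
        return (tr + math.sqrt(disc)) / 2
    return tr / 2                 # complex pair: real part is tr / 2

stable_without_diffusion = growth_rate(0.0) < 0
max_rate = max(growth_rate(k2 / 100) for k2 in range(1, 300))
```

With these numbers the homogeneous state is stable at k = 0 yet a band of finite wavenumbers grows, so a pattern with a preferred wavelength emerges; setting Du = Dv removes the instability, which is exactly the objection the paper's transduction kinetics are designed to circumvent.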
AIM1672 CBCL179 Author[s]: Vinay P. Kumar and Tomaso Poggio Learning-Based Approach to Real-Time Tracking and Analysis of Faces September 23, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1672.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1672.pdf This paper describes a trainable system capable of tracking faces and facial features like eyes and nostrils and estimating basic mouth features such as degrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (handcrafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stages of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.
AIM1673 CBCL180 Author[s]: Constantine P. Papageorgiou and Tomaso Poggio A Trainable Object Detection System: Car Detection in Static Images October 13, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1673.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1673.pdf This paper describes a general, trainable architecture for object detection that has previously been applied to face and people detection with a new application to car detection in static images. Our technique is a learning-based approach that uses a set of labeled training data from which an implicit model of an object class (here, cars) is learned. Instead of pixel representations that may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of-sample images and show an ROC curve that highlights the performance of our system.
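The "local, oriented, multiscale intensity differences" in this line of work come from three Haar wavelet filters, which can be computed by direct summation. The toy image and single 2x2 scale below are an illustrative sketch; the detection systems use larger, overlapping wavelets over many positions and scales.

```python
# The three Haar wavelet responses (vertical, horizontal, diagonal intensity
# differences) at one position of a toy image.

def haar_2x2(img, r, c):
    """Responses of the three 2x2 Haar filters at top-left corner (r, c)."""
    a, b = img[r][c], img[r][c + 1]
    cc, d = img[r + 1][c], img[r + 1][c + 1]
    vertical = (a + cc) - (b + d)     # left column minus right column
    horizontal = (a + b) - (cc + d)   # top row minus bottom row
    diagonal = (a + d) - (b + cc)     # main diagonal minus anti-diagonal
    return vertical, horizontal, diagonal

# A vertical edge: dark left half, bright right half
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]

v, h, d = haar_2x2(img, 1, 1)     # window straddling the edge
```

A window straddling the vertical edge fires strongly on the vertical filter and not at all on the other two, which is why the representation is compact: most responses are near zero except where an oriented structure is present.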
AITR1674 Author[s]: J.P. Mellor Automatically Recovering Geometry and Texture from Large Sets of Calibrated Images October 22, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1674.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1674.pdf Three-dimensional models which contain both geometry and texture have numerous applications such as urban planning, physical simulation, and virtual environments. A major focus of computer vision (and recently graphics) research is the automatic recovery of three-dimensional models from two-dimensional images. After many years of research this goal is yet to be achieved. Most practical modeling systems require substantial human input and, unlike automatic systems, are not scalable. This thesis presents a novel method for automatically recovering dense surface patches using large sets (1000's) of calibrated images taken from arbitrary positions within the scene. Physical instruments, such as Global Positioning System (GPS), inertial sensors, and inclinometers, are used to estimate the position and orientation of each image. Essentially, the problem is to find corresponding points in each of the images. Once a correspondence has been established, calculating its three-dimensional position is simply a matter of geometry. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplify the correspondence problem. The initial stage of the algorithm is completely local and scales linearly with the number of images. Subsequent stages are global in nature, exploit geometric constraints, and scale quadratically with the complexity of the underlying scene. We describe techniques for: 1) detecting and localizing surface patches; 2) refining camera calibration estimates and rejecting false positive surfels; and 3) grouping surface patches into surfaces and growing the surface along a twodimensional manifold. 
We also discuss a method for producing high-quality, textured three-dimensional models from these surfaces. Some of the most important characteristics of this approach are that it: 1) uses and refines noisy calibration estimates; 2) compensates for large variations in illumination; 3) tolerates significant soft occlusion (e.g. tree branches); and 4) associates, at a fundamental level, an estimated normal (i.e. no frontal-planar assumption) and texture with each surface patch.
AITR1675 Author[s]: J. Kenneth Salisbury, Jr. and Mandayam A. Srinivasan (editors) Proceedings of the Fourth PHANTOM Users Group Workshop November 4, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1675.ps ftp://publications.ai.mit.edu/aipublications/pdf/AITR1675.pdf This report contains the proceedings of the Fourth PHANTOM Users Group Workshop: 17 papers presented October 9-12, 1999 at MIT Endicott House in Dedham, Massachusetts. The workshop included sessions on Tools for Programmers, Dynamic Environments, Perception and Cognition, Haptic Connections, Collision Detection / Collision Response, Medical and Seismic Applications, and Haptics Going Mainstream. The proceedings include papers that cover a variety of subjects in computer haptics including rendering, contact determination, development libraries, and applications in medicine, path planning, data interaction and training.
AIM1679 CBCL183 Author[s]: Maximilian Riesenhuber and Tomaso Poggio A Note on Object Class Representation and Categorical Perception December 17, 1999 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1679.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1679.pdf We present a novel scheme ("Categorical Basis Functions", CBF) for object class representation in the brain and contrast it to the "Chorus of Prototypes" scheme recently proposed by Edelman. The power and flexibility of CBF is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in particular the finding by Bulthoff et al. (1998) of categorization of faces by gender without corresponding Categorical Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are suggested to further test CBF.
AIM1681 CBCL184 Author[s]: Theodoros Evgeniou and Massimiliano Pontil A Note on the Generalization Performance of Kernel Classifiers with Margin May 1, 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1681.ps ftp://publications.ai.mit.edu/aipublications/pdf/AIM1681.pdf We present distribution independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine classifiers (SVM) stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.
AITR1685 CBCL186 Author[s]: Constantine P. Papageorgiou A Trainable System for Object Detection in Images and Video Sequences May 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AITR1685.ps ftp://publications.ai.mit.edu/aipublications/15001999/AITR1685.pdf This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways: 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. 
We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance.
AIM1696 CBCL191 Author[s]: Vinay Kumar and Tomaso Poggio Learning-Based Approach to Estimation of Morphable Model Parameters September 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1696.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1696.pdf We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
AIM1682 CBCL185 Author[s]: Maximilian Riesenhuber and Tomaso Poggio The Individual is Nothing, the Class Everything: Psychophysics and Modeling of Recognition in Object Classes May 1, 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1682.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1682.pdf Most psychophysical studies of object recognition have focused on the recognition and representation of individual objects that subjects had previously been explicitly trained on. Correspondingly, modeling studies have often employed a 'grandmother'-type representation where the objects to be recognized were represented by individual units. However, objects in the natural world are commonly members of a class containing a number of visually similar objects, such as faces, for which physiology studies have provided support for a representation based on a sparse population code, which permits generalization from the learned exemplars to novel objects of that class. In this paper, we present results from psychophysical and modeling studies intended to investigate object recognition in natural ('continuous') object classes. In two experiments, subjects were trained to perform subordinate level discrimination in a continuous object class (images of computer-rendered cars) created using a 3D morphing system. By comparing the recognition performance of trained and untrained subjects we could estimate the effects of viewpoint-specific training and infer properties of the object class-specific representation learned as a result of training. We then compared the experimental findings to simulations, building on our recently presented HMAX model of object recognition in cortex, to investigate the computational properties of a population-based object class representation as outlined above. 
We find experimental evidence, supported by modeling results, that training builds a viewpoint- and class-specific representation that supplements a pre-existing representation with lower shape discriminability but possibly greater viewpoint invariance.
AIM1687 CBCL187 Author[s]: Bernd Heisele, Tomaso Poggio and Massimiliano Pontil Face Detection in Still Gray Images May 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1687.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1687.pdf We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern by a single SVM classifier. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifiers. On the first level, component classifiers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifier checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face.
AIM1688 CBCL188 Author[s]: Chikahito Nakajima, Massimiliano Pontil, Bernd Heisele and Tomaso Poggio People Recognition in Image Sequences by Supervised Learning June 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1688.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1688.pdf We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of multiclass strategies based on SVMs are explored and compared to k-Nearest Neighbor classifiers (kNNs). The system works in real time and shows high performance rates for people recognition throughout one day.
AIM1695 CBCL190 Author[s]: Maximilian Riesenhuber and Tomaso Poggio Computational Models of Object Recognition in Cortex: A Review August 7, 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1695.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1695.pdf Understanding how biological visual systems perform object recognition is one of the ultimate goals in computational neuroscience. Among the biological models of recognition the main distinctions are between feedforward and feedback and between object-centered and view-centered. From a computational viewpoint the different recognition tasks (for instance, categorization and identification) are very similar, representing different tradeoffs between specificity and invariance. Thus the different tasks do not strictly require different classes of models. The focus of the review is on feedforward, view-based models that are supported by psychophysical and physiological data.
AIM1697 CBCL192 Author[s]: Thomas Serre, Bernd Heisele, Sayan Mukherjee and Tomaso Poggio Feature Selection for Face Detection September 2000 ftp://publications.ai.mit.edu/aipublications/15001999/AIM1697.ps ftp://publications.ai.mit.edu/aipublications/15001999/AIM1697.pdf We present a new method to select features for a face detection system using Support Vector Machines (SVMs). In the first step we reduce the dimensionality of the input space by projecting the data into a subset of eigenvectors. The dimension of the subset is determined by a classification criterion based on minimizing a bound on the expected error probability of an SVM. In the second step we select features from the SVM feature space by removing those that have low contributions to the decision function of the SVM.

