CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

CSAIL Digital Archive - Artificial Intelligence Laboratory Series
Publications 1600 through 1699

AI PublicationsLast update Sun Mar 19 05:05:02 2006



Author[s]: Thomas Vetter, Michael J. Jones and Tomaso Poggio

A Bootstrapping Algorithm for Learning Linear Models of Object Classes



Flexible models of object classes, based on linear combinations of prototypical images, are capable of matching novel images of the same class and have been shown to be a powerful tool to solve several fundamental vision tasks such as recognition, synthesis and correspondence. The key problem in creating a specific flexible model is the computation of pixelwise correspondence between the prototypes, a task done until now in a semiautomatic way. In this paper we describe an algorithm that automatically bootstraps the correspondence between the prototypes. The algorithm - which can be used for 2D images as well as for 3D models - is shown to synthesize successfully a flexible model of frontal face images and a flexible model of handwritten digits.



Author[s]: Edgar Osuna, Robert Freund and Federico Girosi

Support Vector Machines: Training and Applications

March 1997



The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi- Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, its relationship with SRM, and its geometrical insight, are discussed in this paper. Training a SVM is equivalent to solve a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds few thousands the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory, and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVM's over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of, and also establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub- problems. As an application of SVM's, we present preliminary results we obtained applying SVM to the problem of detecting frontal human faces in real images.



Author[s]: Shai Avidan, Theodoros Evgeniou, Amnon Shashua and Tomaso Poggio

Image-Based View Synthesis

January 1997



We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is the ability to synthesize images whose viewing position is significantly far away from the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multi-dimensional interpolation function that synthesizes the non-rigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating the ability of representing an object by a relatively small number of model images --- for the purpose of cheap and fast viewers that can run on standard hardware.


Author[s]: William J. Dally, Leonard McMillan, Gary Bishop and Henry Fuchs

The Delta Tree: An Object-Centered Approach to Image-Based Rendering

May 2, 1997



This paper introduces the delta tree, a data structure that represents an object using a set of reference images. It also describes an algorithm for generating arbitrary re- projections of an object by traversing its delta tree. Delta trees are an efficient representation in terms of both storage and rendering performance. Each node of a delta tree stores an image taken from a point on a sampling sphere that encloses the object. Each image is compressed by discarding pixels that can be reconstructed by warping its ancestor's images to the node's viewpoint. The partial image stored at each node is divided into blocks and represented in the frequency domain. The rendering process generates an image at an arbitrary viewpoint by traversing the delta tree from a root node to one or more of its leaves. A subdivision algorithm selects only the required blocks from the nodes along the path. For each block, only the frequency components necessary to reconstruct the final image at an appropriate sampling density are used. This frequency selection mechanism handles both antialiasing and level-of-detail within a single framework. A complex scene is initially rendered by compositing images generated by traversing the delta trees of its components. Once the reference views of a scene are rendered once in this manner, the entire scene can be reprojected to an arbitrary viewpoint by traversing its own delta tree. Our approach is limited to generating views of an object from outside the object's convex hull. In practice we work around this problem by subdividing objects to render views from within the convex hull.



Author[s]: Marina Meila and Michael I. Jordan

Triangulation by Continuous Embedding

March 1997



When triangulating a belief network we aim to obtain a junction tree of minimum state space. Searching for the optimal triangulation can be cast as a search over all the permutations of the network's vaeriables. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continous nonlinear optimization task we hope to obtain a good triangulation with respect to the aformentioned cost. In this paper we introduce an upper bound to the total junction tree weight as the cost function. The appropriatedness of this choice is discussed and explored by simulations. Then we present two ways of embedding the new objective function into continuous domains and show that they perform well compared to the best known heuristic.



Author[s]: Federico Girosi

An Equivalence Between Sparse Approximation and Support Vector Machines

May 1997



In the first part of this paper we show a similarity between the principle of Structural Risk Minimization Principle (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in (Chen, Donoho and Saunders, 1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.



Author[s]: Gad Geiger and Jerome Y. Lettvin

A View on Dyslexia

June 1997



We describe here, briefly, a perceptual non- reading measure which reliably distinguishes between dyslexic persons and ordinary readers. More importantly, we describe a regimen of practice with which dyslexics learn a new perceptual strategy for reading. Two controlled experiment on dyslexics children demonstrate the regimen's efficiency.



Author[s]: Marcus Dill and Shimon Edelman

Translation Invariance in Object Recognition, and Its Relation to Other Visual Transformations

June 1997



Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.



Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris

Estimating Dependency Structure as a Hidden Variable

June 1997



This paper introduces a probability model, the mixture of trees that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EMand the Minimum Spanning Tree algorithm to find the ML and MAP mixtureof trees for a variety of priors, including the Dirichlet and the MDL priors.



Author[s]: Massimiliano Pontil and Alessandro Verri

Properties of Support Vector Machines

August 1997



Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.



Author[s]: Zhaoping Li

Visual Segmentation without Classification in a Model of the Primary Visual Cortex

August 1997



Stimuli outside classical receptive fields significantly influence the neurons' activities in primary visual cortex. We propose that such contextual influences are used to segment regions by detecting the breakdown of homogeneity or translation invariance in the input, thus computing global region boundaries using local interactions. This is implemented in a biologically based model of V1, and demonstrated in examples of texture segmentation and figure-ground segregation. By contrast with traditional approaches, segmentation occurs without classification or comparison of features within or between regions and is performed by exactly the same neural circuit responsible for the dual problem of the grouping and enhancement of contours.


Author[s]: Daniel Coore, Radhika Nagpal and Ron Weiss

Paradigms for Structure in an Amorphous Computer

October 1997



Recent developments in microfabrication and nanotechnology will enable the inexpensive manufacturing of massive numbers of tiny computing elements with sensors and actuators. New programming paradigms are required for obtaining organized and coherent behavior from the cooperation of large numbers of unreliable processing elements that are interconnected in unknown, irregular, and possibly time-varying ways. Amorphous computing is the study of developing and programming such ultrascale computing environments. This paper presents an approach to programming an amorphous computer by spontaneously organizing an unstructured collection of processing elements into cooperative groups and hierarchies. This paper introduces a structure called an AC Hierarchy, which logically organizes processors into groups at different levels of granularity. The AC hierarchy simplifies programming of an amorphous computer through new language abstractions, facilitates the design of efficient and robust algorithms, and simplifies the analysis of their performance. Several example applications are presented that greatly benefit from the AC hierarchy. This paper introduces three algorithms for constructing multiple levels of the hierarchy from an unstructured collection of processors.



Author[s]: Shimon Edelman and Sharon Duvdevani-Bar

Visual Recognition and Categorization on the Basis of Similarities to Multiple Class Prototypes

September 1997



To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.



Author[s]: Yair Weiss

Belief Propagation and Revision in Networks with Loops

November 1997



Local belief propagation rules of the sort proposed by Pearl(1988) are guaranteed to converge to the optimal beliefs for singly connected networks. Recently, a number of researchers have empirically demonstrated good performance of these same algorithms on networks with loops, but a theoretical understanding of this performance has yet to be achieved. Here we lay the foundation for an understanding of belief propagation in networks with loops. For networks with a single loop, we derive ananalytical relationship between the steady state beliefs in the loopy network and the true posterior probability. Using this relationship we show a category of networks for which the MAP estimate obtained by belief update and by belief revision can be proven to be optimal (although the beliefs will be incorrect). We show how nodes can use local information in the messages they receive in order to correct the steady state beliefs. Furthermore we prove that for all networks with a single loop, the MAP estimate obtained by belief revisionat convergence is guaranteed to give the globally optimal sequence of states. The result is independent of the length of the cycle and the size of the statespace. For networks with multiple loops, we introduce the concept of a "balanced network" and show simulati.


Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan

Proceedings of the Second PHANToM User's Group Workshop

December 1997



On October 19-22, 1997 the Second PHANToM Users Group Workshop was held at the MIT Endicott House in Dedham, Massachusetts. Designed as a forum for sharing results and insights, the workshop was attended by more than 60 participants from 7 countries. These proceedings report on workshop presentations in diverse areas including rigid and compliant rendering, tool kits, development environments, techniques for scientific data visualization, multi-modal issues and a programming tutorial.


Author[s]: Dan Halperin and Christian R. Shelton

A Perturbation Scheme for Spherical Arrangements with Application to Molecular Modeling

December 1997



We describe a software package for computing and manipulating the subdivision of a sphere by a collection of (not necessarily great) circles and for computing the boundary surface of the union of spheres. We present problems that arise in the implementation of the software and the solutions that we have found for them. At the core of the paper is a novel perturbation scheme to overcome degeneracies and precision problems in computing spherical arrangements while using floating point arithmetic. The scheme is relatively simple, it balances between the efficiency of computation and the magnitude of the perturbation, and it performs well in practice. In one O(n) time pass through the data, it perturbs the inputs necessary to insure no potential degeneracies and then passes the perturbed inputs on to the geometric algorithm. We report and discuss experimental results. Our package is a major component in a larger package aimed to support geometric queries on molecular models; it is currently employed by chemists working in "rational drug design." The spherical subdivisions are used to construct a geometric model of a molecule where each sphere represents an atom. We also give an overview of the molecular modeling package and detail additional features and implementation issues.



Author[s]: Theodoros Evgeniou and Tomaso Poggio

Sparse Representations of Multiple Signals

September 1997



We discuss the problem of finding sparse representations of a class of signals. We formalize the problem and prove it is NP-complete both in the case of a single signal and that of multiple ones. Next we develop a simple approximation method to the problem and we show experimental results using artificially generated signals. Furthermore,we use our approximation method to find sparse representations of classes of real signals, specifically of images of pedestrians. We discuss the relation between our formulation of the sparsity problem and the problem of finding representations of objects that are compact and appropriate for detection and classification.



Author[s]: Gideon P. Stein and Amnon Shashua

On Degeneracy of Linear Reconstruction from Three Views: Linear Line Complex and Applications

December 1997



This paper investigates the linear degeneracies of projective structure estimation from point and line features across three views. We show that the rank of the linear system of equations for recovering the trilinear tensor of three views reduces to 23 (instead of 26) in the case when the scene is a Linear Line Complex (set of lines in space intersecting at a common line) and is 21 when the scene is planar. The LLC situation is only linearly degenerate, and we show that one can obtain a unique solution when the admissibility constraints of the tensor are accounted for. The line configuration described by an LLC, rather than being some obscure case, is in fact quite typical. It includes, as a particular example, the case of a camera moving down a hallway in an office environment or down an urban street. Furthermore, an LLC situation may occur as an artifact such as in direct estimation from spatio-temporal derivatives of image brightness. Therefore, an investigation into degeneracies and their remedy is important also in practice.


Author[s]: Gideon P. Stein and Amnon Shashua

Direct Estimation of Motion and Extended Scene Structure from a Moving Stereo Rig

December 1998



We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that given two stereo pairs one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal). Correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence to form an extended scene reconstruction and we show results from a real image sequence. The motion computation is a linear least squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less so one does not have to explicitly apply a confidence measure.



Author[s]: Yar Weiss and Edward H. Adelson

Slow and Smooth: A Bayesian Theory for the Combination of Local Motion Signals in Human Vision

February 1998



In order to estimate the motion of an object, the visual system needs to combine multiple local measurements, each of which carries some degree of ambiguity. We present a model of motion perception whereby measurements from different image regions are combined according to a Bayesian estimator --- the estimated motion maximizes the posterior probability assuming a prior favoring slow and smooth velocities. In reviewing a large number of previously published phenomena we find that the Bayesian estimator predicts a wide range of psychophysical results. This suggests that the seemingly complex set of illusions arise from a single computational strategy that is optimal under reasonable assumptions.



Author[s]: Thomas Hofmann and Jan Puzicha

Statistical Models for Co-occurrence Data

February 1998



Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.


Author[s]: Radhika Nagpal and Daniel Coore

An Algorithm for Group Formation and Maximal Independent Set in an Amorphous Computer

February 1998



Amorphous computing is the study of programming ultra-scale computing environments of smart sensors and actuators cite{white-paper}. The individual elements are identical, asynchronous, randomly placed, embedded and communicate locally via wireless broadcast. Aggregating the processors into groups is a useful paradigm for programming an amorphous computer because groups can be used for specialization, increased robustness, and efficient resource allocation. This paper presents a new algorithm, called the clubs algorithm, for efficiently aggregating processors into groups in an amorphous computer, in time proportional to the local density of processors. The clubs algorithm is well-suited to the unique characteristics of an amorphous computer. In addition, the algorithm derives two properties from the physical embedding of the amorphous computer: an upper bound on the number of groups formed and a constant upper bound on the density of groups. The clubs algorithm can also be extended to find the maximal independent set (MIS) and $Delta + 1$ vertex coloring in an amorphous computer in $O(log N)$ rounds, where $N$ is the total number of elements and $Delta$ is the maximum degree.


Author[s]: Alan Bawden

Implementing Distributed Systems Using Linear Naming

March 1993



Linear graph reduction is a simple computational model in which the cost of naming things is explicitly represented. The key idea is the notion of "linearity". A name is linear if it is only used once, so with linear naming you cannot create more than one outstanding reference to an entity. As a result, linear naming is cheap to support and easy to reason about. Programs can be translated into the linear graph reduction model such that linear names in the program are implemented directly as linear names in the model. Nonlinear names are supported by constructing them out of linear names. The translation thus exposes those places where the program uses names in expensive, nonlinear ways. Two applications demonstrate the utility of using linear graph reduction: First, in the area of distributed computing, linear naming makes it easy to support cheap cross-network references and highly portable data structures, Linear naming also facilitates demand driven migration of tasks and data around the network without requiring explicit guidance from the programmer. Second, linear graph reduction reveals a new characterization of the phenomenon of state. Systems in which state appears are those which depend on certain - global- system properties. State is not a localizable phenomenon, which suggests that our usual object oriented metaphor for state is flawed.


Author[s]: Brian Scassellati

A Binocular, Foveated Active Vision System

March 1998



This report documents the design and implementation of a binocular, foveated active vision system as part of the Cog project at the MIT Artificial Intelligence Laboratory. The active vision system features a three degree of freedom mechanical platform that supports four color cameras, a motion control system, and a parallel network of digital signal processors for image processing. To demonstrate the capabilities of the system, we present results from four sample visual-motor tasks.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

Modeling Invariances in Inferotemporal Cell Tuning

March 1998



In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth. Training monkeys with novel, paperclip-like objects, Logothetis et al. could investigate whether these invariance properties are due to experience with exhaustively many transformed instances of an object or if there are mechanisms that allow the cells to show response invariance also to previously unseen instances of that object. They found object-selective cells in anterior IT which exhibited limited invariance to various transformations after training with single object views. While previous models accounted for the tuning of the cells for rotations in depth and for their selectivity to a specific object relative to a population of distractor objects, the model described here attempts to explain in a biologically plausible way the additional properties of translation and size invariance. Using the same stimuli as in the experiment, we find that model IT neurons exhibit invariance properties which closely parallel those of real neurons. Simulations show that the model is capable of unsupervised learning of view-tuned neurons. The model also allows to make experimentally testable predictions regarding novel stimulus transformations and combinations of stimuli.


Author[s]: Thomas Marill

Recovery of Three-Dimensional Objects from Single Perspective Images

March 1998



Any three-dimensional wire-frame object constructed out of parallelograms can be recovered from a single perspective two-dimensional image. A procedure for performing the recovery is given.


Author[s]: Kevin K. Lin

Coordinate-Independent Computations on Differential Equations

March 1998



This project investigates the computational representation of differentiable manifolds, with the primary goal of solving partial differential equations using multiple coordinate systems on general n- dimensional spaces. In the process, this abstraction is used to perform accurate integrations of ordinary differential equations using multiple coordinate systems. In the case of linear partial differential equations, however, unexpected difficulties arise even with the simplest equations.



Author[s]: Tomaso Poggio and Federico Girosi

Notes on PCA, Regularization, Sparsity and Support Vector Machines

May 1998



We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik"s feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exist a dual representations of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique and this operation is equivalent to sparsify a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit De-Noising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.


Author[s]: Kenneth Yip and Gerald Jay Sussman

Sparse Representations for Fast, One-Shot Learning

November 1997



Humans rapidly and reliably learn many kinds of regularities and generalizations. We propose a novel model of fast learning that exploits the properties of sparse representations and the constraints imposed by a plausible hardware mechanism. To demonstrate our approach we describe a computational model of acquisition in the domain of morphophonology. We encapsulate phonological information as bidirectional boolean constraint relations operating on the classical linguistic representations of speech sounds in term of distinctive features. The performance model is described as a hardware mechanism that incrementally enforces the constraints. Phonological behavior arises from the action of this mechanism. Constraints are induced from a corpus of common English nouns and verbs. The induction algorithm compiles the corpus into increasingly sophisticated constraints. The algorithm yields one-shot learning from a few examples. Our model has been implemented as a computer program. The program exhibits phonological behavior similar to that of young children. As a bonus the constraints that are acquired can be interpreted as classical linguistic rules.


Author[s]: Anita M. Flynn

Piezoelectric Ultrasonic Micromotors

June 1995



This report describes development of micro- fabricated piezoelectric ultrasonic motors and bulk-ceramic piezoelectric ultrasonic motors. Ultrasonic motors offer the advantage of low speed, high torque operation without the need for gears. They can be made compact and lightweight and provide a holding torque in the absence of applied power, due to the traveling wave frictional coupling mechanism between the rotor and the stator. This report covers modeling, simulation, fabrication and testing of ultrasonic motors. Design of experiments methods were also utilized to find optimal motor parameters. A suite of 8 mm diameter x 3 mm tall motors were machined for these studies and maximum stall torques as large as 10^(- 3) Nm, maximum no-load speeds of 1710 rpm and peak power outputs of 27 mW were realized. Aditionally, this report describes the implementation of a microfabricated ultrasonic motor using thin- film lead zirconate titanate. In a joint project with the Pennsylvania State University Materials Research Laboratory and MIT Lincoln Laboratory, 2 mm and 5 mm diameter stator structures were fabricated on 1 micron thick silicon nitride membranes. Small glass lenses placed down on top spun at 100-300 rpm with 4 V excitation at 90 kHz. The large power densities and stall torques of these piezoelectric ultrasonic motors offer tremendous promis for integrated machines: complete intelligent, electro-mechanical autonomous systems mass-produced in a single fabrication process.



Author[s]: Constantine P. Papgeorgiou, Federico Girosi and Tomaso Poggio

Sparse Correlation Kernel Analysis and Reconstruction

May 1998



This paper presents a new paradigm for signal reconstruction and superresolution, Correlation Kernel Analysis (CKA), that is based on the selection of a sparse set of bases from a large dictionary of class- specific basis functions. The basis functions that we use are the correlation functions of the class of signals we are analyzing. To choose the appropriate features from this large dictionary, we use Support Vector Machine (SVM) regression and compare this to traditional Principal Component Analysis (PCA) for the tasks of signal reconstruction, superresolution, and compression. The testbed we use in this paper is a set of images of pedestrians. This paper also presents results of experiments in which we use a dictionary of multiscale basis functions and then use Basis Pursuit De-Noising to obtain a sparse, multiscale approximation of a signal. The results are analyzed and we conclude that 1) when used with a sparse representation technique, the correlation function is an effective kernel for image reconstruction and superresolution, 2) for image compression, PCA and SVM have different tradeoffs, depending on the particular metric that is used to evaluate the results, 3) in sparse representation techniques, L_1 is not a good proxy for the true measure of sparsity, L_0, and 4) the L_epsilon norm may be a better error metric for image reconstruction and compression than the L_2 norm, though the exact psychophysical metric should take into account high order structure in images.


Author[s]: Charles Isbell and Paul Viola

Restructuring Sparse High Dimensional Data for Effective Retrieval

May 1998



The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector--a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets. and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.


Author[s]: Gina-Anne Levow

Corpus-Based Techniques for Word Sense Disambiguation

May 27, 1998



The need for robust and easily extensible systems for word sense disambiguation coupled with successes in training systems for a variety of tasks using large on-line corpora has led to extensive research into corpus-based statistical approaches to this problem. Promising results have been achieved by vector space representations of context, clustering combined with a semantic knowledge base, and decision lists based on collocational relations. We evaluate these techniques with respect to three important criteria: how their definition of context affects their ability to incorporate different types of disambiguating information, how they define similarity among senses, and how easily they can generalize to new senses. The strengths and weaknesses of these systems provide guidance for future systems which must capture and model a variety of disambiguating information, both syntactic and semantic.


Author[s]: Oded Maron and Tomas Lozano-Perez

Visible Decomposition: Real-Time Path Planning in Large Planar Environments

June, 1998



We describe a method called Visible Decomposition for computing collision-free paths in real time through a planar environment with a large number of obstacles. This method divides space into local visibility graphs, ensuring that all operations are local. The search time is kept low since the number of regions is proved to be small. We analyze the computational demands of the algorithm and the quality of the paths it produces. In addition, we show test results on a large simulation testbed.


Author[s]: Oded Maron

Learning from Ambiguity

December 1998



There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple- Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple- instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful examples of applying Diverse Density techniques.



Author[s]: Zhaoping Li

Pre-Attentive Segmentation in the Primary Visual Cortex

June 30, 1998



Stimuli outside classical receptive fields have been shown to exert significant influence over the activities of neurons in primary visual cortexWe propose that contextual influences are used for pre-attentive visual segmentation, in a new framework called segmentation without classification. This means that segmentation of an image into regions occurs without classification of features within a region or comparison of features between regions. This segmentation framework is simpler than previous computational approaches, making it implementable by V1 mechanisms, though higher leve l visual mechanisms are needed to refine its output. However, it easily handles a class of segmentation problems that are tricky in conventional methods. The cortex computes global region boundaries by detecting the breakdown of homogeneity or translation invariance in the input, using local intra-cortical interactions mediated by the horizontal connections. The difference between contextual influences near and far from region boundaries makes neural activities near region boundaries higher than elsewhere, making boundaries more salient for perceptual pop-out. This proposal is implemented in a biologically based model of V1, and demonstrated using examples of texture segmentation and figure-ground segregation. The model performs segmentation in exactly the same neural circuit that solves the dual problem of the enhancement of contours, as is suggested by experimental observations. Its behavior is compared with psychophysical and physiological data on segmentation, contour enhancement, and contextual influences. We discuss the implications of segmentation without classification and the predictions of our V1 model, and relate it to other phenomena such as asymmetry in visual search.


Author[s]: Parry Husbands, Charles Lee Isbell, Jr. and Alan Edelman

Interactive Supercomputing with MIT Matlab

July 28, 1998



This paper describes MITMatlab, a system that enables users of supercomputers or networked PCs to work on large data sets within Matlab transparently. MITMatlab is based on the Parallel Problems Server (PPServer), a standalone 'linear algebra server' that provides a mechanism for running distributed memory algorithms on large data sets. The PPServer and MITMatlab enable high-performance interactive supercomputing. With such a tool, researchers can now use Matlab as more than a prototyping tool for experimenting with small problems. Instead, MITMatlab makes is possible to visualize and operate interactively on large data sets. This has implications not only in supercomputing, but for Artificial Intelligence applicatons such as Machine Learning, Information Retrieval and Image Processing.


Author[s]: Deborah A. Wallach

A Hierarchical Cache Coherent Protocol

September 1992



As the number of processors in distributed- memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared- memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test- and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson[ISCA19], describes the protocol behavior when mapped to a k-ary n- cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.



Author[s]: Nicholas Chan, Blake LeBaron, Andrew Lo and Tomaso Poggio

Information Dissemination and Aggregation in Asset Markets with Simple Intelligent Traders

September 1998



Various studies of asset markets have shown that traders are capable of learning and transmitting information through prices in many situations. In this paper we replace human traders with intelligent software agents in a series of simulated markets. Using these simple learning agents, we are able to replicate several features of the experiments with human subjects, regarding (1) dissemination of information from informed to uninformed traders, and (2) aggregation of information spread over different traders.


Author[s]: Hany Farid and Edward H. Adelson

Separating Reflections from Images Using Independent Components Analysis

September 1998



The image of an object can vary dramatically depending on lighting, specularities/reflections and shadows. It is often advantageous to separate these incidental variations from the intrinsic aspects of an image. Along these lines this paper describes a method for photographing objects behind glass and digitally removing the reflections off the glass leaving the image of the objects behind the glass intact. We describe the details of this method which employs simple optical techniques and independent components analysis (ICA) and show its efficacy with several examples.



Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris

Estimating Dependency Structure as a Hidden Variable

September 1998



This paper introduces a probability model, the mixture of trees that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.



Author[s]: Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou

From Regression to Classification in Support Vector Machines

November 1998



We study the relation between support vector machines (SVMs) for regression (SVMR) and SVM for classification (SVMC). We show that for a given SVMC solution there exists a SVMR solution which is equivalent for a certain choice of the parameters. In particular our result is that for $epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter C_c are equal to (1-epsilon)^{- 1} times the optimal hyperplane and threshold for SVMR with regularization parameter C_r = (1-epsilon)C_c. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.



Author[s]: Christian R. Shelton

Three-Dimensional Correspondence

December 1998



This paper describes the problem of three-dimensional object correspondence and presents an algorithm for matching two three-dimensional colored surfaces using polygon reduction and the minimization of an energy function. At the core of this algorithm is a novel data-dependent multi-resolution pyramid for polygonal surfaces. The algorithm is general to correspondence between any two manifolds of the same dimension embedded in a higher dimensional space. Results demonstrating correspondences between various objects are presented and a method for incorporating user input is also detailed.



Author[s]: Massimiliano Pontil, Sayan Mukherjee and Federico Girosi

On the Noise Model of Support Vector Machine Regression

October 1998



Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik"s $epsilon$- insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.


Author[s]: Marina Meila

An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High Dimensional Sparse Data

January 1999



Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables) and that the graph of these dependencies is a spanning tree. The original algorithm is quadratic in the dimesion of the domain, and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed ups of up to 2-3 orders of magnitude in the experiments.



Author[s]: Sayan Mukherjee and Vladimir Vapnik

Multivariate Density Estimation: An SVM Approach

April 1999



We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, gaussians and laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.



Author[s]: Theodoros Evgeniou, Massimiliano Pontil and Tomaso Poggio

A Unified Framework for Regularization Networks and Support Vector Machines

March 1999



Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples -- in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.


Author[s]: Gideon P. Stein, Raquel Romano and Lily Lee

Monitoring Activities from Multiple Video Streams: Establishing a Common Coordinate Frame

April 1999



Passive monitoring of large sites typically requires coordination between multiple cameras, which in turn requires methods for automatically relating events between distributed cameras. This paper tackles the problem of self-calibration of multiple cameras which are very far apart, using feature correspondences to determine the camera geometry. The key problem is finding such correspondences. Since the camera geometry and photometric characteristics vary considerably between images, one cannot use brightness and/or proximity constraints. Instead we apply planar geometric constraints to moving objects in the scene in order to align the scene"s ground plane across multiple views. We do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. Once we have recovered the homography which aligns the planar structure in the scene, we can compute from the homography matrix the 3D position of the plane and the relative camera positions. This in turn enables us to recover a homography matrix which maps the images to an overhead view. We demonstrate this technique in two settings: a controlled lab setting where we test the effects of errors in internal camera calibration, and an uncontrolled, outdoor setting in which the full procedure is applied to external camera calibration and ground plane recovery. In spite of noise in the internal camera parameters and image data, the system successfully recovers both planar structure and relative camera positions in both settings.



Author[s]: Theodoros Evgeniou and Massimiliano Pontil

On the V(subscript gamma) Dimension for Regression in Reproducing Kernel Hilbert Spaces

May 1999



This paper presents a computation of the $V_gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the RV_gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_epsilon$ or general $L_p$ loss functions. This paper presenta a novel proof of this result also for the case that a bias is added to the functions in the RKHS.


Author[s]: Hany Farid

Detecting Digital Forgeries Using Bispectral Analysis

December 1999



With the rapid increase in low-cost and sophisticated digital technology the need for techniques to authenticate digital material will become more urgent. In this paper we address the problem of authenticating digital signals assuming no explicit prior knowledge of the original. The basic approach that we take is to assume that in the frequency domain a "natural" signal has weak higher-order statistical correlations. We then show that "un-natural" correlations are introduced if this signal is passed through a non-linearity (which would almost surely occur in the creation of a forgery). Techniques from polyspectral analysis are then used to detect the presence of these correlations. We review the basics of polyspectral analysis, show how and why these tools can be used in detecting forgeries and show their effectiveness in analyzing human speech.



Author[s]: Tony Ezzat and Tomaso Poggio

Visual Speech Synthesis by Morphing Visemes

May 1999



We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.



Author[s]: Ryan Rifkin, Massimiliano Pontil and Alessandro Verri

A Note on Support Vector Machines Degeneracy

August 11, 1999



When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${f w} = {f 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unboundedspolyhedron, which we characterize in terms of its extreme points and rays.


Author[s]: Liana M. Lorigo, Olivier Faugeras, W.E.L. Grimson, Renaud Keriven, Ron Kikinis, Carl-Fredrik Westin

Co-dimension 2 Geodesic Active Contours for MRA Segmentation

August 11, 1999



Automatic and semi-automatic magnetic resonance angiography (MRA)s segmentation techniques can potentially save radiologists larges amounts of time required for manual segmentation and cans facilitate further data analysis. The proposed MRAs segmentation method uses a mathematical modeling technique whichs is well-suited to the complicated curve-like structure of bloods vessels. We define the segmentation task as ans energy minimization over all 3D curves and use a level set methods to search for a solution. Ours approach is an extension of previous level set segmentations techniques to higher co-dimension.


Author[s]: Rajesh Kasturirangan

Multiple Scales in Small-World Networks

August 11, 1999



Small-world architectures may be implicated in a range of phenomena from networks of neurons in the cerebral cortex to social networks and propogation of viruses. Small- world networks are interpolations of regular and random networks that retain the advantages of both regular and random networks by being highly clustered like regular networks and having small average path length between nodes, like random networks. While most of the recent attention on small- world networks has focussed on the effect of introducing disorder/randomness into a regular network, we show that that the fundamental mechanism behind the small- world phenomenon is not disorder/ randomness, but the presence of connections of many different length scales. Consequently, in order to explain the small-world phenomenon, we introduce the concept of multiple scale networks and then state the multiple length scale hypothesis. We show that small-world behavior in randomly rewired networks is a consequence of features common to all multiple scale networks. To support the multiple length scale hypothesis, novel network architectures are introduced that need not be a result of random rewiring of a regular network. In each case it is shown that whenever the network exhibits small- world behavior, it also has connections of diverse length scales. We also show that the distribution of the length scales of the new connections is significantly more important than whether the new connections are long range, medium range or short range.



Author[s]: Anuj Mohan

Object Detection in Images by Components

August 11, 1999



In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component based approach is two fold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the components based approach and the ACC data classification structure.


Author[s]: Harold Abelson, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman and Ron Weiss

Amorphous Computing

August 29, 1999



Amorphous computing is the development of organizational principles and programming languages for obtaining coherent behaviors from the cooperation of myriads of unreliable parts that are interconnected in unknown, irregular, and time-varying ways. The impetus for amorphous computing comes from developments in microfabrication and fundamental biology, each of which is the basis of a kernel technology that makes it possible to build or grow huge numbers of almost-identical information-processing units at almost no cost. This paper sets out a research agenda for realizing the potential of amorphous computing and surveys some initial progress, both in programming and in fabrication. We describe some approaches to programming amorphous systems, which are inspired by metaphors from biology and physics. We also present the basic ideas of cellular computing, an approach to constructing digital-logic circuits within living cells by representing logic levels by concentrations DNA-binding proteins.


Author[s]: Radhika Nagpal

Organizing a Global Coordinate System from Local Information on an Amorphous Computer

August 29, 1999



This paper demonstrates that it is possible to generate a reasonably accurate coordinate system on randomly distributed processors, using only local information and local communication. By coordinate systems we imply that each element assigns itself a logical coordinate that maps to its global physical location, starting with no apriori knowledge of position or orientation. The algorithm presented is inspired by biological systems that use chemical gradients to determine the position of cells. Extensive analysis and simulation results are presented. Two key results are: there is a critical minimum average neighborhood size of 15 for good accuracy and there is a fundamental limit on the resolution of any coordinate system determined strictly from local communication. We also demonstrate that using this algorithm, random distributions of processors produce significantly better accuracy than regular processor grids - such as those used by cellular automata. This has implications for discrete models of biology as well as for building smart sensor arrays.


Author[s]: Tommi Jaakkola, Marina Meila and Tony Jebara

Maximum Entropy Discrimination

December 1999



We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class- conditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.


Author[s]: Kinh Tieu and Paul Viola

Boosting Image Database Retrieval

September 10, 1999



We present an approach for image database retrieval using a very large number of highly- selective features and simple on-line learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes. We propose a mechanism for generating a large number of complex features which capture some aspects of this causal structure. Boosting is used to learn simple and efficient classifiers in this complex feature space. Finally we will describe a practical implementation of our retrieval system on a database of 3000 images.


Author[s]: Mark M. Millonas and Erik M. Rauch

Trans-membrane Signal Transduction and Biochemical Turing Pattern Formation

September 28, 1999



The Turing mechanism for the production of a broken spatial symmetry in an initially homogeneous system of reacting and diffusing substances has attracted much interest as a potential model for certain aspects of morphogenesis such as pre- patterning in the embryo, and has also served as a model for self-organization in more generic systems. The two features necessary for the formation of Turing patterns are short- range autocatalysis and long-range inhibition which usually only occur when the diffusion rate of the inhibitor is significantly greater than that of the activator. This observation has sometimes been used to cast doubt on applicability of the Turing mechanism to cellular patterning since many messenger molecules that diffuse between cells do so at more-or-less similar rates. Here we show that stationary, symmetry-breaking Turing patterns can form in physiologically realistic systems even when the extracellular diffusion coefficients are equal; the kinetic properties of the 'receiver' and 'transmitter' proteins responsible for signal transduction will be primary factors governing this process.



Author[s]: Vinay P. Kumar and Tomaso Poggio

Learning-Based Approach to Real Time Tracking and Analysis of Faces

September 23, 1999



This paper describes a trainable system capable of tracking faces and facialsfeatures like eyes and nostrils and estimating basic mouth features such as sdegrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stage of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.



Author[s]: Constantine P. Papageorgiou and Tomaso Poggio

A Trainable Object Detection System: Car Detection in Static Images

october 13, 1999



This paper describes a general, trainable architecture for object detection that has previously been applied to face and peoplesdetection with a new application to car detection in static images. Our technique is a learning based approach that uses a set of labeled training data from which an implicit model of an object class -- here, cars -- is learned. Instead of pixel representations that may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of- sample images and show an ROC curve that highlights the performance of our system.


Author[s]: J.P. Mellor

Automatically Recovering Geometry and Texture from Large Sets of Calibrated Images

October 22, 1999



Three-dimensional models which contain both geometry and texture have numerous applications such as urban planning, physical simulation, and virtual environments. A major focus of computer vision (and recently graphics) research is the automatic recovery of three-dimensional models from two- dimensional images. After many years of research this goal is yet to be achieved. Most practical modeling systems require substantial human input and unlike automatic systems are not scalable. This thesis presents a novel method for automatically recovering dense surface patches using large sets (1000's) of calibrated images taken from arbitrary positions within the scene. Physical instruments, such as Global Positioning System (GPS), inertial sensors, and inclinometers, are used to estimate the position and orientation of each image. Essentially, the problem is to find corresponding points in each of the images. Once a correspondence has been established, calculating its three-dimensional position is simply a matter of geometry. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplifies the correspondence problem. The initial stage of the algorithm is completely local and scales linearly with the number of images. Subsequent stages are global in nature, exploit geometric constraints, and scale quadratically with the complexity of the underlying scene. We describe techniques for: 1) detecting and localizing surface patches; 2) refining camera calibration estimates and rejecting false positive surfels; and 3) grouping surface patches into surfaces and growing the surface along a two-dimensional manifold. We also discuss a method for producing high quality, textured three-dimensional models from these surfaces. Some of the most important characteristics of this approach are that it: 1) uses and refines noisy calibration estimates; 2) compensates for large variations in illumination; 3) tolerates significant soft occlusion (e.g. tree branches); and 4) associates, at a fundamental level, an estimated normal (i.e. no frontal-planar assumption) and texture with each surface patch.


Author[s]: J. Kenneth Salisbury, Jr. and Mandayam A. Srinivasan (editors)

Proceedings of the Fourth PHANTOM Users Group Workshop

November 4, 1999



This Report contains the proceedings of the Fourth Phantom Users Group Workshop contains 17 papers presented October 9-12, 1999 at MIT Endicott House in Dedham Massachusetts. The workshop included sessions on, Tools for Programmers, Dynamic Environments, Perception and Cognition, Haptic Connections, Collision Detection / Collision Response, Medical and Seismic Applications, and Haptics Going Mainstream. The proceedings include papers that cover a variety of subjects in computer haptics including rendering, contact determination, development libraries, and applications in medicine, path planning, data interaction and training.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

A Note on Object Class Representation and Categorical Perception

December 17, 1999



We present a novel scheme ("Categorical Basis Functions", CBF) for object class representation in the brain and contrast it to the "Chorus of Prototypes" scheme recently proposed by Edelman. The power and flexibility of CBF is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in particular the finding by Bulthoff et al. (1998) of categorization of faces by gender without corresponding Categorical Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are suggested to further test CBF.



Author[s]: Theodoros Evgeniou and Massimiliano Pontil

A Note on the Generalization Performance of Kernel Classifiers with Margin

May 1, 2000



We present distribution independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine classifiers (SVM) stem out of this class of machines. The bounds are derived through computations of the $V_gamma$ dimension of a family of loss functions where the SVM one belongs to. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.



Author[s]: Constantine P. Papageorgiou

A Trainable System for Object Detection in Images and Video Sequences

May 2000



This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways -- 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance.



Author[s]: Vinay Kumar and Tomaso Poggio

Learning-Based Approach to Estimation of Morphable Model Parameters

September 2000



We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

The Individual is Nothing, the Class Everything: Psychophysics and Modeling of Recognition in Obect Classes

May, 1, 2000



Most psychophysical studies of object recognition have focussed on the recognition and representation of individual objects subjects had previously explicitely been trained on. Correspondingly, modeling studies have often employed a 'grandmother'-type representation where the objects to be recognized were represented by individual units. However, objects in the natural world are commonly members of a class containing a number of visually similar objects, such as faces, for which physiology studies have provided support for a representation based on a sparse population code, which permits generalization from the learned exemplars to novel objects of that class. In this paper, we present results from psychophysical and modeling studies intended to investigate object recognition in natural ('continuous') object classes. In two experiments, subjects were trained to perform subordinate level discrimination in a continuous object class - images of computer-rendered cars - created using a 3D morphing system. By comparing the recognition performance of trained and untrained subjects we could estimate the effects of viewpoint-specific training and infer properties of the object class-specific representation learned as a result of training. We then compared the experimental findings to simulations, building on our recently presented HMAX model of object recognition in cortex, to investigate the computational properties of a population-based object class representation as outlined above. We find experimental evidence, supported by modeling results, that training builds a viewpoint- and class-specific representation that supplements a pre-existing repre-sentation with lower shape discriminability but possibly greater viewpoint invariance.



Author[s]: Bernd Heisele, Tomaso Poggio and Massimiliano Pontil

Face Detection in Still Gray Images

May 2000



We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern by a single SVM classifer. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifers. On the first level, component classifers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifer checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face.



Author[s]: Chikahito Nakajima, Massimiliano Pontil, Bernd Heisele and Tomaso Poggio

People Recognition in Image Sequences by Supervised Learning

June 2000



We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of multiclass strategies based on SVMs are explored and compared to k-Nearest Neighbors classifiers (kNNs). The system works in real time and shows high performance rates for people recognition throughout one day.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

Computational Models of Object Recognition in Cortex: A Review

August 7, 2000



Understanding how biological visual systems perform object recognition is one of the ultimate goals in computational neuroscience. Among the biological models of recognition the main distinctions are between feedforward and feedback and between object-centered and view-centered. From a computational viewpoint the different recognition tasks - for instance categorization and identification - are very similar, representing different trade-offs between specificity and invariance. Thus the different tasks do not strictly require different classes of models. The focus of the review is on feedforward, view-based models that are supported by psychophysical and physiological data.



Author[s]: Thomas Serre, Bernd Heisele, Sayan Mukherjee and Tomaso Poggio

Feature Selection for Face Detection

September 2000



We present a new method to select features for a face detection system using Support Vector Machines (SVMs). In the first step we reduce the dimensionality of the input space by projecting the data into a subset of eigenvectors. The dimension of the subset is determined by a classification criterion based on minimizing a bound on the expected error probability of an SVM. In the second step we select features from the SVM feature space by removing those that have low contributions to the decision function of the SVM.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu