CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

CSAIL Digital Archive - Artificial Intelligence Laboratory Series
Publications - 2003

AI PublicationsLast update Sun Mar 19 05:05:02 2006


Author[s]: Philip Mjong-Hyon Shin Kim

Understanding Subsystems in Biology through Dimensionality Reduction, Graph Partitioning and Analytical Modeling

February 5, 2003



Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarking them against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extentions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.


Author[s]: Nathan Srebro and Tommi Jaakkola

Generalized Low-Rank Approximations

January 15, 2003



We study the frequent problem of approximating a target matrix with a matrix of lower rank. We provide a simple and efficient (EM) algorithm for solving {\em weighted} low rank approximation problems, which, unlike simple matrix factorization problems, do not admit a closed form solution in general. We analyze, in addition, the nature of locally optimal solutions that arise in this context, demonstrate the utility of accommodating the weights in reconstructing the underlying low rank representation, and extend the formulation to non-Gaussian noise models such as classification (collaborative filtering).


Author[s]: Timothy Chklovski

Using Analogy to Acquire Commonsense Knowledge from Human Contributors

February 12, 2003



The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy has been implemented, deployed, and evaluated. (The site "1001 Questions," is available at http://teach-computers.org/learner.html). Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information." Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior --- the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that indeed there is sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, evaluating the percentages of questions answered affirmatively, negatively and judged to be nonsensical in the cumulative analogy case compares favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.


Author[s]: Harald Steck annd Tommi S. Jaakkola

(Semi-)Predictive Discretization During Model Selection

February 25, 2003



In this paper, we present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade- off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also independent of the metric used in the continuous space. Our experiments with gene expression data show that discretization plays a crucial role regarding the resulting network structure.


Author[s]: Leonid Peshkin

Reinforcement Learning by Policy Search

February 14, 2003



One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy---a mapping of observations into actions---based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.



Author[s]: Gadi Geiger, Tony Ezzat and Tomaso Poggio

Perceptual Evaluation of Video-Realistic Speech

February 28, 2003



abstract With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image- sequences of the same utterances, ("Turing tests") and b) gauging visual speech recognition by comparing lip-reading performance of the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects that were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects when asked to lip-read the utterances from the same image-sequences recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both, real and synthetic, were at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes like rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.


Author[s]: Aaron D. Adler

Segmentation and Alignment of Speech and Sketching in a Design Environment

February 2003



Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.



Author[s]: Izzat N. Jarudi and Pawan Sinha

Relative Contributions of Internal and External Features to Face Recognition

March 2003



The central challenge in face recognition lies in understanding the role different facial features play in our judgments of identity. Notable in this regard are the relative contributions of the internal (eyes, nose and mouth) and external (hair and jaw-line) features. Past studies that have investigated this issue have typically used high-resolution images or good-quality line drawings as facial stimuli. The results obtained are therefore most relevant for understanding the identification of faces at close range. However, given that real-world viewing conditions are rarely optimal, it is also important to know how image degradations, such as loss of resolution caused by large viewing distances, influence our ability to use internal and external features. Here, we report experiments designed to address this issue. Our data characterize how the relative contributions of internal and external features change as a function of image resolution. While we replicated results of previous studies that have shown internal features of familiar faces to be more useful for recognition than external features at high resolution, we found that the two feature sets reverse in importance as resolution decreases. These results suggest that the visual system uses a highly non-linear cue-fusion strategy in combining internal and external features along the dimension of image resolution and that the configural cues that relate the two feature sets play an important role in judgments of facial identity.



Author[s]: Sanmay Das

Intelligent Market-Making in Artificial Financial Markets

June 2003



This thesis describes and evaluates a market-making algorithm for setting prices in financial markets with asymmetric information, and analyzes the properties of artificial markets in which the algorithm is used. The core of our algorithm is a technique for maintaining an online probability density estimate of the underlying value of a stock. Previous theoretical work on market-making has led to price-setting equations for which solutions cannot be achieved in practice, whereas empirical work on algorithms for market-making has focused on sets of heuristics and rules that lack theoretical justification. The algorithm presented in this thesis is theoretically justified by results in finance, and at the same time flexible enough to be easily extended by incorporating modules for dealing with considerations like portfolio risk and competition from other market-makers. We analyze the performance of our algorithm experimentally in artificial markets with different parameter settings and find that many reasonable real-world properties emerge. For example, the spread increases in response to uncertainty about the true value of a stock, average spreads tend to be higher in more volatile markets, and market-makers with lower average spreads perform better in environments with multiple competitive market- makers. In addition, the time series data generated by simple markets populated with market-makers using our algorithm replicate properties of real-world financial time series, such as volatility clustering and the fat-tailed nature of return distributions, without the need to specify explicit models for opinion propagation and herd behavior in the trading crowd.


Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman and Mark A. Rubin

Context-Based Vision System for Place and Object Recognition

March 19, 2003



While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. In this paper we present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We present a low- dimensional global image representation that provides relevant information for place recognition and categorization, and how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.


Author[s]: Christine Alvarado, Jaime Teevan, Mark S. Ackerman and David Karger

Surviving the Information Explosion: How People Find Their Electronic Information

April 15, 2003



We report on a study of how people look for information within email, files, and the Web. When locating a document or searching for a specific answer, people relied on their contextual knowledge of their information target to help them find it, often associating the target with a specific document. They appeared to prefer to use this contextual information as a guide in navigating locally in small steps to the desired document rather than directly jumping to their target. We found this behavior was especially true for people with unstructured information organization. We discuss the implications of our findings for the design of personal information management tools.


Author[s]: Louis-Philippe Morency

Stereo-Based Head Pose Tracking Using Iterative Closest Point and Normal Flow Constraint

May 2003



In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.


Author[s]: Jacob Beal

Leveraging Learning and Language Via Communication Bootstrapping

March 17, 2003



In a Communication Bootstrapping system, peer components with different perceptual worlds invent symbols and syntax based on correlations between their percepts. I propose that Communication Bootstrapping can also be used to acquire functional definitions of words and causal reasoning knowledge. I illustrate this point with several examples, then sketch the architecture of a system in progress which attempts to execute this task.


Author[s]: Kristen Grauman

A Statistical Image-Based Shape Model for Visual Hull Reconstruction and 3D Structure Inference

May 22, 2003



We present a statistical image-based “shape + structure” model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.


Author[s]: Kristen Grauman, Gregory Shakhnarovich and Trevor Darrell

Inferring 3D Structure with a Statistical Image-Based Shape Model

April 17, 2003



We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure'' model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.


Author[s]: Paul Fitzpatrick

From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot

June 2003



This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.


Author[s]: Gregory Shakhnarovich, Paul Viola and Trevor Darrell

Fast Pose Estimation with Parameter Sensitive Hashing

April 18, 2003



Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly becme prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends a recently developed method for locality-sensitive hashing, which finds approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions; we show how to find the set of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call Parameter-Sensitive Hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.



Author[s]: Jennifer Louie

A Biological Model of Object Recognition with Feature Learning

June 2003



Previous biological models of object recognition in cortex have been evaluated using idealized scenes and have hard-coded features, such as the HMAX model by Riesenhuber and Poggio [10]. Because HMAX uses the same set of features for all object classes, it does not perform well in the task of detecting a target object in clutter. This thesis presents a new model that integrates learning of object-specific features with the HMAX. The new model performs better than the standard HMAX and comparably to a computer vision system on face detection. Results from experimenting with unsupervised learning of features and the use of a biologically-plausible classifier are presented.


Author[s]: Chris Mario Christoudias, Louis-Philippe Morency and Trevor Darrell

Light Field Morphable Models

April 18, 2003



Statistical shape and texture appearance models are powerful image representations, but previously had been restricted to 2D or simple 3D shapes. In this paper we present a novel 3D morphable model based on image-based rendering techniques, which can represent complex lighting conditions, structures, and surfaces. We describe how to construct a manifold of the multi-view appearance of an object class using light fields and show how to match a 2D image of an object to a point on this manifold. In turn we use the reconstructed light field to render novel views of the object. Our technique overcomes the limitations of polygon based appearance models and uses light fields that are acquired in real-time.



Author[s]: Ezra Rosen

Face Representation in Cortex: Studies Using a Simple and Not So Special Model

June 5, 2003



The face inversion effect has been widely documented as an effect of the uniqueness of face processing. Using a computational model, we show that the face inversion effect is a byproduct of expertise with respect to the face object class. In simulations using HMAX, a hierarchical, shape based model, we show that the magnitude of the inversion effect is a function of the specificity of the representation. Using many, sharply tuned units, an ``expert'' has a large inversion effect. On the other hand, if fewer, broadly tuned units are used, the expertise is lost, and this ``novice'' has a small inversion effect. As the size of the inversion effect is a product of the representation, not the object class, given the right training we can create experts and novices in any object class. Using the same representations as with faces, we create experts and novices for cars. We also measure the feasibility of a view-based model for recognition of rotated objects using HMAX. Using faces, we show that transfer of learning to novel views is possible. Given only one training view, the view-based model can recognize a face at a new orientation via interpolation from the views to which it had been tuned. Although the model can generalize well to upright faces, inverted faces yield poor performance because the features change differently under rotation.


Author[s]: Jacob Beal

Persistent Nodes for Reliable Memory in Geographically Local Networks

April 15, 2003



A Persistent Node is a redundant distributed mechanism for storing a key/value pair reliably in a geographically local network. In this paper, I develop a method of establishing Persistent Nodes in an amorphous matrix. I address issues of construction, usage, atomicity guarantees and reliability in the face of stopping failures. Applications include routing, congestion control, and data storage in gigascale networks.


Author[s]: Claire Monteleoni

Online Learning of Non-stationary Sequences

June 12, 2003



We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them as simple Bayesian algorithms operating on models analogous to Hidden Markov Models (HMMs). We derive a new performance bound for such algorithms which is considerably simpler than existing bounds. The bound provides the basis for learning the rate at which the identity of the optimal expert switches over time. We find an analytic expression for the a priori resolution at which we need to learn the rate parameter. We extend our scalar switching-rate result to models of the switching-rate that are governed by a matrix of parameters, i.e. arbitrary homogeneous HMMs. We apply and examine our algorithm in the context of the problem of energy management in wireless networks. We analyze the new results in the framework of Information Theory.


Author[s]: Jacob Beal

A Robust Amorphous Hierarchy from Persistent Nodes

May 1, 2003



For a very large network deployed in space with only nearby nodes able to talk to each other, we want to do tasks like robust routing and data storage. One way to organize the network is via a hierarchy, but hierarchies often have a few critical nodes whose death can disrupt organization over long distances. I address this with a system of distributed aggregates called Persistent Nodes, such that spatially local failures disrupt the hierarchy in an area proportional to the diameter of the failure. I describe and analyze this system, which has been implemented in simulation.


Author[s]: Andreas F. Wehowsky

Safe Distributed Coordination of Heterogeneous Robots through Dynamic Simple Temporal Networks

May 30, 2003



Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for modelbased programs whose planning and execution is distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman- Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.


Author[s]: Matthew J. Marjanovic

Teaching an Old Robot New Tricks: Learning Novel Tasks via Interaction with People and Things

June 20, 2003



As AI has begun to reach out beyond its symbolic, objectivist roots into the embodied, experientialist realm, many projects are exploring different aspects of creating machines which interact with and respond to the world as humans do. Techniques for visual processing, object recognition, emotional response, gesture production and recognition, etc., are necessary components of a complete humanoid robot. However, most projects invariably concentrate on developing a few of these individual components, neglecting the issue of how all of these pieces would eventually fit together. The focus of the work in this dissertation is on creating a framework into which such specific competencies can be embedded, in a way that they can interact with each other and build layers of new functionality. To be of any practical value, such a framework must satisfy the real-world constraints of functioning in real-time with noisy sensors and actuators. The humanoid robot Cog provides an unapologetically adequate platform from which to take on such a challenge. This work makes three contributions to embodied AI. First, it offers a general-purpose architecture for developing behavior-based systems distributed over networks of PC's. Second, it provides a motor-control system that simulates several biological features which impact the development of motor behavior. Third, it develops a framework for a system which enables a robot to learn new behaviors via interacting with itself and the outside world. A few basic functional modules are built into this framework, enough to demonstrate the robot learning some very simple behaviors taught by a human trainer. A primary motivation for this project is the notion that it is practically impossible to build an "intelligent" machine unless it is designed partly to build itself. This work is a proof-of-concept of such an approach to integrating multiple perceptual and motor systems into a complete learning agent.


Author[s]: Lawrence Shih and David Karger

Learning Classes Correlated to a Hierarchy

May 2003



Trees are a common way of organizing large amounts of information by placing items with similar characteristics near one another in the tree. We introduce a classification problem where a given tree structure gives us information on the best way to label nearby elements. We suggest there are many practical problems that fall under this domain. We propose a way to map the classification problem onto a standard Bayesian inference problem. We also give a fast, specialized inference algorithm that incrementally updates relevant probabilities. We apply this algorithm to web-classification problems and show that our algorithm empirically works well.


Author[s]: Lily Lee

Gait Analysis for Classification

June 26, 2003



This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features such as moments extracted from orthogonal view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and modeling of feature distributions, are applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods are investigated for their advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.


Author[s]: Martin C. Martin

The Essential Dynamics Algorithm: Essential Results

May 2003



This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project’s web site.


Author[s]: Samson Timoner

Compact Representations for Fast Nonrigid Registration of Medical Images

July 4, 2003



We develop efficient techniques for the non-rigid registration of medical images by using representations that adapt to the anatomy found in such images. Images of anatomical structures typically have uniform intensity interiors and smooth boundaries. We create methods to represent such regions compactly using tetrahedra. Unlike voxel-based representations, tetrahedra can accurately describe the expected smooth surfaces of medical objects. Furthermore, the interior of such objects can be represented using a small number of tetrahedra. Rather than describing a medical object using tens of thousands of voxels, our representations generally contain only a few thousand elements. Tetrahedra facilitate the creation of efficient non-rigid registration algorithms based on finite element methods (FEM). We create a fast, FEM-based method to non-rigidly register segmented anatomical structures from two subjects. Using our compact tetrahedral representations, this method generally requires less than one minute of processing time on a desktop PC. We also create a novel method for the non-rigid registration of gray scale images. To facilitate a fast method, we create a tetrahedral representation of a displacement field that automatically adapts to both the anatomy in an image and to the displacement field. The resulting algorithm has a computational cost that is dominated by the number of nodes in the mesh (about 10,000), rather than the number of voxels in an image (nearly 10,000,000). For many non-rigid registration problems, we can find a transformation from one image to another in five minutes. This speed is important as it allows use of the algorithm during surgery. We apply our algorithms to find correlations between the shape of anatomical structures and the presence of schizophrenia. We show that a study based on our representations outperforms studies based on other representations. We also use the results of our non-rigid registration algorithm as the basis of a segmentation algorithm. That algorithm also outperforms other methods in our tests, producing smoother segmentations and more accurately reproducing manual segmentations.


Author[s]: Kimberle Koile, Konrad Tollmar, David Demirdjian, Howard Shrobe and Trevor Darrell

Activity Zones for Context-Aware Computing

June 10, 2003



Location is a primary cue in many context-aware computing systems, and is often represented as a global coordinate, room number, or Euclidean distance various landmarks. A user?s concept of location, however, is often defined in terms of regions in which common activities occur. We show how to partition a space into such regions based on patterns of observed user location and motion. These regions, which we call activity zones, represent regions of similar user activity, and can be used to trigger application actions, retrieve information based on previous context, and present information to users. We suggest that context- aware applications can benefit from a location representation learned from observing users. We describe an implementation of our system and present two example applications whose behavior is controlled by users? entry, exit, and presence in the zones.


Author[s]: Pedro F. Felzenszwalb

Representation and Detection of Shapes in Images

August 8, 2003



We present a set of techniques that can be used to represent and detect shapes in images. Our methods revolve around a particular shape representation based on the description of objects using triangulated polygons. This representation is similar to the medial axis transform and has important properties from a computational perspective. The first problem we consider is the detection of non-rigid objects in images using deformable models. We present an efficient algorithm to solve this problem in a wide range of situations, and show examples in both natural and medical images. We also consider the problem of learning an accurate non-rigid shape model for a class of objects from examples. We show how to learn good models while constraining them to the form required by the detection algorithm. Finally, we consider the problem of low-level image segmentation and grouping. We describe a stochastic grammar that generates arbitrary triangulated polygons while capturing Gestalt principles of shape regularity. This grammar is used as a prior model over random shapes in a low level algorithm that detects objects in images.


Author[s]: Krzysztof Gajos and Howard Shrobe

Delegation, Arbitration and High-Level Service Discovery as Key Elements of a Software Infrastructure for Pervasive Computing

June 2003



The dream of pervasive computing is slowly becoming a reality. A number of projects around the world are constantly contributing ideas and solutions that are bound to change the way we interact with our environments and with one another. An essential component of the future is a software infrastructure that is capable of supporting interactions on scales ranging from a single physical space to intercontinental collaborations. Such infrastructure must help applications adapt to very diverse environments and must protect people’s privacy and respect their personal preferences. In this paper we indicate a number of limitations present in the software infrastructures proposed so far (including our previous work). We then describe the framework for building an infrastructure that satisfies the abovementioned criteria. This framework hinges on the concepts of delegation, arbitration and high-level service discovery. Components of our own implementation of such an infrastructure are presented.


Author[s]: Jacob Beal

Near-Optimal Distributed Failure Circumscription

August 11, 2003



Small failures should only disrupt a small part of a network. One way to do this is by marking the surrounding area as untrustworthy --- circumscribing the failure. This can be done with a distributed algorithm using hierarchical clustering and neighbor relations, and the resulting circumscription is near-optimal for convex failures.


Author[s]: Austin Che

Fluorescence Assay for Polymerase Arrival Rates

August 31, 2003



To engineer complex synthetic biological systems will require modular design, assembly, and characterization strategies. The RNA polymerase arrival rate (PAR) is defined to be the rate that RNA polymerases arrive at a specified location on the DNA. Designing and characterizing biological modules in terms of RNA polymerase arrival rates provides for many advantages in the construction and modeling of biological systems. PARMESAN is an in vitro method for measuring polymerase arrival rates using pyrrolo-dC, a fluorescent DNA base that can substitute for cytosine. Pyrrolo-dC shows a detectable fluorescence difference when in single-stranded versus double-stranded DNA. During transcription, RNA polymerase separates the two strands of DNA, leading to a change in the fluorescence of pyrrolo-dC. By incorporating pyrrolo-dC at specific locations in the DNA, fluorescence changes can be taken as a direct measurement of the polymerase arrival rate.



Author[s]: Benjamin J. Balas, Pawan Sinha

Dissociated Dipoles: Image representation via non-local comparisons

August 13, 2003



A fundamental question in visual neuroscience is how to represent image structure. The most common representational schemes rely on differential operators that compare adjacent image regions. While well-suited to encoding local relationships, such operators have significant drawbacks. Specifically, each filter’s span is confounded with the size of its sub-fields, making it difficult to compare small regions across large distances. We find that such long-distance comparisons are more tolerant to common image transformations than purely local ones, suggesting they may provide a useful vocabulary for image encoding. . We introduce the “Dissociated Dipole,” or “Sticks” operator, for encoding non-local image relationships. This operator de-couples filter span from sub-field size, enabling parametric movement between edge and region-based representation modes. We report on the perceptual plausibility of the operator, and the computational advantages of non-local encoding. Our results suggest that non-local encoding may be an effective scheme for representing image structure.


Author[s]: Sayan Mukherjee, Polina Golland and Dmitry Panchenko

Permutation Tests for Classification

August 28, 2003



We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.



Author[s]: Hiroaki Shimizu and Tomaso Poggio

Direction Estimation of Pedestrian from Images

August 27, 2003



The capability of estimating the walking direction of people would be useful in many applications such as those involving autonomous cars and robots. We introduce an approach for estimating the walking direction of people from images, based on learning the correct classification of a still image by using SVMs. We find that the performance of the system can be improved by classifying each image of a walking sequence and combining the outputs of the classifier. Experiments were performed to evaluate our system and estimate the trade-off between number of images in walking sequences and performance.



Author[s]: Minjoon Kouh and Maximilian Riesenhuber

Investigating shape representation in area V4 with HMAX: Orientation and Grating selectivities

September 8, 2003



The question of how shape is represented is of central interest to understanding visual processing in cortex. While tuning properties of the cells in early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual area V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.


Author[s]: Michael G. Ross and Leslie Pack Kaelbling

Learning object segmentation from video data

September 8, 2003



This memo describes the initial results of a project to create a self-supervised algorithm for learning object segmentation from video data. Developmental psychology and computational experience have demonstrated that the motion segmentation of objects is a simpler, more primitive process than the detection of object boundaries by static image cues. Therefore, motion information provides a plausible supervision signal for learning the static boundary detection task and for evaluating performance on a test set. A video camera and previously developed background subtraction algorithms can automatically produce a large database of motion-segmented images for minimal cost. The purpose of this work is to use the information in such a database to learn how to detect the object boundaries in novel images using static information, such as color, texture, and shape. This work was funded in part by the Office of Naval Research contract #N00014-00-1-0298, in part by the Singapore-MIT Alliance agreement of 11/6/98, and in part by a National Science Foundation Graduate Student Fellowship.


Author[s]: Jacob Eisenstein

Evolving Robocode Tank Fighters

October 28, 2003



In this paper, I describe the application of genetic programming to evolve a controller for a robotic tank in a simulated environment. The purpose is to explore how genetic techniques can best be applied to produce controllers based on subsumption and behavior oriented languages such as REX. As part of my implementation, I developed TableRex, a modification of REX that can be expressed on a fixed-length genome. Using a fixed subsumption architecture of TableRex modules, I evolved robots that beat some of the most competitive hand-coded adversaries.



Author[s]: Christian Morgenstern, Bernd Heisele

Component based recognition of objects in an office environment

November 28, 2003



We present a component-based approach for recognizing objects under large pose changes. From a set of training images of a given object we extract a large number of components which are clustered based on the similarity of their image features and their locations within the object image. The cluster centers build an initial set of component templates from which we select a subset for the final recognizer. In experiments we evaluate different sizes and types of components and three standard techniques for component selection. The component classifiers are finally compared to global classifiers on a database of four objects.


Author[s]: Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling

Mobilized ad-hoc networks: A reinforcement learning approach

December 4, 2003



Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.


Author[s]: Kristen Grauman and Trevor Darrell

Fast Contour Matching Using Approximate Earth Mover's Distance

December 5, 2003



Weighted graph matching is a good way to align a pair of shapes represented by a set of descriptive local features; the set of correspondences produced by the minimum cost of matching features from one shape to the features of the other often reveals how similar the two shapes are. However, due to the complexity of computing the exact minimum cost matching, previous algorithms could only run efficiently when using a limited number of features per shape, and could not scale to perform retrievals from large databases. We present a contour matching algorithm that quickly computes the minimum weight matching between sets of descriptive local features using a recently introduced low-distortion embedding of the Earth Mover's Distance (EMD) into a normed space. Given a novel embedded contour, the nearest neighbors in a database of embedded contours are retrieved in sublinear time via approximate nearest neighbors search. We demonstrate our shape matching method on databases of 10,000 images of human figures and 60,000 images of handwritten digits.


Author[s]: Jacob Beal and Seth Gilbert

RamboNodes for the Metropolitan Ad Hoc Network

December 17, 2003



We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations for the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu