CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

CSAIL Digital Archive - Artificial Intelligence Laboratory Series
Publications - 2001

AI PublicationsLast update Sun Mar 19 05:05:02 2006


Author[s]: T. Darrell, D. Demirdjian, N. Checka and P. Felzenswalb

Plan-view Trajectory Estimation with Dense Stereo Background Models

February 2001



In a known environment, objects may be tracked in multiple views using a set of back-ground models. Stereo-based models can be illumination-invariant, but often have undefined values which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a 'late-segmentation' strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.


Author[s]: Kimberle Koile

The Architect's Collaborator: Toward Intelligent Tools for Conceptual Design

January 2001



In early stages of architectural design, as in other design domains, the language used is often very abstract. In architectural design, for example, architects and their clients use experiential terms such as "private" or "open" to describe spaces. If we are to build programs that can help designers during this early-stage design, we must give those programs the capability to deal with concepts on the level of such abstractions. The work reported in this thesis sought to do that, focusing on two key questions: How are abstract terms such as "private" and "open" translated into physical form? How might one build a tool to assist designers with this process? The Architect's Collaborator (TAC) was built to explore these issues. It is a design assistant that supports iterative design refinement, and that represents and reasons about how experiential qualities are manifested in physical form. Given a starting design and a set of design goals, TAC explores the space of possible designs in search of solutions that satisfy the goals. It employs a strategy we've called dependency-directed redesign: it evaluates a design with respect to a set of goals, then uses an explanation of the evaluation to guide proposal and refinement of repair suggestions; it then carries out the repair suggestions to create new designs. A series of experiments was run to study TAC's behavior. Issues of control structure, goal set size, goal order, and modification operator capabilities were explored. In addition, TAC's use as a design assistant was studied in an experiment using a house in the process of being redesigned. TAC's use as an analysis tool was studied in an experiment using Frank Lloyd Wright's Prairie houses.



Author[s]: Christian R. Shelton

Policy Improvement for POMDPs Using Normalized Importance Sampling

March 20, 2001



We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowle ge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons.We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm showing an order of magnitude reduction in the number of trials required.


Author[s]: Pedro F. Felzenszwalb

Object Recognition with Pictorial Structures

May 2001



This thesis presents a statistical framework for object recognition. The framework is motivated by the pictorial structure models introduced by Fischler and Elschlager nearly 30 years ago. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. The problem of detecting an object in an image and the problem of learning an object model using training examples are naturally formulated under a statistical approach. We present efficient algorithms to solve these problems in our framework. We demonstrate our techniques by training models to represent faces and human bodies. The models are then used to locate the corresponding objects in novel images.


Author[s]: Nicolas Meuleau, Leonid Peshkin and Kee-Eung Kim

Exploration in Gradient-Based Reinforcement Learning

April 3, 2001



Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.



Author[s]: Christian Robert Shelton

Importance Sampling for Reinforcement Learning with Multiple Objectives

August 2001



This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and natural solutions based on our sampling method. The thesis concludes with example results from employing our algorithm to the domain of automated electronic market-making.



Author[s]: Mariano Alvira and Ryan Rifkin

An Empirical Comparison of SNoW and SVMs for Face Detection

January 2001



Impressive claims have been made for the performance of the SNoW algorithm on face detection tasks by Yang et. al. [7]. In particular, by looking at both their results and those of Heisele et. al. [3], one could infer that the SNoW system performed substantially better than an SVM-based system, even when the SVM used a polynomial kernel and the SNoW system used a particularly simplistic 'primitive' linear representation. We evaluated the two approaches in a controlled experiment, looking directly at performance on a simple, fixed-sized test set, isolating out 'infrastructure' issues related to detecting faces at various scales in large images. We found that SNoW performed about as well as linear SVMs, and substantially worse than polynomial SVMs.


Author[s]: Jason D. M. Rennie

Improving Multi-class Text Classification with Naive Bayes

September 2001



There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.



Author[s]: Nicholas Tung Chan and Christian Shelton

An Electronic Market-Maker

April 17, 2001



This paper presents an adaptive learning model for market-making under the reinforcement learning framework. Reinforcement learning is a learning technique in which agents aim to maximize the long-term accumulated rewards. No knowledge of the market environment, such as the order arrival or price process, is assumed. Instead, the agent learns from real-time market experience and develops explicit market-making strategies, achieving multiple objectives including the maximizing of profits and minimization of the bid-ask spread. The simulation results show initial success in bringing learning techniques to building market-making algorithms.


Author[s]: Jessica Banks

Design and Control of an Anthropomorphic Robotic Finger with Multi-point Tactile Sensation

May 2001



The goal of this research is to develop the prototype of a tactile sensing platform for anthropomorphic manipulation research. We investigate this problem through the fabrication and simple control of a planar 2-DOF robotic finger inspired by anatomic consistency, self-containment, and adaptability. The robot is equipped with a tactile sensor array based on optical transducer technology whereby localized changes in light intensity within an illuminated foam substrate correspond to the distribution and magnitude of forces applied to the sensor surface plane. The integration of tactile perception is a key component in realizing robotic systems which organically interact with the world. Such natural behavior is characterized by compliant performance that can initiate internal, and respond to external, force application in a dynamic environment. However, most of the current manipulators that support some form of haptic feedback either solely derive proprioceptive sensation or only limit tactile sensors to the mechanical fingertips. These constraints are due to the technological challenges involved in high resolution, multi-point tactile perception. In this work, however, we take the opposite approach, emphasizing the role of full-finger tactile feedback in the refinement of manual capabilities. To this end, we propose and implement a control framework for sensorimotor coordination analogous to infant-level grasping and fixturing reflexes. This thesis details the mechanisms used to achieve these sensory, actuation, and control objectives, along with the design philosophies and biological influences behind them. The results of behavioral experiments with a simple tactilely-modulated control scheme are also described. The hope is to integrate the modular finger into an %engineered analog of the human hand with a complete haptic system.



Author[s]: Javid Sadr and Pawan Sinha

Exploring Object Perception with Random Image Structure Evolution

March 2001



We have developed a technique called RISE (Random Image Structure Evolution), by which one may systematically sample continuous paths in a high-dimensional image space. A basic RISE sequence depicts the evolution of an object's image from a random field, along with the reverse sequence which depicts the transformation of this image back into randomness. The processing steps are designed to ensure that important low-level image attributes such as the frequency spectrum and luminance are held constant throughout a RISE sequence. Experiments based on the RISE paradigm can be used to address some key open issues in object perception. These include determining the neural substrates underlying object perception, the role of prior knowledge and expectation in object perception, and the developmental changes in object perception skills from infancy to adulthood.


Author[s]: Aaron Mark Ucko

Predicate Dispatching in the Common Lisp Object System

May 2001



I have added support for predicate dispatching, a powerful generalization of other dispatching mechanisms, to the Common Lisp Object System (CLOS). To demonstrate its utility, I used predicate dispatching to enhance Weyl, a computer algebra system which doubles as a CLOS library. My result is Dispatching-Enhanced Weyl (DEW), a computer algebra system that I have demonstrated to be well suited for both users and programmers.


Author[s]: Konstantine Arkoudas

Certified Computation

April 30, 2001



This paper introduces the notion of certified computation. A certified computation does not only produce a result r, but also a correctness certificate, which is a formal proof that r is correct. This can greatly enhance the credibility of the result: if we trust the axioms and inference rules that are used in the certificate,then we can be assured that r is correct. In effect,we obtain a trust reduction: we no longer have to trust the entire computation; we only have to trust the certificate. Typically, the reasoning used in the certificate is much simpler and easier to trust than the entire computation. Certified computation has two main applications: as a software engineering discipline, it can be used to increase the reliability of our code; and as a framework for cooperative computation, it can be used whenever a code consumer executes an algorithm obtained from an untrusted agent and needs to be convinced that the generated results are correct. We propose DPLs (Denotational Proof Languages)as a uniform platform for certified computation. DPLs enforce a sharp separation between logic and control and over versatile mechanicms for constructing certificates. We use Athena as a concrete DPL to illustrate our ideas, and we present two examples of certified computation, giving full working code in both cases.


Author[s]: Won Hong

Modeling, Estimation, and Control of Robot-Soil Interactions

September 2001



This thesis presents the development of hardware, theory, and experimental methods to enable a robotic manipulator arm to interact with soils and estimate soil properties from interaction forces. Unlike the majority of robotic systems interacting with soil, our objective is parameter estimation, not excavation. To this end, we design our manipulator with a flat plate for easy modeling of interactions. By using a flat plate, we take advantage of the wealth of research on the similar problem of earth pressure on retaining walls. There are a number of existing earth pressure models. These models typically provide estimates of force which are in uncertain relation to the true force. A recent technique, known as numerical limit analysis, provides upper and lower bounds on the true force. Predictions from the numerical limit analysis technique are shown to be in good agreement with other accepted models. Experimental methods for plate insertion, soil-tool interface friction estimation, and control of applied forces on the soil are presented. In addition, a novel graphical technique for inverting the soil models is developed, which is an improvement over standard nonlinear optimization. This graphical technique utilizes the uncertainties associated with each set of force measurements to obtain all possible parameters which could have produced the measured forces. The system is tested on three cohesionless soils, two in a loose state and one in a loose and dense state. The results are compared with friction angles obtained from direct shear tests. The results highlight a number of key points. Common assumptions are made in soil modeling. Most notably, the Mohr-Coulomb failure law and perfectly plastic behavior. In the direct shear tests, a marked dependence of friction angle on the normal stress at low stresses is found. This has ramifications for any study of friction done at low stresses. In addition, gradual failures are often observed for vertical tools and tools inclined away from the direction of motion. After accounting for the change in friction angle at low stresses, the results show good agreement with the direct shear values.


Author[s]: Radhika Nagpal

Programmable Self-Assembly: Constructing Global Shape using Biologically-inspire

June 2001



In this thesis I present a language for instructing a sheet of identically-programmed, flexible, autonomous agents (``cells'') to assemble themselves into a predetermined global shape, using local interactions. The global shape is described as a folding construction on a continuous sheet, using a set of axioms from paper-folding (origami). I provide a means of automatically deriving the cell program, executed by all cells, from the global shape description. With this language, a wide variety of global shapes and patterns can be synthesized, using only local interactions between identically-programmed cells. Examples include flat layered shapes, all plane Euclidean constructions, and a variety of tessellation patterns. In contrast to approaches based on cellular automata or evolution, the cell program is directly derived from the global shape description and is composed from a small number of biologically-inspired primitives: gradients, neighborhood query, polarity inversion, cell-to-cell contact and flexible folding. The cell programs are robust, without relying on regular cell placement, global coordinates, or synchronous operation and can tolerate a small amount of random cell death. I show that an average cell neighborhood of 15 is sufficient to reliably self-assemble complex shapes and geometric patterns on randomly distributed cells. The language provides many insights into the relationship between local and global descriptions of behavior, such as the advantage of constructive languages, mechanisms for achieving global robustness, and mechanisms for achieving scale- independent shapes from a single cell program. The language suggests a mechanism by which many related shapes can be created by the same cell program, in the manner of D'Arcy Thompson's famous coordinate transformations. The thesis illuminates how complex morphology and pattern can emerge from local interactions, and how one can engineer robust self-assembly.


Author[s]: A. Rahimi, L.-P. Morency and T. Darrell

Reducing Drift in Parametric Motion Tracking

May 7, 2001



We develop a class of differential motion trackers that automatically stabilize when in finite domains. Most differ-ential trackers compute motion only relative to one previous frame, accumulating errors indefinitely. We estimate pose changes between a set of past frames, and develop a probabilistic framework for integrating those estimates. We use an approximation to the posterior distribution of pose changes as an uncertainty model for parametric motion in order to help arbitrate the use of multiple base frames. We demonstrate this framework on a simple 2D translational tracker and a 3D, 6-degree of freedom tracker.


Author[s]: Tevfik Metin Sezgin

Feature Point Detection and Curve Approximation for Early Processing of Freehand Sketches

May 2001



Freehand sketching is both a natural and crucial part of design, yet is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects using feature point detection and approximation. We demonstrate how multiple sources of information can be combined for feature detection in strokes and apply this technique using two approaches to signal processing, one using simple average based thresholding and a second using scale space.


Author[s]: D. Demirdjian and T. Darrell

Motion Estimation from Disparity Images

May 7, 2001



A new method for 3D rigid motion estimation from stereo is proposed in this paper. The appealing feature of this method is that it directly uses the disparity images obtained from stereo matching. We assume that the stereo rig has parallel cameras and show, in that case, the geometric and topological properties of the disparity images. Then we introduce a rigid transformation (called d-motion) that maps two disparity images of a rigidly moving object. We show how it is related to the Euclidean rigid motion and a motion estimation algorithm is derived. We show with experiments that our approach is simple and more accurate than standard approaches.



Author[s]: Purdy Ho

Rotation Invariant Real-time Face Detection and Recognition System

May 31, 2001



In this report, a face recognition system that is capable of detecting and recognizing frontal and rotated faces was developed. Two face recognition methods focusing on the aspect of pose invariance are presented and evaluated - the whole face approach and the component-based approach. The main challenge of this project is to develop a system that is able to identify faces under different viewing angles in realtime. The development of such a system will enhance the capability and robustness of current face recognition technology. The whole-face approach recognizes faces by classifying a single feature vector consisting of the gray values of the whole face image. The component-based approach first locates the facial components and extracts them. These components are normalized and combined into a single feature vector for classification. The Support Vector Machine (SVM) is used as the classifier for both approaches. Extensive tests with respect to the robustness against pose changes are performed on a database that includes faces rotated up to about 40 degrees in depth. The component-based approach clearly outperforms the whole-face approach on all tests. Although this approach isproven to be more reliable, it is still too slow for real-time applications. That is the reason why a real-time face recognition system using the whole-face approach is implemented to recognize people in color video sequences.



Author[s]: T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, and A. Verri


July 2001



In this note we characterize the role of b ,which is the constant in the standard form of the solution provided by the Support Vector Machine technique f (x )=  i =1 • i K (x ,x i )+b .



Author[s]: Mariano Alvira, Jim Paris and Ryan Rifkin

The Audiomomma Music Recommendation System

July 2001



We design and implement a system that recommends musicians to listeners. The basic idea is to keep track of what artists a user listens to, to find other users with similar tastes, and to recommend other artists that these similar listeners enjoy. The system utilizes a client-server architecture, a web-based interface, and an SQL database to store and process information. We describe Audiomomma-0.3, a proof-of-concept implementation of the above ideas.



Author[s]: Nicholas T. Chan, Ely Dahan, Andrew W. Lo and Tomaso Poggio

Experimental Markets for Product Concepts

July 2001



Market prices are well known to efficiently collect and aggregate diverse information regarding the value of commodities and assets. The role of markets has been particularly suitable to pricing financial securities. This article provides an alternative application of the pricing mechanism to marketing research - using pseudo-securities markets to measure preferences over new product concepts. Surveys, focus groups, concept tests and conjoint studies are methods traditionally used to measure individual and aggregate preferences. Unfortunately, these methods can be biased, costly and time-consuming to conduct. The present research is motivated by the desire to efficiently measure preferences and more accurately predict new product success, based on the efficiency and incentive-compatibility of security trading markets. The article describes a novel market research method, pro-vides insight into why the method should work, and compares the results of several trading experiments against other methodologies such as concept testing and conjoint analysis.



Author[s]: Richard Russell and Pawan Sinha

Perceptually-based Comparison of Image Similarity Metrics

July 2001



The image comparison operation – assessing how well one image matches another – forms a critical component of many image analysis systems and models of human visual processing. Two norms used commonly for this purpose are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining whether one metric better captures the perceptual notion of image similarity than the other. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created via vector quantization. In both conditions the subjects showed a consistent preference for images matched using the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.



Author[s]: Antonio Torralba, Pawan Sinha

Recognizing Indoor Scenes

July 25, 2001



We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation dispenses with the need to rely on specific objects or local landmarks and also renders it robust against variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views and also evaluate changes in system performances as a function of the amount of training data available and the complexity of the representation.


Author[s]: Jacob Beal

An Algorithm for Bootstrapping Communications

August 13, 2001



I present an algorithm which allows two agents to generate a simple language based only on observations of a shared environment. Vocabulary and roles for the language are learned in linear time. Communication is robust and degrades gradually as complexity increases. Dissimilar modes of experience will lead to a shared kernel vocabulary.



Author[s]: Pawan Sinha and Antonio Torralba

Role of Low-level Mechanisms in Brightness Perception

August 2001



Brightness judgments are a key part of the primate brain’s visual analysis of the environment. There is general consensus that the perceived brightness of an image region is based not only on its actual luminance, but also on the photometric structure of its neighborhood. However, it is unclear precisely how a region’s context influences its perceived brightness. Recent research has suggested that brightness estimation may be based on a sophisticated analysis of scene layout in terms of transparency, illumination and shadows. This work has called into question the role of low-level mechanisms, such as lateral inhibition, as explanations for brightness phenomena. Here we describe experiments with displays for which low-level and high-level analyses make qualitatively different predictions, and with which we can quantitatively assess the trade-offs between low-level and high-level factors. We find that brightness percepts in these displays are governed by low-level stimulus properties, even when these percepts are inconsistent with higher-level interpretations of scene layout. These results point to the important role of low-level mechanisms in determining brightness percepts.



Author[s]: Gene Yeo, Tomaso Poggio

Multiclass Classification of SRBCTs

August 25, 2001



A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper cite{Khan2001}. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories, neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that using an approach similar to the one reported by Yeang et al cite{Yeang2001}, i.e. multiclass classification by combining outputs of binary classifiers, we achieved equal accuracy with much fewer features. We report the performances of 3 binary classifiers (k-nearest neighbors (kNN), weighted-voting (WV), and support vector machines (SVM)) with 3 feature selection techniques (Golub's Signal to Noise (SN) ratios cite{Golub99}, Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS))cite{Sayan98}.


Author[s]: Lily Lee

Gait Dynamics for Recognition and Classification

September 2001



This paper describes a representation of the dynamics of human walking action for the purpose of person identification and classification by gait appearance. Our gait representation is based on simple features such as moments extracted from video silhouettes of human walking motion. We claim that our gait dynamics representation is rich enough for the task of recognition and classification. The use of our feature representation is demonstrated in the task of person recognition from video sequences of orthogonal views of people walking. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times, and under varying lighting environments. In addition, preliminary results are shown on gender classification using our gait dynamics features.



Author[s]: Antonio Torralba and Pawan Sinha

Contextual Priming for Object Detection

September 2001



There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.


Author[s]: Erik G. Miller, Kinh Tieu and Chris P. Stauffer

Learning Object-Independent Modes of Variation with Feature Flow Fields

September 2001



We present a unifying framework in which "object-independent" modes of variation are learned from continuous-time data such as video sequences. These modes of variation can be used as "generators" to produce a manifold of images of a new object from a single example of that object. We develop the framework in the context of a well-known example: analyzing the modes of spatial deformations of a scene under camera movement. Our method learns a close approximation to the standard affine deformations that are expected from the geometry of the situation, and does so in a completely unsupervised (i.e. ignorant of the geometry of the situation) fashion. We stress that it is learning a "parameterization", not just the parameter values, of the data. We then demonstrate how we have used the same framework to derive a novel data-driven model of joint color change in images due to common lighting variations. The model is superior to previous models of color change in describing non-linear color changes due to lighting.



Author[s]: Angela J. Yu, Martin A. Giese and Tomaso A. Poggio

Biologically Plausible Neural Circuits for Realization of Maximum Operations

September 2001



Object recognition in the visual cortex is based on a hierarchical architecture, in which specialized brain regions along the ventral pathway extract object features of increasing levels of complexity, accompanied by greater invariance in stimulus size, position, and orientation. Recent theoretical studies postulate a non-linear pooling function, such as the maximum (MAX) operation could be fundamental in achieving such invariance. In this paper, we are concerned with neurally plausible mechanisms that may be involved in realizing the MAX operation. Four canonical circuits are proposed, each based on neural mechanisms that have been previously discussed in the context of cortical processing. Through simulations and mathematical analysis, we examine the relative performance and robustness of these mechanisms. We derive experimentally verifiable predictions for each circuit and discuss their respective physiological considerations.


Author[s]: Ron O. Dror, Edward H. Adelson and Alan S. Willsky

Surface Reflectance Estimation and Natural Illumination Statistics

September 2001



Humans recognize optical reflectance properties of surfaces such as metal, plastic, or paper from a single image without knowledge of illumination. We develop a machine vision system to perform similar recognition tasks automatically. Reflectance estimation under unknown, arbitrary illumination proves highly underconstrained due to the variety of potential illumination distributions and surface reflectance properties. We have found that the spatial structure of real-world illumination possesses some of the statistical regularities observed in the natural image statistics literature. A human or computer vision system may be able to exploit this prior information to determine the most likely surface reflectance given an observed image. We develop an algorithm for reflectance classification under unknown real-world illumination, which learns relationships between surface reflectance and certain features (statistics) computed from a single observed image. We also develop an automatic feature selection method.


Author[s]: Leonid Taycher and Trevor Darrell

Range Segmentation Using Visibility Constraints

September 2001



Visibility constraints can aid the segmentation of foreground objects observed with multiple range images. In our approach, points are defined as foreground if they can be determined to occlude some {em empty space} in the scene. We present an efficient algorithm to estimate foreground points in each range view using explicit epipolar search. In cases where the background pattern is stationary, we show how visibility constraints from other views can generate virtual background values at points with no valid depth in the primary view. We demonstrate the performance of both algorithms for detecting people in indoor office environments.


Author[s]: Konstantine Arkoudas

Type-alpha DPLs

October 5, 2001



This paper introduces Denotational Proof Languages (DPLs). DPLs are languages for presenting, discovering, and checking formal proofs. In particular, in this paper we discus type-alpha DPLs---a simple class of DPLs for which termination is guaranteed and proof checking can be performed in time linear in the size of the proof. Type-alpha DPLs allow for lucid proof presentation and for efficient proof checking, but not for proof search. Type-omega DPLs allow for search as well as simple presentation and checking, but termination is no longer guaranteed and proof checking may diverge. We do not study type-omega DPLs here. We start by listing some common characteristics of DPLs. We then illustrate with a particularly simple example: a toy type-alpha DPL called PAR, for deducing parities. We present the abstract syntax of PAR, followed by two different kinds of formal semantics: evaluation and denotational. We then relate the two semantics and show how proof checking becomes tantamount to evaluation. We proceed to develop the proof theory of PAR, formulating and studying certain key notions such as observational equivalence that pervade all DPLs. We then present NDL, a type-alpha DPL for classical zero-order natural deduction. Our presentation of NDL mirrors that of PAR, showing how every basic concept that was introduced in PAR resurfaces in NDL. We present sample proofs of several well-known tautologies of propositional logic that demonstrate our thesis that DPL proofs are readable, writable, and concise. Next we contrast DPLs to typed logics based on the Curry-Howard isomorphism, and discuss the distinction between pure and augmented DPLs. Finally we consider the issue of implementing DPLs, presenting an implementation of PAR in SML and one in Athena, and end with some concluding remarks.



Author[s]: Jason D. M. Rennie and Ryan Rifkin

Improving Multiclass Text Classification with the Support Vector Machine

October 16, 2001



We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.


Author[s]: Konstantine Arkoudas

Type-omega DPLs

October 16, 2001



Type-omega DPLs (Denotational Proof Languages) are languages for proof presentation and search that offer strong soundness guarantees. LCF-type systems such as HOL offer similar guarantees, but their soundness relies heavily on static type systems. By contrast, DPLs ensure soundness dynamically, through their evaluation semantics; no type system is necessary. This is possible owing to a novel two-tier syntax that separates deductions from computations, and to the abstraction of assumption bases, which is factored into the semantics of the language and allows for sound evaluation. Every type-omega DPL properly contains a type-alpha DPL, which can be used to present proofs in a lucid and detailed form, exclusively in terms of primitive inference rules. Derived inference rules are expressed as user-defined methods, which are "proof recipes" that take arguments and dynamically perform appropriate deductions. Methods arise naturally via parametric abstraction over type-alpha proofs. In that light, the evaluation of a method call can be viewed as a computation that carries out a type-alpha deduction. The type-alpha proof "unwound" by such a method call is called the "certificate" of the call. Certificates can be checked by exceptionally simple type-alpha interpreters, and thus they are useful whenever we wish to minimize our trusted base. Methods are statically closed over lexical environments, but dynamically scoped over assumption bases. They can take other methods as arguments, they can iterate, and they can branch conditionally. These capabilities, in tandem with the bifurcated syntax of type-omega DPLs and their dynamic assumption-base semantics, allow the user to define methods in a style that is disciplined enough to ensure soundness yet fluid enough to permit succinct and perspicuous expression of arbitrarily sophisticated derived inference rules. We demonstrate every major feature of type-omega DPLs by defining and studying NDL-omega, a higher-order, lexically scoped, call-by-value type-omega DPL for classical zero-order natural deduction---a simple choice that allows us to focus on type-omega syntax and semantics rather than on the subtleties of the underlying logic. We start by illustrating how type-alpha DPLs naturally lead to type-omega DPLs by way of abstraction; present the formal syntax and semantics of NDL-omega; prove several results about it, including soundness; give numerous examples of methods; point out connections to the lambda-phi calculus, a very general framework for type-omega DPLs; introduce a notion of computational and deductive cost; define several instrumented interpreters for computing such costs and for generating certificates; explore the use of type-omega DPLs as general programming languages; show that DPLs do not have to be type-less by formulating a static Hindley-Milner polymorphic type system for NDL-omega; discuss some idiosyncrasies of type-omega DPLs such as the potential divergence of proof checking; and compare type-omega DPLs to other approaches to proof presentation and discovery. Finally, a complete implementation of NDL-omega in SML-NJ is given for users who want to run the examples and experiment with the language.



Author[s]: Antonio Torralba and Pawan Sinha

Detecting Faces in Impoverished Images

November 5, 2001



The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.



Author[s]: Yuri Ostrovsky, Patrick Cavanagh and Pawan Sinha

Perceiving Illumination Inconsistencies in Scenes

November 5, 2001



The human visual system is adept at detecting and encoding statistical regularities in its spatio-temporal environment. Here we report an unexpected failure of this ability in the context of perceiving inconsistencies in illumination distributions across a scene. Contrary to predictions from previous studies [Enns and Rensink, 1990; Sun and Perona, 1996a, 1996b, 1997], we find that the visual system displays a remarkable lack of sensitivity to illumination inconsistencies, both in experimental stimuli and in images of real scenes. Our results allow us to draw inferences regarding how the visual system encodes illumination distributions across scenes. Specifically, they suggest that the visual system does not verify the global consistency of locally derived estimates of illumination direction.


Author[s]: Adrian Corduneanu and Tommi Jaakkola

Stable Mixing of Complete and Incomplete Information

November 8, 2001



An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems.


Author[s]: Konstantine Arkoudas

Simplifying transformations for type-alpha certificates

November 13, 2001



This paper presents an algorithm for simplifying NDL deductions. An array of simplifying transformations are rigorously defined. They are shown to be terminating, and to respect the formal semantis of the language. We also show that the transformations never increase the size or complexity of a deduction---in the worst case, they produce deductions of the same size and complexity as the original. We present several examples of proofs containing various types of "detours", and explain how our procedure eliminates them, resulting in smaller and cleaner deductions. All of the given transformations are fully implemented in SML-NJ. The complete code listing is presented, along with explanatory comments. Finally, although the transformations given here are defined for NDL, we point out that they can be applied to any type-alpha DPL that satisfies a few simple conditions.


Author[s]: Roland W. Fleming, Ron O. Dror, Edward H. Adelson

How do Humans Determine Reflectance Properties under Unknown Illumination?

October 21, 2001



Under normal viewing conditions, humans find it easy to distinguish between objects made out of different materials such as plastic, metal, or paper. Untextured materials such as these have different surface reflectance properties, including lightness and gloss. With single isolated images and unknown illumination conditions, the task of estimating surface reflectance is highly underconstrained, because many combinations of reflection and illumination are consistent with a given image. In order to work out how humans estimate surface reflectance properties, we asked subjects to match the appearance of isolated spheres taken out of their original contexts. We found that subjects were able to perform the task accurately and reliably without contextual information to specify the illumination. The spheres were rendered under a variety of artificial illuminations, such as a single point light source, and a number of photographically-captured real-world illuminations from both indoor and outdoor scenes. Subjects performed more accurately for stimuli viewed under real-world patterns of illumination than under artificial illuminations, suggesting that subjects use stored assumptions about the regularities of real-world illuminations to solve the ill-posed problem.


Author[s]: Ron O. Dror, Edward H. Adelson, and Alan S. Willsky

Recognition of Surface Reflectance Properties from a Single Image under Unknown Real-World Illumination

October 21, 2001



This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.



Author[s]: Maximilian Riesenhuber

Generalization over contrast and mirror reversal, but not figure-ground reversal, in an "edge-based

December 10, 2001



Baylis & Driver (Nature Neuroscience, 2001) have recently presented data on the response of neurons in macaque inferotemporal cortex (IT) to various stimulus transformations. They report that neurons can generalize over contrast and mirror reversal, but not over figure-ground reversal. This finding is taken to demonstrate that ``the selectivity of IT neurons is not determined simply by the distinctive contours in a display, contrary to simple edge-based models of shape recognition'', citing our recently presented model of object recognition in cortex (Riesenhuber & Poggio, Nature Neuroscience, 1999). In this memo, I show that the main effects of the experiment can be obtained by performing the appropriate simulations in our simple feedforward model. This suggests for IT cell tuning that the possible contributions of explicit edge assignment processes postulated in (Baylis & Driver, 2001) might be smaller than expected.



Author[s]: Andrew Yip and Pawan Sinha

Role of color in face recognition

December 13, 2001



One of the key challenges in face perception lies in determining the contribution of different cues to face identification. In this study, we focus on the role of color cues. Although color appears to be a salient attribute of faces, past research has suggested that it confers little recognition advantage for identifying people. Here we report experimental results suggesting that color cues do play a role in face recognition and their contribution becomes evident when shape cues are degraded. Under such conditions, recognition performance with color images is significantly better than that with grayscale images. Our experimental results also indicate that the contribution of color may lie not so much in providing diagnostic cues to identity as in aiding low-level image-analysis processes such as segmentation.



Author[s]: Antonio Torralba and Aude Oliva

Global Depth Perception from Familiar Scene Structure

December 2001



In the absence of cues for absolute depth measurements as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene but it will not inform about the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu