CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

CSAIL Digital Archive - Artificial Intelligence Laboratory Series
Publications - 2002

AI PublicationsLast update Sun Mar 19 05:05:02 2006


Author[s]: Lilla Zollei

2D-3D Rigid-Body Registration of X-Ray Fluoroscopy and CT Images

August 2001



The registration of pre-operative volumetric datasets to intra- operative two-dimensional images provides an improved way of verifying patient position and medical instrument loca- tion. In applications from orthopedics to neurosurgery, it has a great value in maintaining up-to-date information about changes due to intervention. We propose a mutual information- based registration algorithm to establish the proper align- ment. For optimization purposes, we compare the perfor- mance of the non-gradient Powell method and two slightly di erent versions of a stochastic gradient ascent strategy: one using a sparsely sampled histogramming approach and the other Parzen windowing to carry out probability density approximation. Our main contribution lies in adopting the stochastic ap- proximation scheme successfully applied in 3D-3D registra- tion problems to the 2D-3D scenario, which obviates the need for the generation of full DRRs at each iteration of pose op- timization. This facilitates a considerable savings in compu- tation expense. We also introduce a new probability density estimator for image intensities via sparse histogramming, de- rive gradient estimates for the density measures required by the maximization procedure and introduce the framework for a multiresolution strategy to the problem. Registration results are presented on uoroscopy and CT datasets of a plastic pelvis and a real skull, and on a high-resolution CT- derived simulated dataset of a real skull, a plastic skull, a plastic pelvis and a plastic lumbar spine segment.


Author[s]: Trevor Darrell, Neal Checka, Alice Oh and Louis-Philippe Morency

Exploring Vision-Based Interfaces: How to Use Your Head in Dual Pointing Tasks

January 2002



The utility of vision-based face tracking for dual pointing tasks is evaluated. We first describe a 3-D face tracking technique based on real-time parametric motion-stereo, which is non-invasive, robust, and self-initialized. The tracker provides a real-time estimate of a ?frontal face ray? whose intersection with the display surface plane is used as a second stream of input for scrolling or pointing, in paral-lel with hand input. We evaluated the performance of com-bined head/hand input on a box selection and coloring task: users selected boxes with one pointer and colors with a second pointer, or performed both tasks with a single pointer. We found that performance with head and one hand was intermediate between single hand performance and dual hand performance. Our results are consistent with previously reported dual hand conflict in symmetric pointing tasks, and suggest that a head-based input stream should be used for asymmetric control.


Author[s]: William T. Freeman and Hao Zhang

Shape-Time Photography

January 10, 2002



We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-d relationships over time by easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.


Author[s]: Jacob Beal

Generating Communications Systems Through Shared Context

January 2002



In a distributed model of intelligence, peer components need to communicate with one another. I present a system which enables two agents connected by a thick twisted bundle of wires to bootstrap a simple communication system from observations of a shared environment. The agents learn a large vocabulary of symbols, as well as inflections on those symbols which allow thematic role-frames to be transmitted. Language acquisition time is rapid and linear in the number of symbols and inflections. The final communication system is robust and performance degrades gradually in the face of problems.



Author[s]: Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee and Alex Rakhlin

Bagging Regularizes

March 2002



Intuitively, we expect that averaging --- or bagging --- different regressors with low correlation should smooth their behavior and be somewhat similar to regularization. In this note we make this intuition precise. Using an almost classical definition of stability, we prove that a certain form of averaging provides generalization bounds with a rate of convergence of the same order as Tikhonov regularization --- similar to fashionable RKHS- based learning algorithms.


Author[s]: Joanna J. Bryson

Intelligence by Design: Principles of Modularity and Coordination for Engineerin

September 2001



All intelligence relies on search --- for example, the search for an intelligent agent's next action. Search is only likely to succeed in resource-bounded agents if they have already been biased towards finding the right answer. In artificial agents, the primary source of bias is engineering. This dissertation describes an approach, Behavior-Oriented Design (BOD) for engineering complex agents. A complex agent is one that must arbitrate between potentially conflicting goals or behaviors. Behavior-oriented design builds on work in behavior-based and hybrid architectures for agents, and the object oriented approach to software engineering. The primary contributions of this dissertation are: 1.The BOD architecture: a modular architecture with each module providing specialized representations to facilitate learning. This includes one pre-specified module and representation for action selection or behavior arbitration. The specialized representation underlying BOD action selection is Parallel-rooted, Ordered, Slip-stack Hierarchical (POSH) reactive plans. 2.The BOD development process: an iterative process that alternately scales the agent's capabilities then optimizes the agent for simplicity, exploiting tradeoffs between the component representations. This ongoing process for controlling complexity not only provides bias for the behaving agent, but also facilitates its maintenance and extendibility. The secondary contributions of this dissertation include two implementations of POSH action selection, a procedure for identifying useful idioms in agent architectures and using them to distribute knowledge across agent paradigms, several examples of applying BOD idioms to established architectures, an analysis and comparison of the attributes and design trends of a large number of agent architectures, a comparison of biological (particularly mammalian) intelligence to artificial agent architectures, a novel model of primate transitive inference, and many other examples of BOD agents and BOD development.


Author[s]: Teodoro Arvizo III

A Virtual Machine for a Type-omega Denotational Proof Language

June 2002



In this thesis, I designed and implemented a virtual machine (VM) for a monomorphic variant of Athena, a type-omega denotational proof language (DPL). This machine attempts to maintain the minimum state required to evaluate Athena phrases. This thesis also includes the design and implementation of a compiler for monomorphic Athena that compiles to the VM. Finally, it includes details on my implementation of a read-eval-print loop that glues together the VM core and the compiler to provide a full, user-accessible interface to monomorphic Athena. The Athena VM provides the same basis for DPLs that the SECD machine does for pure, functional programming and the Warren Abstract Machine does for Prolog.



Author[s]: Ulf Knoblich and Maximilan Riesenhuber

Stimulus Simplification and Object Representation: A Modeling Study

March 15, 2002



Tsunoda et al. (2001) recently studied the nature of object representation in monkey inferotemporal cortex using a combination of optical imaging and extracellular recordings. In particular, they examined IT neuron responses to complex natural objects and "simplified" versions thereof. In that study, in 42% of the cases, optical imaging revealed a decrease in the number of activation patches in IT as stimuli were "simplified". However, in 58% of the cases, "simplification" of the stimuli actually led to the appearance of additional activation patches in IT. Based on these results, the authors propose a scheme in which an object is represented by combinations of active and inactive columns coding for individual features. We examine the patterns of activation caused by the same stimuli as used by Tsunoda et al. in our model of object recognition in cortex (Riesenhuber 99). We find that object-tuned units can show a pattern of appearance and disappearance of features identical to the experiment. Thus, the data of Tsunoda et al. appear to be in quantitative agreement with a simple object-based representation in which an object's identity is coded by its similarities to reference objects. Moreover, the agreement of simulations and experiment suggests that the simplification procedure used by Tsunoda (2001) is not necessarily an accurate method to determine neuronal tuning.


Author[s]: Jeremy Hanford Brown

Sparsely Faceted Arrays: A Mechanism Supporting Parallel Allocation, Communication, and Garbage Collection

June 2002



Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low- level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.


Author[s]: Gregory T. Sullivan

Advanced Programming Language Features for Executable Design Patterns "Better Patterns Through Reflection

March 22, 2002



The Design Patterns book [GOF95] presents 24 time-tested patterns that consistently appear in well-designed software systems. Each pattern is presented with a description of the design problem the pattern addresses, as well as sample implementation code and design considerations. This paper explores how the patterns from the "Gang of Four'', or "GOF'' book, as it is often called, appear when similar problems are addressed using a dynamic, higher-order, object-oriented programming language. Some of the patterns disappear -- that is, they are supported directly by language features, some patterns are simpler or have a different focus, and some are essentially unchanged.


Author[s]: Sarah Finney, Natalia H. Gardiol, Leslie Pack Kaelbling and Tim Oates

Learning with Deictic Representation

April 10, 2002



Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.


Author[s]: Andrew "bunnie" Huang

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

June 2002



The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.



Author[s]: Ulf Knoblich, David J. Freedman and Maximilian Riesenhuber

Categorization in IT and PFC: Model and Experiments

April 18, 2002



In a recent experiment, Freedman et al. recorded from inferotemporal (IT) and prefrontal cortices (PFC) of monkeys performing a "cat/dog" categorization task (Freedman 2001 and Freedman, Riesenhuber, Poggio, Miller 2001). In this paper we analyze the tuning properties of view-tuned units in our HMAX model of object recognition in cortex (Riesenhuber 1999) using the same paradigm and stimuli as in the experiment. We then compare the simulation results to the monkey inferotemporal neuron population data. We find that view-tuned model IT units that were trained without any explicit category information can show category-related tuning as observed in the experiment. This suggests that the tuning properties of experimental IT neurons might primarily be shaped by bottom-up stimulus-space statistics, with little influence of top-down task-specific information. The population of experimental PFC neurons, on the other hand, shows tuning properties that cannot be explained just by stimulus tuning. These analyses are compatible with a model of object recognition in cortex (Riesenhuber 2000) in which a population of shape-tuned neurons provides a general basis for neurons tuned to different recognition tasks.


Author[s]: Carl Steinbach

A Reinforcement-Learning Approach to Power Management

May 2002



We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad~hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.


Author[s]: Andrew "bunnie" Huang

Keeping Secrets in Hardware: the Microsoft Xbox(TM) Case Study

May 26, 2002



This paper discusses the hardware foundations of the cryptosystem employed by the Xbox(TM) video game console from Microsoft. A secret boot block overlay is buried within a system ASIC. This secret boot block decrypts and verifies portions of an external FLASH-type ROM. The presence of the secret boot block is camouflaged by a decoy boot block in the external ROM. The code contained within the secret boot block is transferred to the CPU in the clear over a set of high-speed busses where it can be extracted using simple custom hardware. The paper concludes with recommendations for improving the Xbox security system. One lesson of this study is that the use of a high-performance bus alone is not a sufficient security measure, given the advent of inexpensive, fast rapid prototyping services and high-performance FPGAs.



Author[s]: Vinay P. Kumar

Towards Man-Machine Interfaces: Combining Top-down Constraints with Bottom-up Learning in Facial Analysis

September 2002



This thesis proposes a methodology for the design of man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of combining top- down and bottom-up knowledge is similar to the engineering question of labeling a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose the use of a linear morphable model (LMM) for representing top-down structure and use it to model various facial variations such as mouth shapes and expression, the pose of faces and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression for estimating the parameters of LMMs directly from pixel-based representations of faces. We combine these methods for designing new, more self- contained systems for recognizing facial expressions, estimating facial pose and for recognizing visemes.



Author[s]: Adlar J. Kim and Christian R. Shelton

Modeling Stock Order Flows and Learning Market-Making from Data

June 2002



Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. The sequence of buys and sells for a particular stock, the order flow, we model as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly non-linear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially-observable environment. We demonstrate learning results for two separate real stocks.


Author[s]: Ron O. Dror

Surface Reflectance Recognition and Real-World Illumination Statistics

October 2002



Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from the typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illuminination. Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination. For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance.


Author[s]: Justin Werfel

Implementing Universal Computation in an Evolutionary System

July 2002



Evolutionary algorithms are a common tool in engineering and in the study of natural evolution. Here we take their use in a new direction by showing how they can be made to implement a universal computer. We consider populations of individuals with genes whose values are the variables of interest. By allowing them to interact with one another in a specified environment with limited resources, we demonstrate the ability to construct any arbitrary logic circuit. We explore models based on the limits of small and large populations, and show examples of such a system in action, implementing a simple logic circuit.


Author[s]: John M. Van Eepoel

Achieving Real-Time Mode Estimation through Offline Compilation

October 22, 2002



As exploration of our solar system and outerspace move into the future, spacecraft are being developed to venture on increasingly challenging missions with bold objectives. The spacecraft tasked with completing these missions are becoming progressively more complex. This increases the potential for mission failure due to hardware malfunctions and unexpected spacecraft behavior. A solution to this problem lies in the development of an advanced fault management system. Fault management enables spacecraft to respond to failures and take repair actions so that it may continue its mission. The two main approaches developed for spacecraft fault management have been rule-based and model-based systems. Rules map sensor information to system behaviors, thus achieving fast response times, and making the actions of the fault management system explicit. These rules are developed by having a human reason through the interactions between spacecraft components. This process is limited by the number of interactions a human can reason about correctly. In the model-based approach, the human provides component models, and the fault management system reasons automatically about system wide interactions and complex fault combinations. This approach improves correctness, and makes explicit the underlying system models, whereas these are implicit in the rule- based approach. We propose a fault detection engine, Compiled Mode Estimation (CME) that unifies the strengths of the rule-based and model- based approaches. CME uses a compiled model to determine spacecraft behavior more accurately. Reasoning related to fault detection is compiled in an off-line process into a set of concurrent, localized diagnostic rules. These are then combined on-line along with sensor information to reconstruct the diagnosis of the system. These rules enable a human to inspect the diagnostic consequences of CME. Additionally, CME is capable of reasoning through component interactions automatically and still provide fast and correct responses. The implementation of this engine has been tested against the NEAR spacecraft advanced rule-based system, resulting in detection of failures beyond that of the rules. This evolution in fault detection will enable future missions to explore the furthest reaches of the solar system without the burden of human intervention to repair failed components.



Author[s]: Robert Schneider and Maximilian Riesenhuber

A Detailed Look at Scale and Translation Invariance in a Hierarchical Neural Model of Visual Object Recognition

August 2002



The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view- tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.


Author[s]: J.P. Grossman

Design and Evaluation of the Hamal Parallel Computer

December 5, 2002



Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.



Author[s]: Martin Alexander Giese and Tomaso Poggio

Biologically Plausible Neural Model for the Recognition of Biological Motion and Actions

August 2002



The visual recognition of complex movements and actions is crucial for communication and survival in many species. Remarkable sensitivity and robustness of biological motion perception have been demonstrated in psychophysical experiments. In recent years, neurons and cortical areas involved in action recognition have been identified in neurophysiological and imaging studies. However, the detailed neural mechanisms that underlie the recognition of such complex movement patterns remain largely unknown. This paper reviews the experimental results and summarizes them in terms of a biologically plausible neural model. The model is based on the key assumption that action recognition is based on learned prototypical patterns and exploits information from the ventral and the dorsal pathway. The model makes specific predictions that motivate new experiments.



Author[s]: M.A. Giese and X. Xie

Exact Solution of the Nonlinear Dynamics of Recurrent Neural Mechanisms for Direction Selectivity

August 2002



Different theoretical models have tried to investigate the feasibility of recurrent neural mechanisms for achieving direction selectivity in the visual cortex. The mathematical analysis of such models has been restricted so far to the case of purely linear networks. We present an exact analytical solution of the nonlinear dynamics of a class of direction selective recurrent neural models with threshold nonlinearity. Our mathematical analysis shows that such networks have form-stable stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction selective cortical neurons. Our analysis shows also that the stability of such solutions can break down giving raise to a different class of solutions ("lurching activity waves") that are characterized by a specific spatio-temporal periodicity. These solutions cannot arise in models for direction selectivity with purely linear spatio-temporal filtering.


Author[s]: Harald Steck and Tommi S. Jaakkola

On the Dirichlet Prior and Bayesian Regularization

September 2002



A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.


Author[s]: Marshall F. Tappen, William T. Freeman and Edward H. Adelson

Recovering Intrinsic Images from a Single Image

September 2002



We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface's reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.


Author[s]: William T. Freeman and Antonio Torralba

Shape Recipes: Scene Representations that Refer to the Image

September 2002



The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.


Author[s]: Jack Wisdom

Swimming in Space-Time

November 2002



Cyclic changes in the shape of a quasi-rigid body on a curved manifold can lead to net translation and/or rotation of the body in the manifold. Presuming space-time is a curved manifold as portrayed by general relativity, translation in space can be accomplished simply by cyclic changes in the shape of a body, without any thrust or external forces.


Author[s]: Gerald Jay Sussman and Jack Wisdom

The Role of Programming in the Formulation of Ideas

November 2002



Classical mechanics is deceptively simple. It is surprisingly easy to get the right answer with fallacious reasoning or without real understanding. To address this problem we use computational techniques to communicate a deeper understanding of Classical Mechanics. Computational algorithms are used to express the methods used in the analysis of dynamical phenomena. Expressing the methods in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.


Author[s]: Antonio Torralba and William T. Freeman

Properties and Applications of Shape Recipes

December 2002



In low-level vision, the representation of scene properties such as shape, albedo, etc., are very high dimensional as they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented in reference to the image where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show how this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information like stereo.


Author[s]: Erik B. Sudderth, Alexander T. Ihler, William T. Freeman and Alan S. Willsky

Nonparametric Belief Propagation and Facial Appearance Estimation

December 2002



In many applications of graphical models arising in computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms for discrete approximations to these continuous distributions, but for the high-dimensional variables typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle filters provide an appealing alternative. However, existing techniques fail to exploit the rich structure of the graphical models describing many vision problems. Drawing on ideas from regularized particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation (NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accomodate an extremely broad class of potential functions, including nonparametric representations. Thus, NBP extends particle filtering methods to the more general vision problems that graphical models can describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model, allowing location and reconstruction of occluded features.


Author[s]: Jacob Beal

Leaderless Distributed Hierarchy Formation

December 2002



I present a system for robust leaderless organization of an amorphous network into hierarchical clusters. This system, which assumes that nodes are spatially embedded and can only talk to neighbors within a given radius, scales to networks of arbitrary size and converges rapidly. The amount of data stored at each node is logarithmic in the diameter of the network, and the hierarchical structure produces an addressing scheme such that there is an invertible relation between distance and address for any pair of nodes. The system adapts automatically to stopping failures, network partition, and reorganization.


Author[s]: Jake V. Bouvrie

Multiple Resolution Image Classification

December 2002



Binary image classifiction is a problem that has received much attention in recent years. In this paper we evaluate a selection of popular techniques in an effort to find a feature set/ classifier combination which generalizes well to full resolution image data. We then apply that system to images at one-half through one-sixteenth resolution, and consider the corresponding error rates. In addition, we further observe generalization performance as it depends on the number of training images, and lastly, compare the system's best error rates to that of a human performing an identical classification task given teh same set of test images.



Author[s]: Luis Perez-Breva and Osamu Yoshimi

Model Selection in Summary Evaluation

December 2002



A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.



Author[s]: Sayan Mukherjee, Partha Niyogi, Tomaso Poggio and Ryan Rifkin

Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization

December 2002 (revised July 2003)



Solutions of learning problems by Empirical Risk Minimization (ERM) need to be consistent, so that they may be predictive. They also need to be well- posed, so that they can be used robustly. We show that a statistical form of well-posedness, defined in terms of the key property of L-stability, is necessary and sufficient for consistency of ERM.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu