CSAIL Digital Archive - Artificial Intelligence
Author[s]: Amnon Shashua
Geometry and Photometry in 3D Visual Recognition
The report addresses the problem of visual recognition under two sources of variability: geometric and photometric. The geometric deals with the relation between 3D objects and their views under orthographic and perspective projection. The photometric deals with the relation between 3D matte objects and their images under changing illumination conditions. Taken together, an alignment-based method is presented for recognizing objects viewed from arbitrary viewing positions and illuminated by arbitrary settings of light sources.
Author[s]: Athanassios G. Siapas
A Global Approach to Parameter Estimation of Chaotic Dynamical Systems
We present a novel approach to parameter estimation of systems with complicated dynamics, as well as evidence for the existence of a universal power law that enables us to quantify the dependence of global geometry on small changes in the parameters of the system. This power law gives rise to what seems to be a new dynamical system invariant.
Author[s]: Gary C. Borchardt
Causal reconstruction is the task of reading a written causal description of a physical behavior, forming an internal model of the described activity, and demonstrating comprehension through question answering. This task is difficult because written descriptions often do not specify exactly how referenced events fit together. This article (1) characterizes the causal reconstruction problem, (2) presents a representation called transition space, which portrays events in terms of "transitions," or collections of changes expressible in everyday language, and (3) describes a program called PATHFINDER, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity.
Author[s]: Tomaso Poggio and Anya Hurlbert
Observations on Cortical Mechanisms for Object Recognition and Learning
This paper sketches a hypothetical cortical architecture for visual 3D object recognition based on a recent computational model. The view-centered scheme relies on modules for learning from examples, such as Hyperbf-like networks. Such models capture a class of explanations we call Memory-Based Models (MBM) that contains sparse population coding, memory-based recognition, and codebooks of prototypes. Unlike the sigmoidal units of some artificial neural networks, the units of MBMs are consistent with the description of cortical neurons. We describe how an example of MBM may be realized in terms of cortical circuitry and biophysical mechanisms, consistent with psychophysical and physiological data.
Author[s]: Amnon Shashua
Geometric and Algebraic Aspects of 3D Affine and Projective Structures from Perspective 2D Views
We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved).
Author[s]: Philip Greenspun
Site Controller: A System for Computer-Aided Civil Engineering and Construction
A revolution in earthmoving, a $100 billion industry, can be achieved with three components: the GPS location system, sensors and computers in bulldozers, and SITE CONTROLLER, a central computer system that maintains design data and directs operations. The first two components are widely available; I built SITE CONTROLLER to complete the triangle and describe it here. SITE CONTROLLER assists civil engineers in the design, estimation, and construction of earthworks, including hazardous waste site remediation. The core of SITE CONTROLLER is a site modelling system that represents existing and prospective terrain shapes, roads, hydrology, etc. Around this core are analysis, simulation, and vehicle control tools. Integrating these modules into one program enables civil engineers and contractors to use a single interface and database throughout the life of a project.
Author[s]: Thomas Vetter, Tomaso Poggio and Heinrich Bülthoff
3D Object Recognition: Symmetry and Virtual Views
Many 3D objects in the world around us are strongly constrained. For instance, not only cultural artifacts but also many natural objects are bilaterally symmetric. Theoretical arguments suggest and psychophysical experiments confirm that humans may be better at recognizing symmetric objects. The hypothesis of symmetry-induced virtual views together with a network model that successfully accounts for human recognition of generic 3D objects leads to predictions that we have verified with psychophysical experiments.
Author[s]: Tao Daniel Alter
Robust and Efficient 3D Recognition by Alignment
Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.
Author[s]: Clay Matthew Thompson
Robust Photo-topography by Fusing Shape-from-Shading and Stereo
Methods for fusing two computer vision methods are discussed and several example algorithms are presented to illustrate the variational method of fusing algorithms. The example algorithms seek to determine planet topography given two images taken from two different locations with two different lighting conditions. The algorithms each employ a single cost function that combines the computer vision methods of shape-from-shading and stereo in different ways. The algorithms are closely coupled and take into account all the constraints of the photo-topography problem. The algorithms are run on four synthetic test image sets of varying difficulty.
Author[s]: Jonathan Amsterdam
Automatic Qualitative Modeling of Dynamic Physical Systems
This report describes MM, a computer program that can model a variety of mechanical and fluid systems. Given a system's structure and qualitative behavior, MM searches for models using an energy-based modeling framework. MM uses general facts about physical systems to relate behavioral and model properties. These facts enable a more focussed search for models than would be obtained by mere comparison of desired and predicted behaviors. When these facts do not apply, MM uses behavior-constrained qualitative simulation to verify candidate models efficiently. MM can also design experiments to distinguish among multiple candidate models.
Author[s]: Andrew A. Berlin and Rajeev J. Surati
Exploiting the Parallelism Exposed by Partial Evaluation
We describe an approach to parallel compilation that seeks to harness the vast amount of fine-grain parallelism that is exposed through partial evaluation of numerically-intensive scientific programs. We have constructed a compiler for the Supercomputer Toolkit parallel processor that uses partial evaluation to break down data abstractions and program structure, producing huge basic blocks that contain large amounts of fine-grain parallelism. We show that this fine-grain parallelism can be effectively utilized even on coarse-grain parallel architectures by selectively grouping operations together so as to adjust the parallelism grain-size to match the inter-processor communication capabilities of the target architecture.
Author[s]: Rajeev Surati
Exploiting the Parallelism Exposed by Partial Evaluation
We describe the key role played by partial evaluation in the Supercomputer Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputer Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
Author[s]: Pawan Sinha
Pattern Motion Perception: Feature Tracking or Integration of Component Motions?
A key question regarding primate visual motion perception is whether the motion of 2D patterns is recovered by tracking distinctive localizable features [Lorenceau and Gorea, 1989; Rubin and Hochstein, 1992] or by integrating ambiguous local motion estimates [Adelson and Movshon, 1982; Wilson and Kim, 1992]. For a two-grating plaid pattern, this translates to either tracking the grating intersections or to appropriately combining the motion estimates for each grating. Since both component and feature information are simultaneously available in any plaid pattern made of contrast defined gratings, it is unclear how to determine which of the two schemes is actually used to recover the plaid's motion. To address this problem, we have designed a plaid pattern made with subjective, rather than contrast defined, gratings. The distinguishing characteristic of such a plaid pattern is that it contains no contrast defined intersections that may be tracked. We find that notwithstanding the absence of such features, observers can accurately recover the pattern velocity. Additionally we show that the hypothesis of tracking "illusory features" to estimate pattern motion does not stand up to experimental test. These results present direct evidence in support of the idea that calls for the integration of component motions over the one that mandates tracking localized features to recover 2D pattern motion. The localized features, we suggest, are used primarily as providers of grouping information - which component motion signals to integrate and which not to.
Author[s]: David W. Jacobs
Recognizing 3-D Objects Using 2-D Images
We discuss a strategy for visual recognition by forming groups of salient image features, and then using these groups to index into a data base to find all of the matching groups of model features. We discuss the most space efficient possible method of representing 3-D models for indexing from 2-D data, and show how to account for sensing error when indexing. We also present a convex grouping method that is robust and efficient, both theoretically and in practice. Finally, we combine these modules into a complete recognition system, and test its performance on many real images.
Author[s]: Patrick Sobalvarro
A Lifetime-based Garbage Collector for LISP Systems on General-Purpose Computers
Garbage collector performance in LISP systems on custom hardware has been substantially improved by the adoption of lifetime-based garbage collection techniques. To date, however, successful lifetime-based garbage collectors have required special-purpose hardware, or at least privileged access to data structures maintained by the virtual memory system. I present here a lifetime-based garbage collector requiring no special-purpose hardware or virtual memory system support, and discuss its performance.
Author[s]: S. Tanveer F. Mahmood
Attentional Selection in Object Recognition
A key problem in object recognition is selection, namely, the problem of identifying regions in an image within which to start the recognition process, ideally by isolating regions that are likely to come from a single object. Such a selection mechanism has been found to be crucial in reducing the combinatorial search involved in the matching stage of object recognition. Even though selection is of help in recognition, it has largely remained unsolved because of the difficulty in isolating regions belonging to objects under complex imaging conditions involving occlusions, changing illumination, and object appearances. This thesis presents a novel approach to the selection problem by proposing a computational model of visual attentional selection as a paradigm for selection in recognition. In particular, it proposes two modes of attentional selection, namely, attracted and pay attention modes as being appropriate for data and model-driven selection in recognition. An implementation of this model has led to new ways of extracting color, texture and line group information in images, and their subsequent use in isolating areas of the scene likely to contain the model object. Among the specific results in this thesis are: a method of specifying color by perceptual color categories for fast color region segmentation and color-based localization of objects, and a result showing that the recognition of texture patterns on model objects is possible under changes in orientation and occlusions without detailed segmentation. The thesis also presents an evaluation of the proposed model by integrating with a 3D from 2D object recognition system and recording the improvement in performance. These results indicate that attentional selection can significantly overcome the computational bottleneck in object recognition, both due to a reduction in the number of features, and due to a reduction in the number of matches during recognition using the information derived during selection. 
Finally, these studies have revealed a surprising use of selection, namely, in the partial solution of the pose of a 3D object.
Author[s]: Brian Eberman and S. Kenneth Salisbury
Application of Change Detection to Dynamic Contact Sensing
Manipulation contact forces convey substantial information about the manipulation state. This paper addresses the fundamental problem of interpreting the force signals without any additional manipulation context. Techniques based on forms of the generalized sequential likelihood ratio test are used to segment individual strain signals into statistically equivalent pieces. We report on our experimental development of the segmentation algorithm and on its results for contact states. The sequential likelihood ratio test is reviewed and some of its special cases and optimal properties are discussed. Finally, we conclude by discussing extensions to the techniques and a contact interpretation framework.
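As an illustration of segmenting a signal with a sequential likelihood ratio, the sketch below uses the simple CUSUM form, which assumes a Gaussian signal whose mean before and after the change is known. This is a swapped-in special case, not the generalized test the paper describes; the function name, the sample values, and the threshold are all hypothetical.

```python
def cusum_change_point(signal, mean_before, mean_after, sigma, threshold):
    """Return the index at which a mean shift is declared, or None."""
    statistic = 0.0
    for index, sample in enumerate(signal):
        # Log-likelihood ratio of one Gaussian sample:
        # "after change" hypothesis versus "before change" hypothesis.
        llr = ((mean_after - mean_before) / sigma ** 2) * (
            sample - (mean_before + mean_after) / 2.0
        )
        # Clamping at zero restarts the test whenever the evidence
        # accumulated so far favors "no change".
        statistic = max(0.0, statistic + llr)
        if statistic > threshold:
            return index
    return None


# Hypothetical strain readings with a jump in the mean at index 4.
strain = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
alarm = cusum_change_point(strain, 0.0, 1.0, 0.5, 3.0)
flat = cusum_change_point([0.0] * 8, 0.0, 1.0, 0.5, 3.0)
```

Raising the threshold delays the alarm but lowers the chance of a false detection; the alarm index would then mark the boundary between two statistically equivalent pieces.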
Author[s]: Henry M. Wu
A Method for Eliminating Skew Introduced by Non-Uniform Buffer Delay and Wire Lengths in Clock Distribution Trees
The computation of a piecewise smooth function that approximates a finite set of data points is decomposed into two decoupled tasks: first, the computation of the locally smooth models, and hence, the segmentation of the data into classes that consist of the sets of points best approximated by each model, and second, the computation of the normalized discriminant functions for each induced class. The approximating function is then computed as the optimal estimator with respect to this measure field. Applications to image processing and time series prediction are presented as well.
Author[s]: Neil C. Singer and Warren P. Seering
A Simplified Method for Deriving Equations of Motion For Continuous Systems with Flexible Members
A method is proposed for deriving dynamical equations for systems with both rigid and flexible components. During the derivation, each flexible component of the system is represented by a "surrogate element" which captures the response characteristics of that component and is easy to mathematically manipulate. The derivation proceeds essentially as if each surrogate element were a rigid body. Application of an extended form of Lagrange's equation yields a set of simultaneous differential equations which can then be transformed to be the exact, partial differential equations for the original flexible system. This method's use facilitates equation generation either by an analyst or through application of software-based symbolic manipulation.
Author[s]: Charles L. Isbell
Explorations of the Practical Issues of Learning Prediction-Control Tasks Using Temporal Difference Learning Methods
There has been recent interest in using temporal difference learning methods to attack problems of prediction and control. While these algorithms have been brought to bear on many problems, they remain poorly understood. It is the purpose of this thesis to further explore these algorithms, presenting a framework for viewing them and raising a number of practical issues and exploring those issues in the context of several case studies. This includes applying the TD(lambda) algorithm to: 1) learning to play tic-tac-toe from the outcome of self-play and of play against a perfectly-playing opponent and 2) learning simple one-dimensional segmentation tasks.
Author[s]: Michael E. Caine
The Design of Shape from Motion Constraints
This report presents a set of representations, methodologies, and tools for the purpose of visualizing, analyzing and designing functional shapes in terms of constraints on motion. The core of the research is an interactive computational environment that provides an explicit visual representation of motion constraints produced by shape interactions, and a series of tools that allow for the manipulation of motion constraints and their underlying shapes for the purpose of design.
Author[s]: Gideon P. Stein
Internal Camera Calibration Using Rotation and Geometric Shapes
This paper describes a simple method for internal camera calibration for computer vision. This method is based on tracking image features through a sequence of images while the camera undergoes pure rotation. The location of the features relative to the camera or to each other need not be known and therefore this method can be used both for laboratory calibration and for self calibration in autonomous robots working in unstructured environments. A second method of calibration is also presented. This method uses simple geometric objects such as spheres and straight lines to find the camera parameters. Calibration is performed using both methods and the results are compared.
Author[s]: Guillermo J. Rozas
Translucent Procedures, Abstraction without Opacity
This report introduces TRANSLUCENT PROCEDURES as a new mechanism for implementing behavioral abstractions. Like an ordinary procedure, a translucent procedure can be invoked, and thus provides an obvious way to capture a BEHAVIOR. Translucent procedures, like ordinary procedures, can be manipulated as first-class objects and combined using functional composition. But unlike ordinary procedures, translucent procedures have structure that can be examined in well-specified non-destructive ways, without invoking the procedure.
Author[s]: Christine L. Tsien
Maygen: A Symbolic Debugger Generation System
With the development of high-level languages for new computer architectures comes the need for appropriate debugging tools as well. One method for meeting this need would be to develop, from scratch, a symbolic debugger with the introduction of each new language implementation for any given architecture. This, however, seems to require unnecessary duplication of effort among developers. This paper describes Maygen, a "debugger generation system," designed to efficiently provide the desired language-dependent and architecture-dependent debuggers. A prototype of the Maygen system has been implemented and is able to handle the semantically different languages of C and OPAL.
Author[s]: Federico Girosi, Michael Jones and Tomaso Poggio
Priors, Stabilizers and Basis Functions: From Regularization to Radial, Tensor and Additive Splines
We had previously shown that regularization principles lead to approximation schemes, such as Radial Basis Functions, which are equivalent to networks with one layer of hidden units, called Regularization Networks. In this paper we show that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models, Breiman's hinge functions and some forms of Projection Pursuit Regression. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In the final part of the paper, we also show a relation between activation functions of the Gaussian and sigmoidal type.
Author[s]: David Beymer, Amnon Shashua and Tomaso Poggio
Example Based Image Analysis and Synthesis
Image analysis and graphics synthesis can be achieved with learning techniques that use image examples directly, without physically-based 3D models. In our technique: -- the mapping from novel images to a vector of "pose" and "expression" parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; -- the inverse mapping from input "pose" and "expression" parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia and very low bandwidth teleconferencing.
Author[s]: Philippe G. Schyns and Heinrich H. Bulthoff
Conditions for Viewpoint Dependent Face Recognition
Poggio and Vetter (1992) showed that learning one view of a bilaterally symmetric object could be sufficient for its recognition, if this view allows the computation of a symmetric, "virtual," view. Faces are roughly bilaterally symmetric objects. Learning a side view, which always has a symmetric view, should allow for better generalization performances than learning the frontal view. Two psychophysical experiments tested these predictions. Stimuli were views of shaded 3D models of laser-scanned faces. The first experiment tested whether a particular view of a face was canonical. The second experiment tested which single views of a face give rise to best generalization performances. The results were compatible with the symmetry hypothesis: Learning a side view allowed better generalization performances than learning the frontal view.
Author[s]: Jose L. Marroquin
Measure Fields for Function Approximation
The computation of a piecewise smooth function that approximates a finite set of data points may be decomposed into two decoupled tasks: first, the computation of the locally smooth models, and hence, the segmentation of the data into classes that consist of the sets of points best approximated by each model, and second, the computation of the normalized discriminant functions for each induced class. The approximating function may then be computed as the optimal estimator with respect to this measure field. We give an efficient procedure for effecting both computations, and for the determination of the optimal number of components.
Author[s]: Ronald D. Chaney
Feature Extraction Without Edge Detection
Information representation is a critical issue in machine vision. The representation strategy in the primitive stages of a vision system has enormous implications for the performance in subsequent stages. Existing feature extraction paradigms, like edge detection, provide sparse and unreliable representations of the image information. In this thesis, we propose a novel feature extraction paradigm. The features consist of salient, simple parts of regions bounded by zero-crossings. The features are dense, stable, and robust. The primary advantage of the features is that they have abstract geometric attributes pertaining to their size and shape. To demonstrate the utility of the feature extraction paradigm, we apply it to passive navigation. We argue that the paradigm is applicable to other early vision problems.
Author[s]: W. Eric L. Grimson
Why Stereo Vision is Not Always About 3D Reconstruction
It is commonly assumed that the goal of stereo vision is computing explicit 3D scene reconstructions. We show that very accurate camera calibration is needed to support this, and that such accurate calibration is difficult to achieve and maintain. We argue that for tasks like recognition, figure/ground separation is more important than 3D depth reconstruction, and demonstrate a stereo algorithm that supports figure/ground separation without 3D reconstruction.
Author[s]: Reza Shadmehr and Ferdinando Mussa-Ivaldi
Geometric Structure of the Adaptive Controller of the Human Arm
The objects with which the hand interacts may significantly change the dynamics of the arm. How does the brain adapt control of arm movements to this new dynamic? We show that adaptation is via composition of a model of the task's dynamics. By exploring generalization capabilities of this adaptation we infer some of the properties of the computational elements with which the brain formed this model: the elements have broad receptive fields and encode the learned dynamics as a map structured in an intrinsic coordinate system closely related to the geometry of the skeletomusculature. The low-level nature of these elements suggests that they may represent a set of primitives with which a movement is represented in the CNS.
Author[s]: Kah Kay Sung and Partha Niyogi
A Formulation for Active Learning with Applications to Object Detection
June 6, 1996
We discuss a formulation for active example selection for function learning problems. This formulation is obtained by adapting Fedorov's optimal experiment design to the learning problem. We specifically show how to analytically derive example selection algorithms for certain well defined function classes. We then explore the behavior and sample complexity of such active learning algorithms. Finally, we view object detection as a special case of function learning and show how our formulation reduces to a useful heuristic to choose examples to reduce the generalization error.
Author[s]: Rodney Brooks and Lynn A. Stein
Building Brains for Bodies
We describe a project to capitalize on newly available levels of computational resources in order to understand human cognition. We will build an integrated physical system including vision, sound input and output, and dextrous manipulation, all controlled by a continuously operating large scale parallel MIMD computer. The resulting system will learn to "think" by building on its bodily experiences to accomplish progressively more abstract tasks. Past experience suggests that in attempting to build such an integrated system we will have to fundamentally change the way artificial intelligence, cognitive science, linguistics, and philosophy think about the organization of intelligence. We expect to be able to better reconcile the theories that will be developed with current work in neuroscience.
Author[s]: Michael I. Jordan and Robert A. Jacobs
Hierarchical Mixtures of Experts and the EM Algorithm
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
Author[s]: Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
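To make the Q-learning update concrete, here is a minimal tabular sketch on a hypothetical three-state chain (states 0 and 1, with state 2 terminal and rewarded). It assumes deterministic transitions and fixed sweeps over all state-action pairs, so it only illustrates the update rule and its fixed point; it does not exercise the stochastic setting that the convergence theorem addresses. All constants and names are illustrative.

```python
GAMMA = 0.9        # discount factor (assumed)
ALPHA = 0.5        # learning rate (assumed)
TERMINAL = 2       # reaching this state ends the episode
ACTIONS = (-1, +1)  # move left or move right along the chain


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 on entering the terminal state."""
    next_state = min(max(state + action, 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def q_learning(sweeps=500):
    """Apply the Q-learning update to every state-action pair, repeatedly."""
    q = {(s, a): 0.0 for s in range(TERMINAL) for a in ACTIONS}
    for _ in range(sweeps):
        for state, action in list(q):
            next_state, reward = step(state, action)
            if next_state == TERMINAL:
                best_next = 0.0  # no future value after termination
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            # Move the estimate a fraction ALPHA toward the bootstrapped target.
            q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q


q_values = q_learning()
```

On this chain the estimates approach the dynamic programming solution: moving right from state 1 is worth 1.0, and every step further from the goal discounts that value by GAMMA.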
Author[s]: J. Brian Subirana-Vilanova
Mid-Level Vision and Recognition of Non-Rigid Objects
We address mid-level vision for the recognition of non-rigid objects. We align model and image using frame curves - which are object or "figure/ground" skeletons. Frame curves are computed, without discontinuities, using Curved Inertia Frames, a provably global scheme implemented on the Connection Machine, based on: non-Cartesian networks; a definition of curved axis of inertia; and a ridge detector. I present evidence against frame alignment in human perception. This suggests: frame curves have a role in figure/ground segregation and in fuzzy boundaries; their outside/near/top/incoming regions are more salient; and that perception begins by setting a reference frame (prior to early vision), and proceeds by processing convex structures.
Author[s]: Cynthia Ferrell
Robust Agent Control of an Autonomous Robot with Many Sensors and Actuators
This thesis presents methods for implementing robust hexapod locomotion on an autonomous robot with many sensors and actuators. The controller is based on the Subsumption Architecture and is fully distributed over approximately 1500 simple, concurrent processes. The robot, Hannibal, weighs approximately 6 pounds and is equipped with over 100 physical sensors, 19 degrees of freedom, and 8 on board computers. We investigate the following topics in depth: distributed control of a complex robot, insect-inspired locomotion control for gait generation and rough terrain mobility, and fault tolerance. The controller was implemented, debugged, and tested on Hannibal. Through a series of experiments, we examined Hannibal's gait generation, rough terrain locomotion, and fault tolerance performance. These results demonstrate that Hannibal exhibits robust, flexible, real-time locomotion over a variety of terrain and tolerates a multitude of hardware failures.
Author[s]: Michael de la Maza
Synthesizing Regularity Exposing Attributes in Large Protein Databases
This thesis describes a system that synthesizes regularity exposing attributes from large protein databases. After processing primary and secondary structure data, this system discovers an amino acid representation that captures what are thought to be the three most important amino acid characteristics (size, charge, and hydrophobicity) for tertiary structure prediction. A neural network trained using this 16 bit representation achieves a performance accuracy on the secondary structure prediction problem that is comparable to the one achieved by a neural network trained using the standard 24 bit amino acid representation. In addition, the thesis describes bounds on secondary structure prediction accuracy, derived using an optimal learning algorithm and the probably approximately correct (PAC) model.
Author[s]: Andre DeHon
Robust, High-Speed Network Design for Large-Scale Multiprocessing
As multiprocessor system size scales upward, two important aspects of multiprocessor systems will generally get worse rather than better: (1) interprocessor communication latency will increase and (2) the probability that some component in the system will fail will increase. These problems can prevent us from realizing the potential benefits of large-scale multiprocessing. In this report we consider the problem of designing networks which simultaneously minimize communication latency while maximizing fault tolerance. Using a synergy of techniques including connection topologies, routing protocols, signalling techniques, and packaging technologies we assemble integrated, system-level solutions to this network design problem.
Author[s]: Takaya Miyano and Federico Girosi
Forecasting Global Temperature Variations by Neural Networks
Global temperature variations between 1861 and 1984 are forecast using regularization networks, multilayer perceptrons and linear autoregression. The regularization network, optimized by stochastic gradient descent associated with colored noise, gives the best forecasts. For all the models, prediction errors noticeably increase after 1965. These results are consistent with the hypothesis that the climate dynamics is characterized by low-dimensional chaos and that it may have changed at some point after 1965, which is also consistent with the recent idea of climate change.
Author[s]: Amnon Shashua and Sebastian Toelg
The Quadric Reference Surface: Theory and Applications
The conceptual component of this work is about "reference surfaces'' which are the dual of reference frames often used for shape representation purposes. The theoretical component of this work involves the question of whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an opaque textured quadric surface in 3D, given (i) few corresponding points in the two views, or (ii) the outline conic of the surface in one view (only) and few corresponding points in the two views. The practical component of this work is concerned with applying the theoretical results as tools for the task of achieving full correspondence between views of arbitrary objects.
Author[s]: Partha Niyogi and Robert C. Berwick
Formalizing Triggers: A Learning Model for Finite Spaces
In a recent seminal paper, Gibson and Wexler (1993) take important steps toward formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of parameters. They introduce the Triggering Learning Algorithm (TLA) and show that even in finite space convergence may be a problem due to local maxima. In this paper we explicitly formalize learning in finite parameter space as a Markov structure whose states are parameter settings. We show that this captures the dynamics of TLA completely and allows us to explicitly compute the rates of convergence for TLA and other variants of TLA, e.g., random walk. Also included in the paper are a corrected version of GW's central convergence proof, a list of "problem states" in addition to local maxima, and batch and PAC-style learning bounds for the model.
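As a minimal sketch of the Markov formulation described above, the following Python treats each parameter setting as a state of a chain and iterates the state distribution. The three-state transition matrix is hypothetical and chosen only for illustration; it is not taken from the paper. State 0 stands in for the target grammar (absorbing), state 1 is a transient setting, and state 2 stands in for an absorbing local maximum.

```python
def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, P, k):
    """State distribution after k steps, starting from dist."""
    for _ in range(k):
        dist = step(dist, P)
    return dist


# Hypothetical transition matrix (each row sums to 1).
P = [
    [1.0, 0.0, 0.0],  # state 0: target grammar, absorbing
    [0.3, 0.6, 0.1],  # state 1: transient parameter setting
    [0.0, 0.0, 1.0],  # state 2: local maximum, absorbing
]

start = [0.0, 1.0, 0.0]  # the learner begins in the transient state
for k in (1, 5, 20):
    d = distribution_after(start, P, k)
    print(k, [round(x, 4) for x in d])
```

In this toy chain the mass left in the transient state after k steps is 0.6^k, so the learner reaches the target with probability 0.3 / (0.3 + 0.1) = 0.75 and is trapped in the local maximum otherwise.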
Author[s]: Daniel M. Albro
AMAR: A Computational Model of Autosegmental Phonology
This report describes a computational system with which phonologists may describe a natural language in terms of autosegmental phonology, currently the most advanced theory pertaining to the sound systems of human languages. This system allows linguists to easily test autosegmental hypotheses against a large corpus of data. The system was designed primarily with tonal systems in mind, but also provides support for tree or feature matrix representation of phonemes (as in The Sound Pattern of English), as well as syllable structures and other aspects of phonological theory. Underspecification is allowed, and trees may be specified before, during, and after rule application. The association convention is automatically applied, and other principles such as the conjunctivity condition are supported. The method of representation was designed such that rules are designated in as close a fashion as possible to the existing conventions of autosegmental theory while adhering to a textual constraint for maximum portability.
Author[s]: Matthew Birkholz
Emacs Lisp in Edwin Scheme
The MIT-Scheme program development environment includes a general-purpose text editor, Edwin, that has an extension language, Edwin Scheme. Edwin is very similar to another general-purpose text editor, GNU Emacs, which also has an extension language, Emacs Lisp. The popularity of GNU Emacs has led to a large library of tools written in Emacs Lisp. The goal of this thesis is to implement a useful subset of Emacs Lisp in Edwin Scheme. This subset was chosen to be sufficient for simple operation of the GNUS news reading program.
Author[s]: Amnon Shashua
Algebraic Functions For Recognition
In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks.
Author[s]: Carl de Marcken
Methods for Parallelizing Search Paths in Parsing
Many search problems are commonly solved with combinatoric algorithms that unnecessarily duplicate and serialize work at considerable computational expense. There are techniques available that can eliminate redundant computations and perform remaining operations concurrently, effectively reducing the branching factors of these algorithms. This thesis applies these techniques to the problem of parsing natural language. The result is an efficient programming language that can reduce some of the expense associated with principle-based parsing and other search problems. The language is used to implement various natural language parsers, and the improvements are compared to those that result from implementing more deterministic theories of language processing.
Author[s]: Martha J. Hiller
The Role of Chemical Mechanisms in Neural Computation and Learning
May 23, 1995
Most computational models of neurons assume that their electrical characteristics are of paramount importance. However, all long-term changes in synaptic efficacy, as well as many short-term effects, are mediated by chemical mechanisms. This technical report explores the interaction between electrical and chemical mechanisms in neural learning and development. Two neural systems that exemplify this interaction are described and modelled. The first comprises the mechanisms underlying habituation, sensitization, and associative learning in the gill withdrawal reflex circuit in Aplysia, a marine snail. The second is the formation of retinotopic projections in the early visual pathway during embryonic development.
Author[s]: Jeffrey M. Siskind
Naive Physics, Event Perception, Lexical Semantics, and Language Acquisition
This thesis proposes a computational model of how children may come to learn the meanings of words in their native language. The proposed model is divided into two separate components. One component produces semantic descriptions of visually observed events while the other correlates those descriptions with co-occurring descriptions of those events in natural language. The first part of this thesis describes three implementations of the correlation process whereby representations of the meanings of whole utterances can be decomposed into fragments assigned as representations of the meanings of individual words. The second part of this thesis describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input.
Author[s]: James M. Hutchinson
A Radial Basis Function Approach to Financial Time Series Analysis
Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining'' problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.
Author[s]: Michael I. Jordan and Lei Xu
Convergence Results for the EM Approach to Mixtures of Experts Architectures
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs (1993) recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
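To make the E-step/M-step iteration concrete, here is a minimal sketch of EM on a much simpler model than the one analyzed in the paper: a two-component 1D Gaussian mixture with a fixed, known variance. It is not the mixture-of-experts algorithm; the data, starting values, and function names are all assumptions made for illustration. The log-likelihood trace shows the property EM is known for, namely that the likelihood does not decrease from one iteration to the next.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, w, mu0, mu1, var):
    """Log likelihood of the two-component mixture on the data."""
    return sum(
        math.log(w * normal_pdf(x, mu0, var) + (1.0 - w) * normal_pdf(x, mu1, var))
        for x in data
    )


def em_step(data, w, mu0, mu1, var):
    """One EM iteration; the variance is held fixed to keep the sketch short."""
    # E-step: posterior probability that each point came from component 0.
    resp = []
    for x in data:
        p0 = w * normal_pdf(x, mu0, var)
        p1 = (1.0 - w) * normal_pdf(x, mu1, var)
        resp.append(p0 / (p0 + p1))
    # M-step: re-estimate the mixing weight and the two means.
    n0 = sum(resp)
    n1 = len(data) - n0
    new_w = n0 / len(data)
    new_mu0 = sum(r * x for r, x in zip(resp, data)) / n0
    new_mu1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
    return new_w, new_mu0, new_mu1


def run_em(data, w, mu0, mu1, var=1.0, iterations=25):
    """Run EM and return the final parameters plus the log-likelihood trace."""
    trace = [log_likelihood(data, w, mu0, mu1, var)]
    for _ in range(iterations):
        w, mu0, mu1 = em_step(data, w, mu0, mu1, var)
        trace.append(log_likelihood(data, w, mu0, mu1, var))
    return (w, mu0, mu1), trace


# Hypothetical data: two loose clusters around -2 and +2.
DATA = [-2.1, -1.9, -2.4, -1.6, 2.0, 2.3, 1.7, 2.2]
params, trace = run_em(DATA, w=0.5, mu0=-0.5, mu1=0.5)
print(params)
print(trace[0], trace[-1])
```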
Author[s]: Peter R. Nuth
The Named-State Register File
This thesis introduces the Named-State Register File, a fine-grain, fully-associative register file. The NSF allows fast context switching between concurrent threads as well as efficient sequential program performance. The NSF holds more live data than conventional register files, and requires less spill and reload traffic to switch between contexts. This thesis demonstrates an implementation of the Named-State Register File and estimates the access time and chip area required for different organizations. Architectural simulations of large sequential and parallel applications show that the NSF can reduce execution time by 9% to 17% compared to alternative register files.
Author[s]: David J. Beymer
Face Recognition Under Varying Pose
While researchers in computer vision and pattern recognition have worked on automatic techniques for recognizing faces for the last 20 years, most systems specialize in frontal views of the face. We present a face recognizer that works under varying pose, the difficult part of which is to handle face rotations in depth. Building on successful template-based systems, our basic approach is to represent faces with templates from multiple model views that cover different poses from the viewing sphere. Our system has achieved a recognition rate of 98% on a data base of 62 people containing 10 testing and 15 modelling views per person.
Author[s]: James S. Miller and Guillermo J. Rozas
Garbage Collection is Fast, But a Stack is Faster
Prompted by claims that garbage collection can outperform stack allocation when sufficient physical memory is available, we present a careful analysis and set of cross-architecture measurements comparing these two approaches for the implementation of continuation (procedure call) frames. When the frames are allocated on a heap they require additional space, increase the amount of data transferred between memory and registers, and, on current architectures, require more instructions. We find that stack allocation of continuation frames outperforms heap allocation in some cases by almost a factor of three. Thus, stacks remain an important implementation technique for procedure calls, even in the presence of an efficient, compacting garbage collector and large amounts of memory.
Author[s]: Kanji Nagao and W. Eric L. Grimson
Object Recognition By Alignment Using Invariant Projections of Planar Surfaces
In order to recognize an object in an image, we must determine the best transformation from object model to the image. In this paper, we show that for features from coplanar surfaces which undergo linear transformations in space, there exist projections invariant to the surface motions up to rotations in the image field. To use this property, we propose a new alignment approach to object recognition based on centroid alignment of corresponding feature groups. This method uses only a single pair of 2D model and data. Experimental results show the robustness of the proposed method against perturbations of feature positions.
Author[s]: Nancy S. Pollard
Parallel Methods for Synthesizing Whole-Hand Grasps from Generalized Prototypes
This report addresses the problem of acquiring objects using articulated robotic hands. Standard grasps are used to make the problem tractable, and a technique is developed for generalizing these standard grasps to increase their flexibility to variations in the problem geometry. A generalized grasp description is applied to a new problem situation using a parallel search through hand configuration space, and the result of this operation is a global overview of the space of good solutions. The techniques presented in this report have been implemented, and the results are verified using the Salisbury three-finger robotic hand.
Author[s]: Lynne E. Parker
Heterogeneous Multi-Robot Cooperation
This report addresses the problem of achieving cooperation within small- to medium-sized teams of heterogeneous mobile robots. I describe a software architecture I have developed, called ALLIANCE, that facilitates robust, fault tolerant, reliable, and adaptive cooperative control. In addition, an extended version of ALLIANCE, called L-ALLIANCE, is described, which incorporates a dynamic parameter update mechanism that allows teams of mobile robots to improve the efficiency of their mission performance through learning. A number of experimental results of implementing these architectures on both physical and simulated mobile robot teams are described. In addition, this report presents the results of studies of a number of issues in mobile robot cooperation, including fault tolerant cooperative control, adaptive action selection, distributed control, robot awareness of team member actions, improving efficiency through learning, inter-robot communication, action recognition, and local versus global control.
Author[s]: Partha Niyogi and Federico Girosi
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well defined function learning tasks, in terms of the number of parameters and number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of finite number of samples). We make several observations about generalization error which are valid irrespective of the approximation scheme. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem.
Author[s]: Whitman Richards and Jan J. Koenderink
Trajectory Mapping ("TM''): A New Non-Metric Scaling Technique
Trajectory Mapping ("TM'') is a new scaling technique designed to recover the parameterizations, axes, and paths used to traverse a feature space. Unlike Multidimensional Scaling (MDS), there is no assumption that the space is homogeneous or metric. Although some metric ordering information is obtained with TM, the main output is the feature parameterizations that partition the given domain of object samples into different categories. Following an introductory example, the technique is further illustrated using first a set of colors and then a collection of textures taken from Brodatz (1966).
Author[s]: Karen Beth Sarachik
An Analysis of the Effect of Gaussian Error in Object Recognition
Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.
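The threshold selection described above can be sketched as a one-dimensional binary hypothesis test. The sketch below assumes, purely for illustration, that the scores of correct and incorrect pose hypotheses are each Gaussian with hypothetical (mean, standard deviation) parameters, and it searches a grid restricted to the interval between the two means; none of these choices is taken from the report.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def error_probability(t, prior_correct, correct, incorrect):
    """Probability of a mistake when accepting hypotheses with score > t.

    correct and incorrect are (mean, std) pairs for the two score distributions.
    """
    false_positive = 1.0 - normal_cdf(t, *incorrect)  # incorrect pose accepted
    false_negative = normal_cdf(t, *correct)          # correct pose rejected
    return (1.0 - prior_correct) * false_positive + prior_correct * false_negative


def best_threshold(prior_correct, correct, incorrect, steps=10000):
    """Grid search for the threshold with the lowest error probability.

    Assumption: the search is limited to the interval between the two means.
    """
    lo, hi = incorrect[0], correct[0]
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda t: error_probability(t, prior_correct, correct, incorrect),
    )


# Hypothetical score models: correct poses score higher on average.
t = best_threshold(0.5, correct=(10.0, 2.0), incorrect=(4.0, 2.0))
print(t, error_probability(t, 0.5, (10.0, 2.0), (4.0, 2.0)))
```

With equal priors and equal spreads the selected threshold falls at the midpoint of the two means, as expected from symmetry; unequal priors or spreads shift it.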
Author[s]: James M. Hutchinson, Andrew Lo and Tomaso Poggio
A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks
We propose a nonparametric method for estimating derivative financial asset pricing formulae using learning networks. To demonstrate feasibility, we first simulate Black-Scholes option prices and show that learning networks can recover the Black-Scholes formula from a two-year training set of daily options prices, and that the resulting network formula can be used successfully to both price and delta-hedge options out-of-sample. For comparison, we estimate models using four popular methods: ordinary least squares, radial basis functions, multilayer perceptrons, and projection pursuit. To illustrate practical relevance, we also apply our approach to S&P 500 futures options data from 1987 to 1991.
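For reference, the target that the networks are asked to recover can be written in a few lines. The sketch below is the standard Black-Scholes formula for a European call on a non-dividend-paying asset, together with its delta; it is not the authors' code, and the function and parameter names are chosen here for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Return (price, delta) of a European call with no dividends.

    rate and sigma are annualized; maturity is in years.
    """
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    price = spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)
    delta = normal_cdf(d1)  # sensitivity of the price to the spot, used for delta-hedging
    return price, delta


price, delta = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)
print(round(price, 4), round(delta, 4))
```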
Author[s]: Nikos K. Logothetis, Thomas Vetter, Anya Hurlbert and Tomaso Poggio
View-Based Models of 3D Object Recognition and Class-Specific Invariances
This paper describes the main features of a view-based model of object recognition. The model tries to capture general properties to be expected in a biological architecture for object recognition. The basic module is a regularization network in which each of the hidden units is broadly tuned to a specific view of the object to be recognized.
Author[s]: N.K. Logothetis, J. Pauls and T. Poggio
Viewer-Centered Object Recognition in Monkeys
How does the brain recognize three-dimensional objects? We trained monkeys to recognize computer rendered objects presented from an arbitrarily chosen training view, and subsequently tested their ability to generalize recognition for other views. Our results provide additional evidence in favor of a recognition model that accomplishes view-invariant performance by storing a limited number of object views or templates together with the capacity to interpolate between the templates (Poggio and Edelman, 1990).
Author[s]: Margrit Betke, Ronald L. Rivest and Mona Singh
Piecemeal Learning of an Unknown Environment
We introduce a new learning problem: learning a graph by piecemeal search, in which the learner must return every so often to its starting point (for refueling, say). We present two linear-time piecemeal-search algorithms for learning city-block graphs: grid graphs with rectangular obstacles.
Author[s]: D.W. Jacobs and T.D. Alter
Uncertainty Propagation in Model-Based Recognition
Building robust recognition systems requires a careful understanding of the effects of error in sensed features. Error in these image features results in a region of uncertainty in the possible image location of each additional model feature. We present an accurate, analytic approximation for this uncertainty region when model poses are based on matching three image and model points, for both Gaussian and bounded error in the detection of image points, and for both scaled-orthographic and perspective projection models. This result applies to objects that are fully three-dimensional, where past results considered only two-dimensional objects. Further, we introduce a linear programming algorithm to compute the uncertainty region when poses are based on any number of initial matches. Finally, we use these results to extend, from two-dimensional to three-dimensional objects, robust implementations of alignment, interpretation-tree search, and transformation clustering.
Author[s]: Heinrich H. Buelthoff, Shimon Y. Edelman and Michael J. Tarr
How are Three-Dimensional Objects Represented in the Brain?
We discuss a variety of object recognition experiments in which human subjects were presented with realistically rendered images of computer-generated three-dimensional objects, with tight control over stimulus shape, surface properties, illumination, and viewpoint, as well as subjects' prior exposure to the stimulus objects. In all experiments recognition performance was: (1) consistently viewpoint dependent; (2) only partially aided by binocular stereo and other depth information; (3) specific to viewpoints that were familiar; (4) systematically disrupted by rotation in depth more than by deforming the two-dimensional images of the stimuli. These results are consistent with recently advanced computational theories of recognition based on view interpolation.
Author[s]: Panayotis A. Skordos
Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations
An effective approach for simulating fluid dynamics on a cluster of non-dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well-suited for simulating subsonic flow problems which involve both hydrodynamics and acoustic waves; for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed which closely fits the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences, and the lattice Boltzmann method.
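The efficiency figure quoted above follows the definition given in parentheses, speedup divided by the number of processors. A short worked example, using hypothetical timings that are not measurements from the report:

```python
def parallel_efficiency(serial_time, parallel_time, processors):
    """Parallel efficiency = speedup / processors, with speedup = serial / parallel time."""
    speedup = serial_time / parallel_time
    return speedup / processors


# Hypothetical timings: a run that takes 1600 s on one workstation
# and 100 s on 20 workstations has a speedup of 16.
efficiency = parallel_efficiency(serial_time=1600.0, parallel_time=100.0, processors=20)
print(efficiency)
```

Read the other way, 80% efficiency on 20 workstations corresponds to a speedup of 0.8 × 20 = 16 over a single workstation.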
Author[s]: Andrew Berlin and Rajeev Surati
Partial Evaluation for Scientific Computing: The Supercomputer Toolkit Experience
We describe the key role played by partial evaluation in the Supercomputer Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputer Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at M.I.T., and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
Author[s]: Leon Wong
Automated Reasoning About Classical Mechanics
In recent years, researchers in artificial intelligence have become interested in replicating human physical reasoning talents in computers. One of the most important skills in this area is predicting how physical systems will behave. This thesis discusses an implemented program that generates algebraic descriptions of how systems of rigid bodies evolve over time. Discussion about the design of this program identifies a physical reasoning paradigm and knowledge representation approach based on mathematical model construction and algebraic reasoning. This paradigm offers several advantages over methods that have become popular in the field, and seems promising for reasoning about a wide variety of classical mechanics problems.
Author[s]: Amnon Shashua and Nassir Navab
Relative Affine Structure: Canonical Model for 3D from 2D Geometry and Applications
We propose an affine framework for perspective views, captured by a single extremely simple equation based on a viewer-centered invariant we call "relative affine structure". Via a number of corollaries of our main results we show that our framework unifies previous work --- including Euclidean, projective and affine --- in a natural and simple way, and introduces new, extremely simple, algorithms for the tasks of reconstruction from multiple views, recognition by alignment, and certain image coding applications.
Author[s]: David A. Cohn
Neural Network Exploration Using Optimal Experiment Design
We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov and MacKay, we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. We conclude that, while not a panacea, OED-based query/action selection has much to offer, especially in domains where its high computational costs can be tolerated.
Author[s]: John S. Keen
Logging and Recovery in a Highly Concurrent Database
This report addresses the problem of fault tolerance to system failures for database systems that are to run on highly concurrent computers. It assumes that, in general, an application may have a wide distribution in the lifetimes of its transactions. Logging remains the method of choice for ensuring fault tolerance. Generational garbage collection techniques manage the limited disk space reserved for log information; this technique does not require periodic checkpoints and is well suited for applications with a broad range of transaction lifetimes. An arbitrarily large collection of parallel log streams provides the necessary disk bandwidth.
Author[s]: Michael H. Coen
SodaBot: A Software Agent Environment and Construction System
Nov 2, 1994
This thesis presents SodaBot, a general-purpose software agent user-environment and construction system. Its primary component is the basic software agent --- a computational framework for building agents which is essentially an agent operating system. We also present a new language for programming the basic software agent whose primitives are designed around human-level descriptions of agent activity. Via this programming language, users can easily implement a wide range of typical software agent applications, e.g. personal on-line assistants and meeting scheduling agents. The SodaBot system has been implemented and tested, and its description comprises the bulk of this thesis.
Author[s]: Sebastian Toelg and Tomaso Poggio
Towards an Example-Based Image Compression Architecture for Video-Conferencing
This paper consists of two major parts. First, we present the outline of a simple approach to a very-low-bandwidth video-conferencing system relying on an example-based hierarchical image compression scheme. In particular, we discuss the use of example images as a model, the number of required examples, faces as a class of semi-rigid objects, a hierarchical model based on decomposition into different time-scales, and the decomposition of face images into patches of interest. In the second part, we present several algorithms for image processing and animation as well as experimental evaluations. Among the original contributions of this paper is an automatic algorithm for pose estimation and normalization. We also review and compare different algorithms for finding the nearest neighbors in a database for a new input as well as a generalized algorithm for blending patches of interest in order to synthesize new images. Finally, we outline the possible integration of several algorithms to illustrate a simple model-based video-conference system.
Author[s]: Maja J. Mataric
Interaction and Intelligent Behavior
We introduce basic behaviors as primitives for control and learning in situated, embodied agents interacting in complex domains. We propose methods for selecting, formally specifying, algorithmically implementing, empirically evaluating, and combining behaviors from a basic set. We also introduce a general methodology for automatically constructing higher-level behaviors by learning to select from this set. Based on a formulation of reinforcement learning using conditions, behaviors, and shaped reinforcement, our approach makes behavior selection learnable in noisy, uncertain environments with stochastic dynamics. All described ideas are validated with groups of up to 20 mobile robots performing safe-wandering, following, aggregation, dispersion, homing, flocking, foraging, and learning to forage.
Author[s]: Lisa Dron
Computing 3-D Motion in Custom Analog and Digital VLSI
Nov 28, 1994
This thesis examines a complete design framework for a real-time, autonomous system with specialized VLSI hardware for computing 3-D camera motion. In the proposed architecture, the first step is to determine point correspondences between two images. Two processors, a CCD array edge detector and a mixed analog/digital binary block correlator, are proposed for this task. The report is divided into three parts. Part I covers the algorithmic analysis; part II describes the design and test of a 32 × 32 CCD edge detector fabricated through MOSIS; and part III compares the design of the mixed analog/digital correlator to a fully digital implementation.
Author[s]: Roberto Brunelli
Estimation of Pose and Illuminant Direction for Face Processing
In this paper three problems related to the analysis of facial images are addressed: estimation of the illuminant direction, compensation of illumination effects and, finally, recovery of the pose of the face, restricted to in-depth rotations. The solutions proposed for these problems rely on the use of computer graphics techniques to provide images of faces under different illumination and pose, starting from a database of frontal views under frontal illumination.