CSAIL Digital Archive - Artificial Intelligence
Author[s]: John M. Hollerbach
Dynamic Scaling of Manipulator Trajectories
A fundamental time-scaling property of manipulator dynamics has been identified that allows modification of movement speed without complete dynamics recalculation. By exploiting this property, it can be determined whether a planned trajectory is dynamically realizable given actuator torque limits, and if not, how to modify the trajectory to bring it within dynamic and actuating constraints.
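The scaling property can be sketched concretely. For rigid-body manipulator dynamics, replaying a trajectory at r times its nominal speed multiplies the inertial and velocity-dependent torques by r squared while leaving the gravity torque unchanged, so feasibility at a new speed can be checked from the nominal torque profile alone. The sketch below is a minimal scalar illustration under a symmetric torque limit; the function names and sample values are illustrative, not the paper's formulation, and it assumes gravity alone is within the limit.

```python
# Illustrative sketch of time-scaling a torque profile.
# If tau is the nominal joint torque and g its gravity component, then
# replaying the same path at rate factor r gives torque
#   tau_r = r**2 * (tau - g) + g
# (inertial and velocity terms scale by r**2; gravity is invariant).

def scaled_torque(tau, g, r):
    """Torque sample after time-scaling by factor r (r > 1 means faster)."""
    return r ** 2 * (tau - g) + g

def max_feasible_rate(samples, tau_max):
    """Largest r with |r**2 * (tau - g) + g| <= tau_max at every sample.

    samples: list of (tau, g) pairs along the nominal trajectory.
    Assumes |g| <= tau_max (gravity alone is feasible)."""
    best = float("inf")
    for tau, g in samples:
        dyn = tau - g                      # speed-dependent torque part
        if dyn > 0:
            best = min(best, (tau_max - g) / dyn)
        elif dyn < 0:
            best = min(best, (-tau_max - g) / dyn)
    return best ** 0.5
```

Note that no inverse dynamics are recomputed: only the stored nominal torques and their gravity components are rescaled.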
Author[s]: John Batali
Computational Introspection
Introspection is the process of thinking about one’s own thoughts and feelings. In this paper, I discuss recent attempts to make computational systems that exhibit introspective behavior: [Smith, 1982], [Weyhrauch, 1978], and [Doyle, 1980]. Each presents a system capable of manipulating representations of its own program and current context. I argue that introspective ability is crucial for intelligent systems – without it an agent cannot represent certain problems that it must be able to solve. A theory of intelligent action would describe how and why certain actions intelligently achieve an agent’s goals. The agent would both embody and represent this theory; it would be implemented as the program for the agent; and the importance of introspection suggests that the agent represent its theory of action to itself.
Author[s]: Reid G. Simmons and Randall Davis
Representations for Reasoning About Change
This paper explores representations used to reason about objects which change over time and the processes which cause changes. Specifically, we are interested in solving a problem known as geologic interpretation. To help solve this problem, we have developed a simulation technique, which we call imagining. Imagining takes a sequence of events and simulates them by drawing diagrams. In order to do this imagining, we have developed two representations of objects, one involving histories and the other involving diagrams, and two corresponding representations of physical processes, each suited to reasoning about one of the object representations. These representations facilitate both spatial and temporal reasoning.
Author[s]: Gerald Roylance
A Simple Model of Circuit Design
A simple analog circuit designer has been implemented as a rule-based system. The system can design voltage followers, Miller integrators, and bootstrap ramp generators from functional descriptions of what these circuits do. While the designer works in a simple domain where all components are ideal, it demonstrates the abilities of skilled designers. While the domain is electronics, the design ideas are useful in many other engineering domains, such as mechanical engineering, chemical engineering, and numerical programming. Most circuit design systems are given the circuit schematic and use arithmetic constraints to select component values. This circuit designer is different because it designs the schematic. The designer uses a unidirectional CONTROL relation to find the schematic. The circuit designs are built around this relation; it restricts the search space, assigns purposes to components, and finds design bugs.
Author[s]: Daniel Carl Brotsky
An Algorithm for Parsing Flow Graphs
This report describes research about flow graphs: labeled, directed, acyclic graphs that abstract representations used in a variety of Artificial Intelligence applications. Flow graphs may be derived from flow grammars much as strings may be derived from string grammars; this derivation process forms a useful model for the stepwise refinement processes used in programming and other engineering domains. The central result of this report is a parsing algorithm for flow graphs. Given a flow grammar and a flow graph, the algorithm determines whether the grammar generates the graph and, if so, finds all possible derivations for it. The author has implemented the algorithm in LISP. The intent of this report is to make flow-graph parsing available as an analytic tool for researchers in Artificial Intelligence. The report explores the intuitions behind the parsing algorithm, contains numerous, extensive examples of its behavior, and provides some guidance for those who wish to customize the algorithm to their own uses.
Author[s]: Peter C. Gaston and Tomaso Lozano-Perez
Tactile Recognition and Localization Using Object Models: The Case of Polyhedra on a Plane
This paper discusses how data from multiple tactile sensors may be used to identify and locate one object, from among a set of known objects. We use only local information from sensors: (1) the position of contact points, and (2) ranges of surface normals at the contact points. The recognition and localization process is structured as the development and pruning of a tree of consistent hypotheses about pairings between contact points and object surfaces. In this paper, we deal with polyhedral objects constrained to lie on a known plane, i.e., having three degrees of positioning freedom relative to the sensors.
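The hypothesis tree can be illustrated with a simplified 2D analogue: polygonal model faces with known normals, and contact normals measured in the plane. A pairing of contact points to faces survives only if a single rotation of the object could explain all the measurements, which can be pruned pairwise by comparing relative angles. This is a toy sketch, not the paper's algorithm; the tolerance, the 2D setting, and the omission of position constraints are all simplifications.

```python
# Minimal sketch of a consistent-hypothesis tree for tactile recognition.
# Normals are given as angles (radians); a hypothesis assigns one model
# face to each contact point.  Branches are pruned when the relative
# angle between two measured contact normals cannot match the relative
# angle between the assigned face normals (rotation invariance).
import math

def ang_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def consistent_pairings(contact_normals, face_normals, tol=0.1):
    """All face assignments (one face index per contact) such that a
    single rotation could account for every measured normal."""
    results = []
    def extend(partial):
        i = len(partial)
        if i == len(contact_normals):
            results.append(tuple(partial))
            return
        for f, fn in enumerate(face_normals):
            ok = True
            for j, fj in enumerate(partial):       # pairwise pruning
                measured = ang_diff(contact_normals[i], contact_normals[j])
                model = ang_diff(fn, face_normals[fj])
                if abs(measured - model) > tol:
                    ok = False
                    break
            if ok:
                extend(partial + [f])
    extend([])
    return results
```

With more contacts and distance constraints added, the surviving hypotheses shrink rapidly, which is the effect the paper exploits.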
Author[s]: Shimon Ullman
Computational Studies in the Interpretation of Structure and Motion: Summary and Extension
Computational studies of the interpretation of structure from motion examine the conditions under which three-dimensional structure can be recovered from motion in the image. The first part of this paper summarizes the main results obtained to date in these studies. The second part examines two issues: the robustness of the 3-D interpretation of perspective velocity fields, and the 3-D information contained in orthographic velocity fields. The two are related because, under local analysis, limitations on the interpretation of orthographic velocity fields also apply to perspective projection. The following results are established: When the interpretation is applied locally, the 3-D interpretation of the perspective velocity field is unstable. The orthographic velocity field determines the structure of the inducing object exactly up to a depth-scaling. For planar objects, the orthographic velocity field always admits two distinct solutions up to depth-scaling. The 3-D structure is determined uniquely by a “view and a half” of the orthographic velocity field.
Author[s]: Walter Hamscher
Using Structural and Functional Information in Diagnostic Design
We wish to design a diagnostic for a device from knowledge of its structure and function. The diagnostic should achieve coverage of the faults that can occur in the device, and should strive for specificity in its diagnosis when it detects a fault. A system is described that uses a simple model of hardware structure and function, representing the device in terms of its internal primitive functions and connections. The system designs a diagnostic in three steps. First, an extension of path sensitization is used to design a test for each of the connections in the device. Next, the resulting tests are improved by increasing their specificity. Finally, the tests are ordered so that each relies on the fewest possible connections. We describe an implementation of this system and show examples of the results for some simple devices.
Author[s]: David Allen McAllester
Solving Uninterpreted Equations with Context Free Expression Grammars
It is shown here that the equivalence class of an expression under the congruence closure of any finite set of equations between ground terms is a context free expression language. An expression is either a symbol or an n-tuple of expressions; the difference between expressions and strings is that expressions have inherent phrase structure. The Downey, Sethi and Tarjan algorithm for computing congruence closures can be used to convert a finite set of equations E to a context free expression grammar G such that for any expression u the equivalence class of u under E is precisely the language generated by an expression form I’(u) under grammar G. The fact that context free expression languages are closed under intersection is used to derive an algorithm for computing a grammar for the equivalence class of a given expression under any finite disjunction of finite sets of equations between ground expressions. This algorithm can also be used to derive a grammar representing the equivalence class of conditional expressions of the form if P then u else v. The description of an equivalence class by a context free expression grammar can also be used to simplify expressions under “well behaved” simplicity orders. Specifically, if G is a context free expression grammar which generates an equivalence class of expressions, then for any well behaved simplicity order there is a subset G’ of the productions of G such that the expressions generated by G’ are exactly those expressions of the equivalence class which are simplicity bounds and whose subterms are also simplicity bounds. Furthermore G’ can be computed from G in order n log(n) time plus the time required to do order n log(n) comparisons between expressions, where n is the size of G.
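The congruence-closure step the paper builds on can be illustrated with a naive fixpoint procedure (not the O(n log n) Downey-Sethi-Tarjan algorithm): expressions are symbols or tuples, equalities are merged with a union-find, and two tuples are merged whenever their children are already pairwise equivalent. The representation and names below are illustrative.

```python
# Naive congruence closure over ground expressions.  An expression is a
# symbol (str) or a tuple of expressions, e.g. ("f", "a") for f(a).

def subterms(e, acc):
    """Collect e and all of its subexpressions into the set acc."""
    acc.add(e)
    if isinstance(e, tuple):
        for c in e:
            subterms(c, acc)

def congruence_closure(equations, extra=()):
    """equations: list of (lhs, rhs) ground expressions; extra: further
    terms of interest.  Returns find(), mapping each known term to its
    equivalence-class representative."""
    terms = set()
    for l, r in equations:
        subterms(l, terms)
        subterms(r, terms)
    for t in extra:
        subterms(t, terms)
    parent = {t: t for t in terms}
    def find(t):
        while parent[t] != t:            # path-halving union-find
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    for l, r in equations:
        union(l, r)
    changed = True
    while changed:                       # propagate congruence: terms with
        changed = False                  # equivalent children are merged
        ts = list(terms)
        for i, a in enumerate(ts):
            for b in ts[i + 1:]:
                if (isinstance(a, tuple) and isinstance(b, tuple)
                        and len(a) == len(b)
                        and find(a) != find(b)
                        and all(find(x) == find(y) for x, y in zip(a, b))):
                    union(a, b)
                    changed = True
    return find
```

The equivalence classes this computes are exactly the sets the paper represents with a context free expression grammar.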
Author[s]: David Allen McAllester
Symmetric Set Theory: A General Theory of Isomorphism, Abstraction, and Representation
It is possible to represent a finite set of points (atoms) by a finite sequence of points. However, a finite set of points has no distinguished member, and therefore it is impossible to define a function which takes a finite set of points and returns a “first” point in that set. Thus it is impossible to represent a finite sequence of points by a finite set of points. The theory of symmetric sets provides a framework in which this observation about sets and sequences can be proven. The theory of symmetric sets is similar to classical (Zermelo-Fraenkel) set theory with the exception that the universe of symmetric sets includes points (ur-elements). Points provide a basis for general notions of isomorphism and symmetry. The general notions of isomorphism and symmetry in turn provide a basis for natural, simple, and universal definitions of abstractness, essential properties and functions, canonicality, and representations. It is expected that these notions will play an important role in the theory of data structures and in the construction of general techniques for reasoning about data structures.
Author[s]: Michael Brady and Alan Yuille
An Extremum Principle for Shape from Contour
An extremum principle is developed that determines three-dimensional surface orientation from a two-dimensional contour. The principle maximizes the ratio of the area to the square of the perimeter, a measure of the compactness or symmetry of the three-dimensional surface. The principle interprets regular figures correctly and it interprets skew symmetries as oriented real symmetries. The maximum likelihood method approximates the principle on irregular figures, but we show that it consistently overestimates the slant of an ellipse.
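The principle can be illustrated numerically. Interpreting an image ellipse with semi-axes 1 and b as a slanted planar figure rescales the foreshortened axis by 1/cos(s); maximizing area/perimeter squared over the slant s then selects the interpretation whose deprojection is a circle, i.e. cos(s) = b. The sketch below uses Ramanujan's perimeter approximation and a grid search; it illustrates the principle, not the paper's method, and the search range is arbitrary.

```python
# Extremum-principle sketch: pick the slant that makes the deprojected
# image ellipse most compact (largest area / perimeter**2).
import math

def compactness(a, b):
    """area / perimeter**2 for an ellipse with semi-axes a, b
    (perimeter via Ramanujan's approximation)."""
    area = math.pi * a * b
    h = ((a - b) / (a + b)) ** 2
    perim = math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))
    return area / perim ** 2

def best_slant(b, steps=2000):
    """Slant (radians) maximizing compactness of the deprojected ellipse
    with image semi-axes 1 and b; deprojection divides b by cos(slant)."""
    best_s, best_c = 0.0, -1.0
    for i in range(steps):
        s = i / steps * 1.4              # search slants in [0, 1.4) rad
        c = compactness(1.0, b / math.cos(s))
        if c > best_c:
            best_s, best_c = s, c
    return best_s
```

For a regular figure seen obliquely this recovers the veridical slant, consistent with the claim that the principle interprets regular figures correctly.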
Author[s]: C. Koch and T. Poggio
Information Processing in Dendritic Spines
Dendritic spines are small twigs on the dendrites of a very large class of neurons in the central nervous system. There are between 10^3 and 10^5 spines per neuron, each one including at least one synapse, i.e. a connection with other neurons. Thus, spines are usually associated with an important feature of neurons – their high degree of connectivity – one of the most obvious differences between present computers and brains. We have analysed the electrical properties of a cortical (spiny) pyramidal cell on the basis of passive cable theory, from measurements made on histological material, using the solution of the cable equation for an arbitrary branched dendritic tree. As postulated by Rall, we found that the somatic potential induced by a firing synapse on a spine is a very sensitive function of the dimensions of the spine. This observation leads to several hypotheses concerning the electrical functions of spines, especially with respect to their role in memory.
Author[s]: C. Koch and T. Poggio
A Theoretical Analysis of Electrical Properties of Spines
The electrical properties of a cortical (spiny) pyramidal cell were analyzed on the basis of passive cable theory from measurements made on histological material (Koch, Poggio & Torre 1982). The basis of this analysis is the solution of the cable equation for an arbitrary branched dendritic tree. We determined the potential at the soma as a function of the synaptic input (transient conductance changes) and as a function of the spine neck dimensions. From our investigation four major points emerge: 1. Spines may effectively compress the effect of each single excitatory synapse on the soma, mapping a wide range of inputs onto a limited range of outputs (nonlinear saturation). This is also true for very fast transient inputs, in sharp contrast with the case of a synapse on a dendrite. 2. The somatic depolarization due to an excitatory synapse on a spine is a very sensitive function of the spine neck length and diameter. Thus the spine can effectively control the resulting saturation curve. This might be the basic mechanism underlying ultra-short memory, long-term potentiation in the hippocampus, or learning in the cerebellum. 3. Spines with shunting inhibitory synapses on them are ineffective in reducing the somatic depolarization due to excitatory inputs on the dendritic shaft or on other spines. Thus isolated inhibitory synapses on a spine are not expected to occur. 4. The conjunction of an excitatory synapse with a shunting inhibitory synapse on the same spine may result in a time-discrimination circuit with a temporal resolution of around 100 μsec.
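Points (1) and (2) can already be seen in a crude steady-state circuit sketch: a synaptic conductance g with reversal potential E on the spine head drives the dendrite through the spine-neck resistance R_neck into the dendritic input resistance R_in at the base of the neck. This is only an illustration of the saturation effect; the paper's analysis uses full transient cable theory, and all values in the test are illustrative.

```python
# Steady-state sketch of spine saturation.  The synaptic current is
#   I = g * (E - V_head),  with  V_head = I * (R_neck + R_in),
# which gives the depolarization at the spine base:
#   V_base = R_in * g * E / (1 + g * (R_neck + R_in)).
# As g grows, V_base saturates at E * R_in / (R_neck + R_in): a ceiling
# set by the spine-neck resistance, i.e. by the neck's length/diameter.

def depolarization_at_base(g, E, R_neck, R_in):
    """Steady-state depolarization (same units as E) at the spine base.

    g: synaptic conductance (S), E: synaptic reversal potential,
    R_neck: spine neck resistance (ohm), R_in: dendritic input
    resistance at the neck base (ohm)."""
    return R_in * g * E / (1.0 + g * (R_neck + R_in))
```

Doubling the conductance far from the linear regime barely changes the output, while changing the neck resistance shifts the whole saturation curve, which is the sensitivity noted in point (2).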
Author[s]: Katsushi Ikeuchi
Determining Attitude of Object from Needle Map Using Extended Gaussian Image
An extended Gaussian image (EGI) is constructed by mapping the surface normals of an object onto the Gaussian sphere. The attitude of an object is greatly constrained by the global distribution of EGI mass over the visible Gaussian hemisphere. Constraints on the viewer direction are derived from the position of the EGI mass center, and from the direction of the EGI inertia axis. The algorithm embodying these constraints and the EGI mass distribution are implemented using a lookup table. A function for matching an observed EGI with the prototypical EGIs is also proposed. The algorithm determines the attitude of an object successfully both from a synthesized needle map and from a real needle map.
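The EGI construction itself is simple to sketch: histogram the surface normals, weighted by facet area, into cells on the sphere, and summarize the distribution by its mass center. The coarse latitude-longitude binning below is illustrative only; the paper's implementation tessellates the sphere differently and drives the attitude constraints through a lookup table.

```python
# Sketch of building an extended Gaussian image (EGI) from a needle map
# or facet list, plus the EGI mass center used to constrain attitude.
import math

def egi(normals, n_theta=4, n_phi=8):
    """Histogram unit normals, weighted by area, into sphere cells.

    normals: list of ((nx, ny, nz), area) pairs."""
    hist = {}
    for (nx, ny, nz), area in normals:
        theta = math.acos(max(-1.0, min(1.0, nz)))        # polar angle
        phi = math.atan2(ny, nx) % (2 * math.pi)          # azimuth
        cell = (min(int(theta / math.pi * n_theta), n_theta - 1),
                min(int(phi / (2 * math.pi) * n_phi), n_phi - 1))
        hist[cell] = hist.get(cell, 0.0) + area
    return hist

def mass_center(normals):
    """Area-weighted mean normal: the EGI mass center direction."""
    sx = sy = sz = total = 0.0
    for (nx, ny, nz), area in normals:
        sx += area * nx
        sy += area * ny
        sz += area * nz
        total += area
    return (sx / total, sy / total, sz / total)
```

Matching an observed EGI against stored prototype EGIs then reduces to comparing these histograms over candidate attitudes.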
Author[s]: George Edward Barton, Jr.
A Multiple-Context Equality-Based Reasoning System
Expert systems are too slow. This work attacks that problem by speeding up a useful system component that remembers facts and tracks down simple consequences. The redesigned component can assimilate new facts more quickly because it uses a compact, grammar-based internal representation to deal with whole classes of equivalent expressions at once. It can support faster hypothetical reasoning because it remembers the consequences of several assumption sets at once. The new design is targeted for situations in which many of the stored facts are equalities. The deductive machinery considered here supplements stored premises with simple new conclusions. The stored premises include permanently asserted facts and temporarily adopted assumptions. The new conclusions are derived by substituting equals for equals and using the properties of the logical connectives AND, OR, and NOT. The deductive system provides supporting premises for its derived conclusions. Reasoning that involves quantifiers is beyond the scope of its limited and automatic operation. The expert system of which the reasoning system is a component is expected to be responsible for overall control of reasoning.
Author[s]: Graziella Tonfoni and Richard J. Doyle
Understanding Text through Summarization and Analogy
Understanding a text exactly in the way that the Text Producer meant the text to be understood is highly unlikely unless the text interpretation process is constrained. Specific understanding-directing criteria are given in the form of a Premise, which is a configuration of plot-units. After performing a Premise-directed text summarization, the Text Receiver will have understood the text as the Text Producer intended and will then be able to replace missing relations within the exercises and produce new texts by applying analogy.
Author[s]: John M. Hollerbach and Gideon Sahar
Wrist-Partitioned Inverse Kinematic Accelerations and Manipulator Dynamics
An efficient algorithm is presented for the calculation of the inverse kinematic accelerations for a 6 degree-of-freedom manipulator with a spherical wrist. The inverse kinematic calculation is shown to work synergistically with the inverse dynamic calculation, producing kinematic parameters needed in the recursive Newton-Euler dynamics formulation. Additional savings in the dynamics computation are noted for a class of kinematically well-structured manipulators such as spherical-wrist arms and for manipulators with simply-structured inertial parameters.
Author[s]: A. Yuille
Zero-Crossings on Lines of Curvature
We investigate the relations between the structure of the image and events in the geometry of the underlying surface. We introduce some elementary differential geometry and use it to define a coordinate system on the object based on the lines of curvature. Using this coordinate system we can prove results connecting the extrema, ridges and zero-crossings in the image to geometrical features of the object. We show that extrema of the image typically correspond to points on the surface with zero Gaussian curvature and that parabolic lines often give rise to ridges, or valleys, in the image intensity. We show that directional zero-crossings of the image along the lines of curvature generally correspond to extrema of curvature along such lines.
Author[s]: Gerald Barber, Peter de Jong and Carl Hewitt
Semantic Support for Work in Organizations
Present day computer systems cannot implement much of the work carried out in organizations, such as planning, decision making, analysis, and dealing with unanticipated situations. Such organizational activities have traditionally been considered too unstructured to be suitable for automation by computer. We are working on the development of computer technology to overcome these limitations. Our goal is the development of a computer system which is capable of the following: describing the semantics of applications as well as the structure of the organization carrying out the work, aiding workers in carrying out the applications using these descriptions, and acquiring these capabilities in the course of the daily work through a process which is analogous to apprenticeship.
Author[s]: John Francis Canny
Finding Edges and Lines in Images
The problem of detecting intensity changes in images is canonical in vision. Edge detection operators are typically designed to optimally estimate first or second derivatives over some (usually small) support. Other criteria, such as output signal-to-noise ratio or bandwidth, have also been argued for. This thesis is an attempt to formulate a set of edge detection criteria that capture as directly as possible the desirable properties of an edge operator. Variational techniques are used to find a solution over the space of all linear shift invariant operators. The first criterion is that the detector have a low probability of error, i.e. of failing to mark edges or falsely marking non-edges. The second is that the marked points should be as close as possible to the centre of the true edge. The third criterion is that there should be a low probability of more than one response to a single edge. The technique is used to find optimal operators for step edges and for extended impulse profiles (ridges or valleys in two dimensions). The extension of the one-dimensional operators to two dimensions is then discussed. The result is a set of operators of varying width, length and orientation. The problem of combining these outputs into a single description is discussed, and a set of heuristics for the integration is given.
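In one dimension the flavor of the result can be sketched: the optimal step-edge operator derived from these criteria is well approximated by the first derivative of a Gaussian, and edges are marked at local maxima of the magnitude of the filtered response. The kernel truncation, clamped border handling, threshold, and test signal below are all illustrative choices, not the thesis's implementation.

```python
# 1D sketch: derivative-of-Gaussian filtering followed by marking local
# maxima of the response magnitude (a 1D analogue of non-maximum
# suppression).
import math

def dgauss_kernel(sigma, radius):
    """First derivative of a Gaussian, sampled on [-radius, radius]."""
    return [-x / (sigma ** 2) * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]

def convolve(signal, kernel):
    """Correlate signal with kernel, clamping at the borders."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def mark_edges(signal, sigma=2.0, thresh=0.1):
    """Indices of local maxima of |response| above thresh."""
    resp = convolve(signal, dgauss_kernel(sigma, int(3 * sigma)))
    mags = [abs(v) for v in resp]
    return [i for i in range(1, len(mags) - 1)
            if mags[i] > thresh
            and mags[i] >= mags[i - 1] and mags[i] >= mags[i + 1]]
```

A single step edge yields a single cluster of marked points at the step, illustrating the localization and single-response criteria.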
Author[s]: Shimon Ullman
Maximizing Rigidity: The Incremental Recovery of 3-D Structure from Rigid and Rubbery Motion
The human visual system can extract 3-D shape information of unfamiliar moving objects from their projected transformations. Computational studies of this capacity have established that 3-D shape can be extracted correctly from a brief presentation, provided that the moving objects are rigid. The human visual system requires a longer temporal extension, but it can cope with considerable deviations from rigidity. It is shown how the 3-D structure of rigid and non-rigid objects can be recovered by maintaining an internal model of the viewed object and modifying it at each instant by the minimal non-rigid change that is sufficient to account for the observed transformation. The results of applying this incremental rigidity scheme to rigid and non-rigid objects in motion are described and compared with human perceptions.
Author[s]: A.L. Yuille and T. Poggio
Scaling Theorems for Zero-Crossings
We characterize some properties of the zero-crossings of the Laplacian of signals, in particular images, filtered with linear filters, as a function of the scale of the filter (following recent work by A. Witkin, 1983). We prove that in any dimension the only filter that does not create zero-crossings as the scale increases is the Gaussian. This result can be generalized to apply to level-crossings of any linear differential operator: it applies in particular to ridges and ravines in the image density. In the case of the second derivative along the gradient we prove that there is no filter that avoids creation of zero-crossings.
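The theorem's content can be illustrated numerically: filtering a signal with the second derivative of a Gaussian at increasing scale should not increase the number of zero-crossings. The discrete sketch below only approximates the continuous statement (truncated kernels, clamped borders), and the test signal and scales are illustrative, so it checks the trend rather than strict monotonicity.

```python
# Count zero-crossings of a signal filtered by the second derivative of
# a Gaussian at a given scale.
import math

def d2gauss(sigma, radius):
    """Second derivative of a Gaussian, sampled on [-radius, radius]."""
    return [(x * x / sigma ** 4 - 1.0 / sigma ** 2)
            * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]

def zero_crossings(sig, kernel):
    """Number of sign changes in the filtered signal (borders clamped)."""
    r = len(kernel) // 2
    out = [sum(w * sig[min(max(i + k - r, 0), len(sig) - 1)]
               for k, w in enumerate(kernel))
           for i in range(len(sig))]
    return sum(1 for a, b in zip(out, out[1:]) if a * b < 0)
```

A two-sinusoid test signal shows the coarse scales suppressing the fine-structure crossings contributed by the higher-frequency component.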
Author[s]: Shimon Ullman
Visual Routines
This paper examines the processing of visual information beyond the creation of the early representations. A fundamental requirement at this level is the capacity to establish visually abstract shape properties and spatial relations. This capacity plays a major role in object recognition, visually guided manipulation, and more abstract visual thinking. For the human visual system, the perception of spatial properties and relations that are complex from a computational standpoint nevertheless often appears immediate and effortless. This apparent immediateness and ease of perceiving spatial relations is, however, deceiving. It conceals in fact a complex array of processes highly specialized for the task. The proficiency of the human system in analyzing spatial information far surpasses the capacities of current artificial systems. The study of the computations that underlie this competence may therefore lead to the development of new, more efficient processors for the spatial analysis of visual information. It is suggested that the perception of spatial relations is achieved by the application to the base representations of visual routines that are composed of sequences of elemental operations. Routines for different properties and relations share elemental operations. Using a fixed set of basic operations, the visual system can assemble different routines to extract an unbounded variety of shape properties and spatial relations. At a more detailed level, a number of plausible basic operations are suggested, based primarily on their potential usefulness, and supported in part by empirical evidence. The operations discussed include shifting of the processing focus, indexing to an odd-man-out location, bounded activation, boundary tracing, and marking. The problem of assembling such elemental operations into meaningful visual routines is discussed briefly.
Author[s]: A.L. Yuille
The Smoothest Velocity Field and Token Matching
This paper presents some mathematical results concerning the measurement of motion of contours. A fundamental problem of motion measurement in general is that the velocity field is not determined uniquely from the changing intensity patterns. Recently Hildreth & Ullman have studied a solution to this problem based on an Extremum Principle [Hildreth (1983), Ullman & Hildreth (1983)]. That is, they formulate the measurement of motion as the computation of the smoothest velocity field consistent with the changing contour. We analyse this Extremum principle and prove that it is closely related to a matching scheme for motion measurement which matches points on the moving contour that have similar tangent vectors. We then derive necessary and sufficient conditions for the principle to yield the correct velocity field. These results have possible implications for the design of computer vision systems, and for the study of human vision.
Author[s]: Rodney A. Brooks
Planning Collision Free Motions for Pick and Place Operations
An efficient algorithm which finds collision free paths for a manipulator with 5 or 6 revolute joints is described. It solves the problem for four degree of freedom pick and place operations. Examples are given of paths found by the algorithm in tightly cluttered workspaces. The algorithm first describes free space in two ways: as freeways for the hand and payload ensemble and as freeways for the upperarm. Freeways match volumes swept out by manipulator motions and can be “inverted” to find a class of topologically equivalent path segments. The two freeway spaces are searched concurrently under projection of constraints determined by motion of the forearm.
Author[s]: Katsushi Ikeuchi, Berthold K.P. Horn, Shigemi Nagata, Tom Callahan and Oded Fein
Picking Up an Object from a Pile of Objects
This paper describes a hand-eye system we developed to perform the bin-picking task. Two basic tools are employed: the photometric stereo method and the extended Gaussian image. The photometric stereo method generates the surface normal distribution of a scene. The extended Gaussian image allows us to determine the attitude of the object based on the normal distribution. Visual analysis of an image consists of two stages. The first stage segments the image into regions and determines the target region. The photometric stereo system provides the surface normal distribution of the scene. The system segments the scene into isolated regions using the surface normal distribution rather than the brightness distribution. The second stage determines object attitude and position by comparing the surface normal distribution with the extended Gaussian image. Fingers with an LED sensor, mounted on the PUMA arm, can successfully pick an object from a pile based on the information from the vision part.
Author[s]: Carl Hewitt and Peter de Jong
Analyzing the Roles of Descriptions and Actions in Open Systems
This paper analyzes relationships between the roles of descriptions and actions in large scale, open ended, geographically distributed, concurrent systems. Rather than attempt to deal with the complexities and ambiguities of currently implemented descriptive languages, we concentrate our analysis on what can be expressed in the underlying frameworks such as the lambda calculus and first order logic. By this means we conclude that descriptions and actions complement one another: neither being sufficient unto itself. This paper provides a basis to begin the analysis of the very subtle relationships that hold between descriptions and actions in Open Systems.
Author[s]: Daniel G. Theriault
Issues in the Design and Implementation of Act 2
Act2 is a highly concurrent programming language designed to exploit the processing power available from parallel computer architectures. The language supports advanced concepts in software engineering, providing high-level constructs suitable for implementing artificially-intelligent applications. Act2 is based on the Actor model of computation, consisting of virtual computational agents which communicate by message-passing. Act2 serves as a framework in which to integrate an actor language, a description and reasoning system, and a problem-solving and resource management system. This document describes issues in Act2’s design and the implementation of an interpreter for the language.
Author[s]: A.L. Yuille and T. Poggio
Fingerprints Theorems for Zero-Crossings
We prove that the scale map of the zero-crossings of almost all signals filtered by the second derivative of a Gaussian of variable size determines the signal uniquely, up to a constant scaling and a harmonic function. Our proof provides a method for reconstructing almost all signals from knowledge of how the zero-crossing contours of the signal, filtered by a Gaussian filter, change with the size of the filter. The proof assumes that the filtered signal can be represented as a polynomial of finite, albeit possibly very high, order. An argument suggests that this restriction is not essential. Stability of the reconstruction scheme is briefly discussed. The result applies to zero- and level-crossings of linear differential operators of Gaussian filters. The theorem is extended to two dimensions, that is to images. These results are reminiscent of Logan’s theorem. They imply that extrema of derivatives at different scales are a complete representation of a signal.
Author[s]: Whitman Richards
Structure from Stereo and Motion
Stereopsis and motion parallax are two methods for recovering three dimensional shape. Theoretical analyses of each method show that neither alone can recover rigid 3D shapes correctly unless other information, such as perspective, is included. The solutions for recovering rigid structure from motion have a reflection ambiguity; the depth scale of the stereoscopic solution will not be known unless the fixation distance is specified in units of interpupil separation. (Hence the configuration will appear distorted.) However, the correct configuration and the disposition of a rigid 3D shape can be recovered if stereopsis and motion are integrated, for then a unique solution follows from a set of linear equations. The correct interpretation requires only three points and two stereo views.
Author[s]: D.D. Hoffman and Whitman Richards
Parts of Recognition
A complete theory of object recognition is an impossibility – not simply because of the multiplicity of visual cues we exploit in elegant coordination to identify an object, but primarily because recognition involves fixation of belief, and anything one knows may be relevant. We finesse this obstacle with two moves. The first restricts attention to one visual cue, the shapes of objects; the second restricts attention to one problem, the initial guess at the identity of an object. We propose that the visual system decomposes a shape into parts, that it does so using a rule defining part boundaries rather than part shapes, that the rule exploits a uniformity of nature – transversality, and that parts with their descriptions and spatial relations provide a first index into a memory of shapes. These rules lead to a more comprehensive explanation of several visual illusions. The role of inductive inference is stressed in our theory. We conclude with a précis of unsolved problems.
Author[s]: Ellen C. Hildreth
The Computation of the Velocity Field
The organization of movement in the changing retinal image provides a valuable source of information for analyzing the environment in terms of objects, their motion in space and their three-dimensional structure. A description of this movement is not provided to our visual system directly, however; it must be inferred from the pattern of changing intensity that reaches the eye. This paper examines the problem of motion measurement, which we formulate as the computation of an instantaneous two-dimensional velocity field from the changing image. Initial measurements of motion take place at the location of significant intensity changes, as suggested by Marr and Ullman (1981). These measurements provide only one component of local velocity, and must be integrated to compute the two-dimensional velocity field. A fundamental problem for this integration stage is that the velocity field is not determined uniquely from information available in the changing image. We formulate an additional constraint of smoothness of the velocity field, based on the physical assumption that surfaces are generally smooth, which allows the computation of a unique velocity field. A theoretical analysis of the conditions under which this computation yields the correct velocity field suggests that the solution is physically plausible. Empirical studies show the predictions of this computation to be consistent with human motion perception.
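The integration stage can be sketched on a discrete closed contour: each point i contributes only the component c_i of its velocity along the local normal n_i, and the smoothness constraint selects, among all fields matching those components, the one minimizing the sum of |v_{i+1} - v_i| squared. The alternating smoothing/reprojection solver below is an illustrative stand-in for the computation in the paper; the point count, iteration budget, and initialization are arbitrary.

```python
# Smoothest velocity field on a closed contour, by alternating neighbor
# averaging (smoothing) with reprojection onto each measurement
# constraint v_i . n_i = c_i.

def smoothest_field(normals, comps, iters=2000):
    """normals: unit normals (nx, ny) at contour points; comps: measured
    normal velocity components c_i.  Returns the 2D velocity vectors."""
    n = len(normals)
    v = [(c * nx, c * ny) for (nx, ny), c in zip(normals, comps)]
    for _ in range(iters):
        new = []
        for i in range(n):
            ax = (v[i - 1][0] + v[(i + 1) % n][0]) / 2.0   # smooth
            ay = (v[i - 1][1] + v[(i + 1) % n][1]) / 2.0
            nx, ny = normals[i]
            lam = comps[i] - (ax * nx + ay * ny)            # reproject onto
            new.append((ax + lam * nx, ay + lam * ny))      # v . n = c
        v = new
    return v
```

For a translating circle the normal components alone are ambiguous at every point, yet the smoothest consistent field recovers the true uniform translation, matching the analysis of when the computation yields the correct field.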
Author[s]: Harold Abelson and Gerald Jay Sussman
Structure and Interpretation of Computer Programs
“The Structure and Interpretation of Computer Programs” is the entry-level subject in Computer Science at the Massachusetts Institute of Technology. It is required of all students at MIT who major in Electrical Engineering or in Computer Science, as one fourth of the “common core curriculum,” which also includes two subjects on circuits and linear systems and a subject on the design of digital systems. We have been involved in the development of this subject since 1978, and we have taught this material in its present form since the fall of 1980 to approximately 600 students each year. Most of these students have had little or no prior formal training in computation, although most have played with computers a bit and a few have had extensive programming or hardware design experience. Our design of this introductory Computer Science subject reflects two major concerns. First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations, but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute. Second, we believe that the essential material to be addressed by a subject at this level is not the syntax of particular programming language constructs, nor clever algorithms for computing particular functions efficiently, nor even the mathematical analysis of algorithms and the foundations of computing, but rather the techniques used to control the intellectual complexity of large software systems.
Author[s]: Bruce R. Donald
Hypothesizing Channels through Free-Space in Solving the Findpath Problem
Given a polyhedral environment, a technique is presented for hypothesizing a channel volume through the free space containing a class of successful collision-free paths. A set of geometric constructions between obstacle faces is proposed, and we define a mapping from a field of view analysis to a direct local construction of free space. The algorithm has the control structure of a search which propagates construction of a connected channel towards a goal along a frontier of exterior free faces. Thus a channel volume starts out by surrounding the moving object in the initial configuration and “grows” towards the goal. Finally, we show techniques for analyzing the channel decomposition of free space and suggesting a path.
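The control structure described in the abstract, a channel that starts around the moving object and propagates a frontier of free space toward the goal, can be given a rough intuition with a toy sketch. The grid world, 4-connected moves, and function name below are illustrative stand-ins, not the paper's polyhedral constructions:

```python
from collections import deque

def grow_channel(grid, start, goal):
    """Breadth-first frontier expansion on an occupancy grid: the 'channel'
    is the set of free cells reached while propagating from start toward
    goal. grid[r][c] == 0 means free space, 1 means obstacle."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    channel = {start}
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return channel
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in channel):
                channel.add((nr, nc))
                frontier.append((nr, nc))
    return None  # frontier exhausted: no channel reaches the goal
```

A path-suggestion step would then be restricted to the cells (in the paper, the volume) of the returned channel.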
Author[s]: H.K. Nishihara and T. Poggio
Hidden Clues in Random Line Stereograms
Successful fusion of random-line stereograms with breaks in the vernier acuity range has been previously interpreted to suggest that the interpolation process underlying hyperacuity is parallel and preliminary to stereomatching. In this paper (a) we demonstrate with computer experiments that vernier cues are not needed to solve the stereomatching problem posed by these stereograms and (b) we provide psychophysical evidence that human stereopsis probably does not use vernier cues alone to achieve fusion of these random-line stereograms.
Author[s]: W. Eric L. Grimson and Tomas Lozano-Perez
Model-Based Recognition and Localization from Sparse Range or Tactile Data
This paper discusses how local measurements of three-dimensional positions and surface normals (recorded by a set of tactile sensors, or by three-dimensional range sensors), may be used to identify and locate objects, from among a set of known objects. The objects are modeled as polyhedra having up to six degrees of freedom relative to the sensors. We show that inconsistent hypotheses about pairings between sensed points and object surfaces can be discarded efficiently by using local constraints on: distances between faces, angles between face normals, and angles (relative to the surface normals) of vectors between sensed points. We show by simulation and by mathematical bounds that the number of hypotheses consistent with these constraints is small. We also show how to recover the position and orientation of the object from the sense data. The algorithm’s performance on data obtained from a triangulation range sensor is illustrated.
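The pruning idea, discarding pairings that violate local pairwise constraints, can be illustrated with the distance constraint alone. The sketch below is a deliberately simplified stand-in that matches sensed points to model points rather than faces; the function name and tolerance are invented for illustration:

```python
import math
from itertools import permutations

def consistent_assignments(sensed, model, tol=0.05):
    """Enumerate pairings of sensed points to model points and keep only
    those whose pairwise distances agree within a tolerance -- the distance
    constraint from the paper, without the angle constraints."""
    survivors = []
    for perm in permutations(range(len(model)), len(sensed)):
        ok = all(
            abs(math.dist(sensed[i], sensed[j])
                - math.dist(model[perm[i]], model[perm[j]])) <= tol
            for i in range(len(sensed)) for j in range(i + 1, len(sensed)))
        if ok:
            survivors.append(perm)
    return survivors
```

Adding the two angle constraints would shrink the surviving set further, which is the paper's point: local constraints alone leave few consistent hypotheses.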
Author[s]: Randall Davis
Diagnostic Reasoning Based on Structure and Behavior
We describe a system that reasons from first principles, i.e., using knowledge of structure and behavior. The system has been implemented and tested on several examples in the domain of troubleshooting digital electronic circuits. We give an example of the system in operation, illustrating that this approach provides several advantages, including a significant degree of device independence, the ability to constrain the hypotheses it considers at the outset, yet deal with a progressively wider range of problems, and the ability to deal with situations that are novel in the sense that their outward manifestations may not have been encountered previously. As background we review our basic approach to describing structure and behavior, then explore some of the technologies used previously in troubleshooting. Difficulties encountered there lead us to a number of new contributions, four of which make up the central focus of this paper. We describe a technique we call constraint suspension that provides a powerful tool for troubleshooting. We point out the importance of making explicit the assumptions underlying reasoning and describe a technique that helps enumerate assumptions methodically. The result is an overall strategy for troubleshooting based on the progressive relaxation of underlying assumptions. The system can focus its efforts initially, yet will methodically expand its focus to include a broad range of faults. Finally, abstracting from our examples, we find that the concept of adjacency proves to be useful in understanding why some faults are especially difficult and why multiple different representations are useful.
Author[s]: Berthold K.P. Horn
Extended Gaussian Images
This is a primer on extended Gaussian Images. Extended Gaussian Images are useful for representing the shapes of surfaces. They can be computed easily from: 1. Needle maps obtained using photometric stereo, or 2. Depth maps generated by ranging devices or stereo. Importantly, they can also be determined simply from geometric models of the objects. Extended Gaussian images can be of use in at least two of the tasks facing a machine vision system. 1. Recognition, and 2. Determining the attitude in space of an object. Here, the extended Gaussian image is defined and some of its properties discussed. An elaboration for non-convex objects is presented and several examples are shown.
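As a minimal illustration of the bookkeeping behind an extended Gaussian image, the sketch below bins 2-D unit normals from a needle map into an orientation histogram. A real EGI accumulates 3-D normals on a tessellated sphere; this planar version, with invented names, only shows the idea:

```python
import math

def orientation_histogram(needle_map, n_bins=8):
    """Discrete planar analogue of an extended Gaussian image: bin each
    needle-map normal (nx, ny) by its angle and count occurrences."""
    hist = [0] * n_bins
    for nx, ny in needle_map:
        angle = math.atan2(ny, nx) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += 1
    return hist
```

Recognition and attitude determination would then proceed by comparing such histograms against ones computed from geometric models.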
Author[s]: K.R.K. Nielsen and T. Poggio
Vertical Image Registration in Stereopsis
Most computational theories of stereopsis require a registration stage prior to stereo matching to reduce the matching to a one-dimensional search. Even after registration, it is critical that the stereo matching process tolerate some degree of residual misalignment. In this paper, we study with psychophysical techniques the tolerance to vertical disparity in situations in which false targets abound – as in random dot stereograms – and eye movements are eliminated. Our results show that small amounts of vertical disparity significantly impair depth discrimination in a forced-choice task. Our main results are: a) vertical disparity of only the central “figure” part of a random dot stereogram can be tolerated up to about 3.5’, b) vertical disparity of the “figure + ground” is tolerated up to about 6.5’, and c) the performance of the Grimson implementation of the Marr-Poggio stereo matching algorithm for the stereograms of experiment (a) is consistent with the psychophysical results. The algorithm’s tolerance to vertical disparity is due exclusively to the spatial averaging of the underlying filters. The algorithm cannot account by itself for the results of experiment (b). Eye movements, which are the principal registration mechanism for human stereopsis, are accurate to within about 7’. Our data suggest that tolerance to this residual vertical disparity is attained by two non-motor mechanisms: 1) the spatial average performed by the receptive fields that filter the two images prior to stereo matching, and 2) a non-motor shift mechanism that may be driven at least in part by monocular cues.
Author[s]: Katsushi Ikeuchi
Constructing a Depth Map from Images
This paper describes two methods for constructing a depth map from images. Each method has two stages. First, one or more needle maps are determined using a pair of images. This process employs either the Marr-Poggio-Grimson stereo and shape-from-shading, or, instead, photometric stereo. Second, a depth map is constructed from the needle map or needle maps computed by the first stage. Both methods make use of an iterative relaxation method to obtain the final depth map.
Author[s]: Berthold K.P. Horn and Katsushi Ikeuchi
Picking Parts out of a Bin
One of the remaining obstacles to the widespread application of industrial robots is their inability to deal with parts that are not precisely positioned. In the case of manual assembly, components are often presented in bins. Current automated systems, on the other hand, require separate feeders which present the parts with carefully controlled position and attitude. Here we show how results in machine vision provide techniques for automatically directing a mechanical manipulator to pick one object at a time out of a pile. The attitude of the object to be picked up is determined using a histogram of the orientations of visible surface patches. Surface orientation, in turn, is determined using photometric stereo applied to multiple images. These images are taken with the same camera but differing lighting. The resulting needle map, giving the orientations of surface patches, is used to create an orientation histogram which is a discrete approximation to the extended Gaussian image. This can be matched against a synthetic orientation histogram obtained from prototypical models of the objects to be manipulated. Such models may be obtained from computer aided design (CAD) databases. The method thus requires that the shape of the objects be described, but it is not restricted to particular types of objects.
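The photometric-stereo step mentioned above, recovering surface orientation from multiple images under differing lighting, reduces at each pixel to a small linear system under a Lambertian assumption. The sketch below is an illustrative per-pixel solve with invented names, not the paper's implementation (which uses lookup tables over image triples):

```python
def photometric_stereo(L, I):
    """Recover a unit surface normal and albedo at one pixel from three
    intensities I under three known unit light directions L (rows),
    assuming a Lambertian surface: I_k = rho * (L_k . n).
    Solves the 3x3 system g = rho*n by Cramer's rule."""
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    D = det3(L)
    g = []
    for col in range(3):
        M = [row[:] for row in L]       # replace one column with I
        for r in range(3):
            M[r][col] = I[r]
        g.append(det3(M) / D)
    rho = sum(x * x for x in g) ** 0.5  # albedo is the magnitude of g
    return [x / rho for x in g], rho    # unit normal, albedo
```

Repeating this over all pixels yields the needle map from which the orientation histogram is built.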
Author[s]: Hormoz Mansour
A Structural Approach to Analogy
There are multiple sorts of reasoning by analogy between two domains; the one with which we are concerned is a type of contextual analogy. The purpose of this paper is to examine whether two domains that look analogous are analogous in all aspects and contexts. To do this, we analyse each domain according to its different particularities; for each particularity or context we continue the analysis and search for another within the same domain. In this way we create a kind of structure for the different domains. This analysis is represented by frames nested within one another. This paper describes this concept and an implemented system, “MULTI_ANALOG”, a limited example of knowledge acquisition, problem solving, and automatic acquisition based on this particular form of analogy, namely structural analogy.
Author[s]: Reid Gordon Simmons
Representing and Reasoning About Change in Geologic Interpretation
Geologic interpretation is the task of inferring a sequence of events to explain how a given geologic region could have been formed. This report describes the design and implementation of one part of a geologic interpretation problem solver -- a system which uses a simulation technique called imagining to check the validity of a candidate sequence of events. Imagining uses a combination of qualitative and quantitative simulations to reason about the changes which occurred in the geologic region. The spatial changes which occur are simulated by constructing a sequence of diagrams. The quantitative simulation needs numeric parameters which are determined by using the qualitative simulation to establish the cumulative changes to an object and by using a description of the current geologic region to make quantitative measurements. The diversity of reasoning skills used in imagining has necessitated the development of multiple representations, each specialized for a different task. Representations to facilitate doing temporal, spatial and numeric reasoning are described in detail. We have also found it useful to explicitly represent processes. Both the qualitative and quantitative simulations use a discrete 'layer cake' model of geologic processes, but each uses a separate representation, specialized to support the type of simulation. These multiple representations have enabled us to develop a powerful, yet modular, system for reasoning about change.
Author[s]: Carl Hewitt and Henry Lieberman
Design Issues in Parallel Architecture for Artificial Intelligence
Development of highly intelligent computers requires a conceptual foundation that will overcome the limitations of the von Neumann architecture. Architectures for such a foundation should meet the following design goals: * Address the fundamental organizational issues of large-scale parallelism and sharing in a fully integrated way. This means attention to organizational principles, as well as hardware and software. * Serve as an experimental apparatus for testing large-scale artificial intelligence systems. * Explore the feasibility of an architecture based on abstractions, which serve as natural computational primitives for parallel processing. Such abstractions should be logically independent of their software and hardware host implementations. In this paper we lay out some of the fundamental design issues in parallel architectures for Artificial Intelligence, delineate limitations of previous parallel architectures, and outline a new approach that we are pursuing.
Author[s]: Christof Koch, Jose Marroquin and Alan Yuille
Analog "Neuronal" Networks in Early Vision
Many problems in early vision can be formulated in terms of minimizing an energy or cost function. Examples are shape-from-shading, edge detection, motion analysis, structure from motion and surface interpolation (Poggio, Torre and Koch, 1985). It has been shown that all quadratic variational problems, an important subset of early vision tasks, can be “solved” by linear, analog electrical or chemical networks (Poggio and Koch, 1985). In a variety of situations the cost function is non-quadratic, however, for instance in the presence of discontinuities. The use of non-quadratic cost functions raises the question of designing efficient algorithms for computing the optimal solution. Recently, Hopfield and Tank (1985) have shown that networks of nonlinear analog “neurons” can be effective in computing the solution of optimization problems. In this paper, we show how these networks can be generalized to solve the non-convex energy functionals of early vision. We illustrate this approach by implementing a specific network solving the problem of reconstructing a smooth surface while preserving its discontinuities from sparsely sampled data (Geman and Geman, 1984; Marroquin 1984; Terzopoulos 1984). These results suggest a novel computational strategy for solving such problems for both biological and artificial vision systems.
Author[s]: A. Yuille
A Method for Computing Spectral Reflectance
Psychophysical experiments show that the perceived colour of an object is relatively independent of the spectrum of the incident illumination and depends only on the surface reflectance. We demonstrate a possible solution to this underdetermined problem by expanding the illumination and surface reflectance in terms of a finite number of basis functions. This yields a number of nonlinear equations for each colour patch. We show that given a sufficient number of surface patches with the same illumination it is possible to solve these equations up to an overall scaling factor. Generalizations to the spatially dependent situation are discussed. We define a method for detecting material changes and illustrate a way of detecting the colour of a material at its boundaries and propagating it inwards.
Author[s]: Richard C. Waters
KBEmacs: A Step Toward the Programmer's Apprentice
The Knowledge-Based Editor in Emacs (KBEmacs) is the current demonstration system implemented as part of the Programmer’s Apprentice project. KBEmacs is capable of acting as a semi-expert assistant to a person who is writing a program – taking over some parts of the programming task. Using KBEmacs, it is possible to construct a program by issuing a series of high level commands. This series of commands can be as much as an order of magnitude shorter than the program it describes. KBEmacs is capable of operating on Ada and Lisp programs of realistic size and complexity. Although KBEmacs is neither fast enough nor robust enough to be considered a true prototype, both of these problems could be overcome if the system were to be reimplemented.
Author[s]: Richard D. Lathrop
Parallelism in Manipulator Dynamics
This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the swing phase) given the desired trajectory; i.e., the Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations, and presents two “mathematically exact” formulations especially suited to high-speed, highly parallel implementations using special-purpose hardware or VLSI devices. In principle, the formulations should permit the calculations to run at a speed bounded only by I/O. The first presented is a parallel version of the recent linear Newton-Euler recursive algorithm. The time cost is also linear in the number of joints, but the real-time coefficients are reduced by almost two orders of magnitude. The second formulation presents a new parallel algorithm which shows that it is possible to improve upon the linear time dependency. The real time required to perform the calculations increases only as the log₂ of the number of joints. Either formulation is susceptible to a systolic pipelined architecture in which complete sets of joint torques emerge at successive intervals of four floating-point operations. Hardware requirements necessary to support the algorithm are considered and found not to be excessive, and a VLSI implementation architecture is suggested. We indicate possible applications to incorporating dynamical considerations into trajectory planning, e.g. it may be possible to build an on-line trajectory optimizer.
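The logarithmic time dependency claimed above rests on a standard idea: a chain recurrence over n joints can be evaluated by recursive doubling when the per-joint update is associative. The sketch below illustrates only that scheduling idea on a generic prefix combine, not the paper's Newton-Euler formulation; names are invented:

```python
def parallel_prefix(values, op):
    """Inclusive prefix combine by recursive doubling (Hillis-Steele).
    A serial recursion over n joints takes n steps; recursive doubling
    needs only ceil(log2 n) rounds, and every combine within a round is
    independent of the others, so a round could run in one parallel step.
    Returns the prefixes and the number of rounds used."""
    out = list(values)
    rounds, stride = 0, 1
    while stride < len(out):
        # all combines in this round are data-independent (parallelizable)
        out = [out[i] if i < stride else op(out[i - stride], out[i])
               for i in range(len(out))]
        stride, rounds = stride * 2, rounds + 1
    return out, rounds
```

With `op` composing per-link transform/velocity updates instead of adding numbers, the same schedule gives the log₂-depth evaluation of the chain.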
Author[s]: Douglas Hofstadter
The Copycat Project: An Experiment in Nondeterminism and Creative Analogies
A micro-world is described, in which many analogies involving strikingly different concepts and levels of subtlety can be made. The question “What differentiates the good ones from the bad ones?” is discussed, and then the problem of how to implement a computational model of the human ability to come up with such analogies (and to have a sense for their quality) is considered. A key part of the proposed system, now under development, is its dependence on statistically emergent properties of stochastically interacting “codelets” (small pieces of ready-to-run code created by the system, and selected at random to run with probability proportional to heuristically assigned “urgencies”). Another key element is a network of linked concepts of varying levels of “semanticity”, in which activation spreads and indirectly controls the urgencies of new codelets. There is pressure in the system toward maximizing the degree of “semanticity” or “intensionality” of descriptions of structures, but many such pressures, often conflicting, must interact with one another, and compromises must be made. The shifting of (1) perceived boundaries inside structures, (2) descriptive concepts chosen to apply to structures, and (3) features perceived as “salient” or not, is called “slippage”. What can slip, and how, are emergent consequences of the interaction of (1) the temporary (“cytoplasmic”) structures involved in the analogy with (2) the permanent (“Platonic”) concepts and links in the conceptual proximity network, or “slippability network”. The architecture of this system is postulated as a general architecture suitable for dealing not only with fluid analogies, but also with other types of abstract perception and categorization tasks, such as musical perception, scientific theorizing, Bongard problems and others.
Author[s]: Michael Brady
Artificial Intelligence and Robotics
Since Robotics is the field concerned with the connection of perception to action, Artificial Intelligence must have a central role in Robotics if the connection is to be intelligent. Artificial Intelligence addresses the crucial questions of: what knowledge is required in any aspect of thinking; how that knowledge should be represented; and how that knowledge should be used. Robotics challenges AI by forcing it to deal with real objects in the real world. Techniques and representations developed for purely cognitive problems, often in toy domains, do not necessarily extend to meet the challenge. Robots combine mechanical effectors, sensors, and computers. AI has made significant contributions to each component. We review AI contributions to perception and object oriented reasoning. Object-oriented reasoning includes reasoning about space, path-planning, uncertainty, and compliance. We conclude with three examples that illustrate the kinds of reasoning or problem solving abilities we would like to endow robots with and that we believe are worthy goals of both Robotics and Artificial Intelligence, being within reach of both.
Author[s]: Michael Brady and Haruo Asada
Smoothed Local Symmetries and Their Implementation
We introduce a novel representation of two-dimensional shape that we call smoothed local symmetries (SLS). Smoothed local symmetries represent both the bounding contour of a shape fragment and the region that it occupies. In this paper we develop the main features of the SLS representation and describe an implemented algorithm that computes it. The performance of the algorithm is illustrated for a set of tools. We conclude by sketching a method for determining the articulation of a shape into subshapes.
Author[s]: Haruo Asada and Michael Brady
The Curvature Primal Sketch
In this paper we introduce a novel representation of the significant changes in curvature along the bounding contour of a planar shape. We call the representation the curvature primal sketch. We describe an implemented algorithm that computes the curvature primal sketch and illustrate its performance on a set of tool shapes. The curvature primal sketch derives its name from the close analogy to the primal sketch representation advocated by Marr for describing significant intensity changes. We define a set of primitive parameterized curvature discontinuities, and derive expressions for their convolutions with the first and second derivatives of a Gaussian. The convolved primitives, sorted according to the scale at which they are detected, provide us with a multi-scaled interpretation of the contour of a shape.
Author[s]: Tomas Lozano-Perez, Matthew T. Mason and Russell H. Taylor
Automatic Synthesis of Fine-Motion Strategies for Robots
The use of active compliance enables robots to carry out tasks in the presence of significant sensing and control errors. Compliant motions are quite difficult for humans to specify, however. Furthermore, robot programs are quite sensitive to details of geometry and to error characteristics and must, therefore, be constructed anew for each task. These factors motivate the need for automatic synthesis tools for robot programming, especially for compliant motion. This paper describes a formal approach to the synthesis of compliant motion strategies from geometric descriptions of assembly operations and explicit estimates of errors in sensing and control. A key aspect of the approach is that it provides correctness criteria for compliant motion strategies.
Author[s]: Van-Duc Nguyen
The Find-Path Problem in the Plane
This paper presents a fast heuristic algorithm for planning collision-free paths of a moving robot in a cluttered planar workspace. The algorithm is based on describing the free space between the obstacles as a network of linked cones. Cones capture the freeways and the bottle-necks between the obstacles. Links capture the connectivity of the free space. Paths are computed by intersecting the valid configuration volumes of the moving robot inside these cones and inside the regions described by the links.
Author[s]: Ellen C. Hildreth
Computations Underlying the Measurement of Visual Motion
The organization of movement in a changing image provides a valuable source of information for analyzing the environment in terms of objects, their motion in space, and their three-dimensional structure. This movement may be represented by a two-dimensional velocity field that assigns a direction and magnitude of velocity to elements in the image. This paper presents a method for computing the velocity field, with three main components. First, initial measurements of motion in the image take place at the location of significant changes, which give rise to zero-crossings in the output of the convolution of the image with a ∇²G operator. The initial motion measurements provide the component of velocity in the direction perpendicular to the local orientation of the zero-crossing contours. Second, these initial measurements are integrated along contours to compute the two-dimensional velocity field. Third, an additional constraint of smoothness of the velocity field, based on the physical constraint that surfaces are generally smooth, allows the computation of a unique velocity field. The details of an algorithm are presented, with results of the algorithm applied to artificial and natural image sequences.
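The integration step, turning one measured component of velocity per contour point into a full 2-D velocity field via a smoothness constraint, can be sketched as a small constrained minimization. The coordinate-descent scheme and names below are illustrative assumptions, not the paper's algorithm:

```python
def compute_velocity_field(normals, c, iters=500):
    """Each point i on a closed contour contributes one constraint
    v_i . n_i = c_i (its measured normal component, unit normal n_i).
    We minimize sum |v_i - v_{i+1}|^2 by coordinate descent: repeatedly
    set v_i to the projection of its neighbors' average back onto the
    constraint line v . n_i = c_i."""
    n_pts = len(normals)
    # initialize with the normal component alone
    v = [(c[i] * normals[i][0], c[i] * normals[i][1]) for i in range(n_pts)]
    for _ in range(iters):
        for i in range(n_pts):
            ax = (v[i - 1][0] + v[(i + 1) % n_pts][0]) / 2
            ay = (v[i - 1][1] + v[(i + 1) % n_pts][1]) / 2
            nx, ny = normals[i]
            corr = c[i] - (ax * nx + ay * ny)   # restore the constraint
            v[i] = (ax + corr * nx, ay + corr * ny)
    return v
```

For a rigidly translating contour, the constant (true) field satisfies every constraint with zero smoothness cost, so the iteration recovers it, which matches the paper's claim that the smoothest consistent field is physically plausible.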
Author[s]: W. Eric L. Grimson
Computational Experiments with a Feature Based Stereo Algorithm
Computational models of the human stereo system can provide insight into general information processing constraints that apply to any stereo system, either artificial or biological. In 1977, Marr and Poggio proposed one such computational model, which was characterized as matching certain feature points in difference-of-Gaussian filtered images, and using the information obtained by matching coarser resolution representations to restrict the search space for matching finer resolution representations. An implementation of the algorithm and its testing on a range of images was reported in 1980. Since then a number of psychophysical experiments have suggested possible refinements to the model and modifications to the algorithm. In addition, recent computational experiments applying the algorithm to a variety of natural images, especially aerial photographs, have led to a number of modifications. In this article, we present a version of the Marr-Poggio-Grimson algorithm that embodies these modifications and illustrate its performance on a series of natural images.
Author[s]: W. Eric L. Grimson
The Combinatorics of Local Constraints in Model-Based Recognition and Localization from Sparse Data
The problem of recognizing what objects are where in the workspace of a robot can be cast as one of searching for a consistent matching between sensory data elements and equivalent model elements. In principle, this search space is enormous and to control the potential combinatorial explosion, constraints between the data and model elements are needed. We derive a set of constraints for sparse sensory data that are applicable to a wide variety of sensors and examine their characteristics. We then use known bounds on the complexity of constraint satisfaction problems together with explicit estimates of the effectiveness of the constraints derived for the case of sparse, noisy three-dimensional sensory data to obtain general theoretical bounds on the number of interpretations expected to be consistent with the data. We show that these bounds are consistent with empirical results reported previously. The results are used to demonstrate the graceful degradation of the recognition technique with the presence of noise in the data, and to predict the number of data points needed in general to uniquely determine the object being sensed.
Author[s]: John M. Rubin and W.A. Richards
Color Vision: Representing Material Categories
We argue that one of the early goals of color vision is to distinguish one kind of material from another. Accordingly, we show that when a pair of image regions is such that one region has greater intensity at one wavelength than at another wavelength, and the second region has the opposite property, then the two regions are likely to have arisen from distinct materials in the scene. We call this material change circumstance the 'opposite slope sign condition.' With this criterion as a foundation, we construct a representation of spectral information that facilitates the recognition of material changes. Our theory has implications for both psychology and neurophysiology. In particular, Hering's notion of opponent colors and psychologically unique primaries, and Land's results in two-color projection can be interpreted as different aspects of the visual system's goal of categorizing materials. Also, the theory provides two basic interpretations of the function of double-opponent color cells described by neurophysiologists.
Author[s]: Brian C. Williams
Qualitative Analysis of MOS Circuits
With the push towards sub-micron technology, transistor models have become increasingly complex. The number of components in integrated circuits has forced designers’ efforts and skills towards higher levels of design. This has created a gap between design expertise and the performance demands increasingly imposed by the technology. To alleviate this problem, software tools must be developed that provide the designer with expert advice on circuit performance and design. This requires a theory that links the intuitions of an expert circuit analyst with the corresponding principles of formal theory (i.e. algebra, calculus, feedback analysis, network theory, and electrodynamics), and that makes each underlying assumption explicit.
Author[s]: V. Torre and T. Poggio
On Edge Detection
Edge detection is the process that attempts to characterize the intensity changes in the image in terms of the physical processes that have originated them. A critical, intermediate goal of edge detection is the detection and characterization of significant intensity changes. This paper discusses this part of the edge detection problem. To characterize the types of intensity changes, derivatives of different types, and possibly different scales, are needed. Thus we consider this part of edge detection as a problem in numerical differentiation. We show that numerical differentiation of images is an ill-posed problem in the sense of Hadamard. Differentiation must therefore be regularized by a filtering operation before derivatives are taken. This shows that this part of edge detection consists of two steps: a filtering step and a differentiation step.
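The two-step structure, filter first, then differentiate, can be sketched in one dimension: smooth with a Gaussian (the regularizing step) and then take central differences. This is an illustrative sketch of regularized differentiation in general, not the specific filters the paper derives:

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled, normalized Gaussian used as the regularizing filter."""
    k = [math.exp(-x * x / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """Direct convolution with border values clamped (replicated)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def regularized_derivative(signal, sigma=1.0, radius=4):
    """Filtering step, then differentiation step: smooth with a Gaussian,
    then take central differences. Differentiating the raw signal would
    amplify noise; the prior filtering makes the problem well-posed."""
    smooth = convolve(signal, gaussian_kernel(sigma, radius))
    return [(smooth[min(i + 1, len(smooth) - 1)]
             - smooth[max(i - 1, 0)]) / 2 for i in range(len(smooth))]
```

On a noiseless step edge, the regularized derivative peaks at the edge location, which is how significant intensity changes are localized.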
Author[s]: Whitman Richards and Donald D. Hoffman
Codon Constraints on Closed 2D Shapes
Codons are simple primitives for describing plane curves. They thus are primarily image-based descriptors. Yet they have the power to capture important information about the 3-D world, such as making part boundaries explicit. The codon description is highly redundant (useful for error-correction). This redundancy can be viewed as a constraint on the number of possible codon strings. For smooth closed strings that represent the bounding contour (silhouette) of many smooth 3D objects, the constraints are so strong that sequences containing 6 elements yield only 33 generic shapes as compared with a possible number of 15,625 combinations.
Author[s]: Christof Koch and Shimon Ullman
Selecting One Among the Many: A Simple Network Implementing Shifts in Selective Visual Attention
This study addresses the question of how simple networks can account for a variety of phenomena associated with the shift of a specialized processing focus across the visual scene. We address in particular aspects of the dichotomy between the preattentive-parallel and the attentive-serial modes of visual perception and their hypothetical neuronal implementations. Specifically, we propose the following: (1) A number of elementary features, such as color, orientation, direction of movement, disparity, etc., are represented in parallel in different topographical maps, called the early representation. (2) There exists a selective mapping from this early representation into a more central representation, such that at any instant the central representation contains the properties of only a single location in the visual scene, the selected location. (3) We discuss some selection rules that determine which location will be mapped into the central representation. The major rule, using the saliency or conspicuity of locations in the early representation, is implemented using a so-called Winner-Take-All network. A hierarchical pyramid-like architecture is proposed for this network. We suggest possible implementations in neuronal hardware, including a possible role for the extensive back-projection from the cortex to the LGN.
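The selection rule based on saliency can be illustrated with a toy Winner-Take-All iteration. This is only a flat sketch of the competitive principle, not the hierarchical pyramid architecture the paper proposes; the update rule and parameters are illustrative assumptions:

```python
import numpy as np

def winner_take_all(saliency, inhibition=0.2, steps=50):
    """Toy winner-take-all dynamics: each unit is driven by its own
    saliency input and inhibited in proportion to the total activity
    of the other units, so only the most conspicuous location remains
    active after the competition settles."""
    s = np.asarray(saliency, dtype=float)
    a = s.copy()
    for _ in range(steps):
        a = np.maximum(a + s - inhibition * (a.sum() - a), 0.0)
    return int(np.argmax(a)), a

# The most salient location in the (tiny) saliency map wins; the
# activity of the losing locations is driven to zero.
winner, activity = winner_take_all([0.1, 0.9, 0.3, 0.5])
print(winner)  # 1
```

The winning index would then select which location of the early representation is routed into the central representation.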
Author[s]: Ronald S. Fearing and John M. Hollerbach
Basic Solid Mechanics for Tactile Sensing
In order to stably grasp objects without using object models, tactile feedback from the fingers is sometimes necessary. This feedback can be used to adjust grasping forces to prevent a part from slipping from a hand. If the angle of force at the object-finger contact can be determined, slip can be prevented by the proper adjustment of finger forces. Another important tactile sensing task is finding the edges and corners of an object, since they are usually feasible grasping locations. This paper describes how this information can be extracted from the finger-object contact using strain sensors beneath a compliant skin. For determining contact forces, strain measurements are easier to use than the surface deformation profile. The finger is modelled as an infinite linear elastic half plane to predict the measured strain for several contact types and forces. The number of sensors required is less than has been proposed for other tactile recognition tasks. A rough upper bound on sensor density requirements for a specific depth is presented that is based on the frequency response of the elastic medium. The effects of different sensor stiffnesses on sensor performance are discussed.
Author[s]: Katsushi Ikeuchi, Keith H. Nishihara, Berthold K.P. Horn, Patrick Sobalvarro and Shigemi Nagata
Determining Grasp Points Using Photometric Stereo and the PRISM Binocular Stereo System
This paper describes a system which locates and grasps doughnut-shaped parts from a pile. The system uses photometric stereo and binocular stereo as vision input tools. Photometric stereo is used to make surface orientation measurements. With this information the camera field is segmented into isolated regions of continuous smooth surface. One of these regions is then selected as the target region. The attitude of the physical object associated with the target region is determined by histogramming surface orientations over that region and comparing with stored histograms obtained from prototypical objects. Range information, not available from photometric stereo, is obtained by the PRISM binocular stereo system. A collision-free grasp configuration and approach trajectory is computed and executed using the attitude and range data.
Author[s]: Tomaso Poggio and Vincent Torre
Ill-Posed Problems and Regularization Analysis in Early Vision
One of the best definitions of early vision is that it is inverse optics --- a set of computational problems that both machines and biological organisms have to solve. While in classical optics the problem is to determine the images of physical objects, vision is confronted with the inverse problem of recovering three-dimensional shape from the light distribution in the image. Most processes of early vision such as stereomatching, computation of motion and the "structure from" processes can be regarded as solutions to inverse problems. This common characteristic of early vision can be formalized: most early vision problems are "ill-posed problems" in the sense of Hadamard. We will show that a mathematical theory developed for regularizing ill-posed problems leads in a natural way to the solution of the early vision problems in terms of variational principles of a certain class. This is a new theoretical framework for some of the variational solutions already obtained in the analysis of early vision processes. It also shows how several other problems in early vision can be approached and solved.
Author[s]: Gerald Roylance
Some Scientific Subroutines in LISP
Here's a LISP library of mathematical functions: hyperbolic and inverse hyperbolic functions, Bessel functions, elliptic integrals, the gamma and beta functions, and the incomplete gamma and beta functions. There are probability density functions, cumulative distributions, and random number generators for the normal, Poisson, chi-square, Student's T, and Snedecor's F distributions; routines for integration, root finding, and convergence; and code to factor numbers and to perform the Solovay-Strassen probabilistic prime test.
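One routine from the list, the Solovay-Strassen probabilistic prime test, can be sketched compactly. This is a standard textbook rendering in Python rather than the library's LISP code; it declares n composite if a random witness a violates Euler's criterion a^((n-1)/2) ≡ (a/n) (mod n):

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by the standard reduction rules."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # factor out 2s, flipping sign
            a //= 2                    # when n = 3 or 5 (mod 8)
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def solovay_strassen(n, rounds=20):
    """Probabilistic primality test: each round catches a composite n
    with probability at least 1/2, so error falls as 2**-rounds."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        j = jacobi(a, n) % n           # map -1 to n - 1
        if j == 0 or pow(a, (n - 1) // 2, n) != j:
            return False               # a is an Euler witness: composite
    return True

print(solovay_strassen(1009))  # True: 1009 is prime
print(solovay_strassen(561))   # False (with overwhelming probability):
                               # 561 is a Carmichael number, which fools
                               # the simpler Fermat test but not this one
```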
Author[s]: Tomaso Poggio
Vision by Man and Machine
The development of increasingly sophisticated and powerful computers in the last few decades has frequently stimulated comparisons between them and the human brain. Such comparisons will become more earnest as computers are applied more and more to tasks formerly associated with essentially human activities and capabilities. The expectation of a coming generation of “intelligent” computers and robots with sensory, motor and even “intellectual” skills comparable in quality to (and quantitatively surpassing) our own is becoming more widespread and is, I believe, leading to a new and potentially productive analytical science of “information processing”. In no field has this new approach been so precisely formulated and so thoroughly exemplified as in the field of vision. As the dominant sensory modality of man, vision is one of the major keys to our mastery of the environment, to our understanding and control of the objects which surround us. If we wish to create robots capable of performing complex manipulative tasks in a changing environment, we must surely endow them with (among other things) adequate visual powers. How can we set about designing such flexible and adaptive robots? In designing them, can we make use of our rapidly growing knowledge of the human brain, and if so, how? At the same time, can our experiences in designing artificial vision systems help us to understand how the brain analyzes visual information?
Author[s]: A.L. Yuille and T. Poggio
A Generalized Ordering Constraint for Stereo Correspondence
The ordering constraint along epipolar lines is a powerful constraint that has been exploited by some recent stereomatching algorithms. We formulate a generalized ordering constraint, not restricted to epipolar lines. We prove several properties of the generalized ordering constraint and of the “forbidden zone”, the set of matches that would violate the constraint. We consider both the orthographic and the perspective projection case, the latter for a simplified but standard stereo geometry. The disparity gradient limit found in the human stereo system may be related to a form of the ordering constraint. To illustrate our analysis we outline a simple algorithm that exploits the generalized ordering constraint for matching contours of wireframe objects. We also show that the use of the generalized ordering constraint implies several other stereo matching constraints: a) the ordering constraint along epipolar lines, b) figural continuity, c) Binford’s cross-product constraint, d) Mayhew and Frisby’s figural continuity constraint. We finally discuss ways of extending the algorithm to arbitrary 3-D objects.
Author[s]: Hugh Robinson and Christof Koch
An Information Storage Mechanism: Calcium and Spines
This proposal addresses some of the biophysical events possibly underlying fast activity-dependent changes in synaptic efficiency. Dendritic spines in the cortex have attracted increased attention over recent years as a possible locus of cellular plasticity, given the large number of studies reporting a close correlation between presynaptic activity (or lack thereof) and changes in spine shape. This is highlighted by recent reports showing that the spine cytoplasm contains high levels of actin. Moreover, it has been demonstrated that a high level of intracellular free calcium, Ca²⁺, is a prerequisite for various forms of synaptic potentiation. We propose a series of plausible steps linking presynaptic electrical activity at dendritic spines with a short-lasting change in spine geometry. Specifically, we conjecture that the spike-induced excitatory postsynaptic potential triggers an influx of Ca²⁺ into the spine, where it will rapidly bind to intracellular calcium buffers such as calmodulin and calcineurin. However, for prolonged or intense presynaptic electrical activity, these buffers will saturate; the free Ca²⁺ will then activate the actin/myosin network in the spine neck, reversibly shortening the length of the neck and increasing its diameter. This change in the geometry of the spine will lead to an increase in the synaptic efficiency of the synapse. We will discuss the implications of our proposal for the control of cellular plasticity and its relation to generalized attention and arousal.
Author[s]: H.K. Nishihara
PRISM: A Practical Real-Time Imaging Stereo Matcher
A binocular-stereo-matching algorithm for making rapid visual range measurements in noisy images is described. This technique is developed for application to problems in robotics where noise tolerance, reliability, and speed are predominant issues. A high-speed pipelined convolver for preprocessing images and an unstructured light technique for improving signal quality are introduced to help enhance performance to meet the demands of this task domain. These optimizations, however, are not sufficient. A closer examination of the problems encountered suggests that broader interpretations of both the objective of binocular stereo and of the zero-crossing theory of Marr and Poggio are required. In this paper, we restrict ourselves to the problem of making a single primitive surface measurement: for example, to determine whether or not a specified volume of space is occupied, to measure the range to a surface at an indicated image location, or to determine the elevation gradient at that position. In this framework we make a subtle but important shift from the explicit use of zero-crossing contours (in band-pass filtered images) as the elements matched between left and right images, to the use of the signs between zero-crossings. With this change, we obtain a simpler algorithm with a reduced sensitivity to noise and a more predictable behavior. The PRISM system incorporates this algorithm with the unstructured light technique and a high-speed digital convolver. It has been used successfully by others as a sensor in a path planning system and a bin picking system.
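The shift from matching zero-crossing contours to matching the signs between zero-crossings can be illustrated in one dimension. A toy sketch of the idea (not the PRISM implementation): correlate the signs of two band-pass filtered scanlines and take the disparity with the highest sign agreement.

```python
import numpy as np

def sign_correlation_disparity(left, right, max_disp=8):
    """Estimate disparity by correlating the *signs* of two filtered
    scanlines: for each candidate shift d, score the fraction of
    positions where the signs agree, and return the best shift.
    Sign agreement is more noise-tolerant than matching the exact
    positions of zero-crossing contours."""
    ls, rs = np.sign(left), np.sign(right)
    scores = [np.mean(ls[d:] == rs[:len(rs) - d]) for d in range(max_disp + 1)]
    return int(np.argmax(scores))

# A band-pass-like waveform shifted by 3 samples between the two "eyes".
x = np.linspace(0, 6 * np.pi, 200)
right_line = np.sin(x) * np.cos(0.3 * x)
left_line = np.roll(right_line, 3)   # simulated 3-pixel disparity
print(sign_correlation_disparity(left_line, right_line))  # 3
```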
Author[s]: Carl Hewitt, Tom Reinhardt, Gul Agha and Giuseppe Attardi
Linguistic Support of Receptionists for Shared Resources
This paper addresses linguistic issues that arise in providing support for shared resources in large scale concurrent systems. Our work is based on the Actor Model of computation which unifies the lambda calculus, the sequential stored-program and the object-oriented models of computation. We show how receptionists can be used to regulate the use of shared resources by scheduling their access and providing protection against unauthorized or accidental access. A shared financial account is an example of the kind of resource that needs a receptionist. Issues involved in the implementation of scheduling policies for shared resources are also addressed. The modularity problems involved in implementing servers which multiplex the use of physical devices illustrate how delegation aids in the implementation of parallel problem solving systems for communities of actors.
Author[s]: Tomaso Poggio and Christof Koch
An Analog Model of Computation for the Ill-Posed Problems of Early Vision
A large gap exists at present between computational theories of vision and their possible implementation in neural hardware. The model of computation provided by the digital computer is clearly unsatisfactory for the neurobiologist, given the increasing evidence that neurons are complex devices, very different from simple digital switches. It is especially difficult to imagine how networks of neurons may solve the equations involved in vision algorithms in a way similar to digital computers. In this paper, we suggest an analog model of computation in electrical or chemical networks for a large class of vision problems, one that maps more easily onto biologically plausible mechanisms. Poggio and Torre (1984) have recently recognized that early vision problems such as motion analysis (Horn and Schunck, 1981; Hildreth, 1984a,b), edge detection (Torre and Poggio, 1984), surface interpolation (Grimson, 1981; Terzopoulos, 1984), shape-from-shading (Ikeuchi and Horn, 1981) and stereomatching can be characterized as mathematically ill-posed problems in the sense of Hadamard (1923). Ill-posed problems can be “solved”, according to regularization theories, by variational principles of a specific type. A natural way of implementing variational problems is electrical, chemical or neuronal networks. We present specific networks for solving several low-level vision problems, such as the computation of visual motion and edge detection.
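The connection between a quadratic variational principle and an analog network can be illustrated with 1-D surface interpolation. The sketch below simulates relaxation dynamics (gradient descent on the regularized energy) in software; the paper's point is that a resistive network settles into the same minimum directly. All parameter values are illustrative:

```python
import numpy as np

def relax_interpolation(data, mask, lam=1.0, rate=0.05, steps=2000):
    """Relaxation for 1-D surface interpolation: the node 'voltages' f
    descend the gradient of the regularized quadratic energy
        E(f) = sum_i mask_i * (f_i - d_i)**2 + lam * sum_i (f_{i+1} - f_i)**2,
    i.e. a data-fidelity term at the sparse measurements plus a
    smoothness (membrane) term -- the form regularization theory yields."""
    f = np.zeros_like(data)
    for _ in range(steps):
        data_term = 2 * mask * (f - data)
        smooth_term = -2 * lam * np.concatenate(
            [[f[1] - f[0]],
             f[2:] - 2 * f[1:-1] + f[:-2],
             [f[-2] - f[-1]]])
        f -= rate * (data_term + smooth_term)
    return f

# Depth measured only at the two ends; relaxation fills in a smooth surface.
data = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0, 3.0])
mask = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0, 1.0])
f = relax_interpolation(data, mask)
print(np.round(f, 2))  # rises smoothly (here linearly) from ~1.18 to ~2.82
```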
Author[s]: Tamar Flash and Neville Hogan
The Coordination of Arm Movements: An Experimentally Confirmed Mathematical Model
This paper presents studies of the coordination of voluntary human arm movements. A mathematical model is formulated which is shown to predict both the qualitative features and the quantitative details observed experimentally in planar, multi-joint arm movements. Coordination is modelled mathematically by defining an objective function, a measure of performance for any possible movement. The unique trajectory which yields the best performance is determined using dynamic optimization theory. In the work presented here the objective function is the square of the magnitude of jerk (rate of change of acceleration) of the hand integrated over the entire movement. This is equivalent to assuming that a major goal of motor coordination is the production of the smoothest possible movement of the hand. The theoretical analysis is based solely on the kinematics of movement independent of the dynamics of the musculoskeletal system, and is successful only when formulated in terms of the motion of the hand in extracorporal space. The implications with respect to movement organization are discussed.
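For a rest-to-rest point-to-point movement, minimizing the integrated squared jerk has a well-known closed-form solution: a quintic polynomial in normalized time with zero velocity and acceleration at both endpoints. A minimal sketch of that standard result along one coordinate:

```python
import numpy as np

def minimum_jerk(x0, xf, T, t):
    """Minimum-jerk trajectory for a rest-to-rest movement from x0 to xf
    in time T: the unique quintic minimizing the integral of squared
    jerk subject to zero velocity and acceleration at both endpoints."""
    tau = t / T                       # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

t = np.linspace(0.0, 1.0, 101)
x = minimum_jerk(0.0, 0.3, 1.0, t)    # a 30 cm reach taking 1 s
v = np.gradient(x, t)

print(round(x[50], 3))    # 0.15: half the distance is covered at mid-time
print(int(np.argmax(v)))  # 50: the bell-shaped speed profile peaks mid-movement
```

The symmetric, bell-shaped tangential velocity profile this produces is exactly the speed-and-distance-invariant profile the abstract describes.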
Author[s]: Christof Koch
A Theoretical Analysis of the Electrical Properties of an X-Cell in the Cat's LGN
Electron microscope studies of relay cells in the lateral geniculate nucleus of the cat have shown that the retinal input of X-cells is associated with a special synaptic circuitry, termed the spine-triad complex. The retinal afferents make an asymmetrical synapse with both a dendritic appendage of the X-cell and a geniculate interneuron. The interneuron contacts in turn the same dendritic appendage with a symmetrical synaptic profile. The retinal input to geniculate Y-cells is predominantly found on dendritic shafts without any triadic arrangement. We explore the integrative properties of X- and Y-cells resulting from this striking dichotomy in synaptic architecture. The basis of our analysis is the solution of the cable equation for a branched dendritic tree with a known somatic input resistance. Under the assumption that the geniculate interneuron mediates a shunting inhibition, activation of the interneuron reduces very efficiently the excitatory post-synaptic potential induced by the retinal afferent without affecting the electrical activity in the rest of the cell. Therefore, the spine-triad circuit implements the analog of an AND-NOT gate, unique to the X-system. Functionally, this corresponds to a presynaptic, feed-forward type of inhibition of the optic tract terminal. Since Y-cells lack this structure, inhibition acts globally, reducing the general electrical activity of the cell. We propose that geniculate interneurons gate the flow of visual information into the X-system as a function of the behavioral state of the animal, enhancing the center-surround antagonism and possibly mediating reciprocal lateral inhibition, eye-movement related suppression and selective visual attention.
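The veto character of shunting inhibition can be seen in a single-compartment steady-state calculation. This is a toy sketch of the principle, not the branched-cable analysis of the paper, and all conductance and reversal-potential values are illustrative:

```python
def spine_response(g_exc, g_inh, e_exc=60.0, e_inh=0.0, g_leak=1.0):
    """Steady-state depolarization (mV relative to rest) of a passive
    compartment receiving an excitatory conductance g_exc and a
    shunting inhibitory conductance g_inh whose reversal potential
    sits at rest (e_inh = 0).  Inhibition adds no current of its own;
    it divisively shunts the excitatory input:
        V = (g_exc*e_exc + g_inh*e_inh) / (g_exc + g_inh + g_leak)."""
    return (g_exc * e_exc + g_inh * e_inh) / (g_exc + g_inh + g_leak)

# Retinal excitation alone depolarizes the spine; concurrent shunting
# inhibition from the interneuron vetoes it -- an analog AND-NOT gate.
print(round(spine_response(1.0, 0.0), 1))   # 30.0 mV: excitation alone
print(round(spine_response(1.0, 20.0), 1))  # 2.7 mV: excitation AND-NOT inhibition
```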
Author[s]: G. Edward Barton, Jr.
Toward a Principle-Based Parser
Parser design lags behind linguistic theory. While modern transformational grammar has largely abandoned complex, language-specific rule systems in favor of modular subsystems of principles and parameters, the rule systems that underlie existing natural-language parsers are still large, detailed, and complicated. The shift to modular theories in linguistics took place because of the scientific disadvantages of such rule systems. Those scientific ills translate into engineering maladies that make building natural-language systems difficult. The cure for these problems should be the same in parser design as it was in linguistic theory. The shift to modular theories of syntax should be replicated in parsing practice; a parser should base its actions on interacting modules of principles and parameters rather than a complex, monolithic rule system. If it can be successfully carried out, the shift will make it easier to build natural-language systems because it will shorten and simplify the language descriptions that are needed for parsing. It will also allow parser design to track new developments in linguistic theory.
Author[s]: Kenneth D. Forbus
Qualitative Process Theory
Objects move, collide, flow, bend, heat up, cool down, stretch, compress and boil. These and other things that cause changes in objects over time are intuitively characterized as processes. To understand common sense physical reasoning and make programs that interact with the physical world as well as people do, we must understand qualitative reasoning about processes: when they will occur, their effects, and when they will stop. Qualitative Process theory defines a simple notion of physical process that appears useful as a language in which to write dynamical theories. Reasoning about processes also motivates a new qualitative representation for quantity in terms of inequalities, called the quantity space. This report describes the basic concepts of Qualitative Process theory and several different kinds of reasoning that can be performed with them, and discusses its impact on other issues in common sense reasoning about the physical world, such as causal reasoning and measurement interpretation. Several extended examples illustrate the utility of the theory, including figuring out that a boiler can blow up, that an oscillator with friction will eventually stop, and how to say that you can pull with a string but not push with it. This report also describes GIZMO, an implemented computer program which uses Qualitative Process theory to make predictions and interpret simple measurements. The representations and algorithms used in GIZMO are described in detail, and illustrated using several examples.
Author[s]: Christopher G. Atkeson and John M. Hollerbach
Kinematic Features of Unrestrained Arm Movements
Unrestrained human arm trajectories between point targets have been investigated using a three-dimensional tracking apparatus, the Selspot system. Movements were executed between different points in a vertical plane under varying conditions of speed and hand-held load. In contrast to past results which emphasized the straightness of hand paths, movement regions were discovered in which the hand paths were curved. All movements, whether curved or straight, showed an invariant tangential velocity profile when normalized for speed and distance. The velocity profile invariance with speed and load is interpreted in terms of simplification of the underlying arm dynamics, extending the results of Hollerbach and Flash (1982).
Author[s]: Bruce R. Donald
Motion Planning with Six Degrees of Freedom
The motion planning problem is of central importance to the fields of robotics, spatial planning, and automated design. In robotics we are interested in the automatic synthesis of robot motions, given high-level specifications of tasks and geometric models of the robot and obstacles. The Movers' problem is to find a continuous, collision-free path for a moving object through an environment containing obstacles. We present an implemented algorithm for the classical formulation of the three-dimensional Movers' problem: given an arbitrary rigid polyhedral moving object P with three translational and three rotational degrees of freedom, find a continuous, collision-free path taking P from some initial configuration to a desired goal configuration. This thesis describes the first known implementation of a complete algorithm (at a given resolution) for the full six degree of freedom Movers' problem. The algorithm transforms the six degree of freedom planning problem into a point navigation problem in a six-dimensional configuration space (called C-Space). The C-Space obstacles, which characterize the physically unachievable configurations, are directly represented by six-dimensional manifolds whose boundaries are five-dimensional C-surfaces. By characterizing these surfaces and their intersections, collision-free paths may be found by the closure of three operators which (i) slide along 5-dimensional intersections of level C-Space obstacles; (ii) slide along 1- to 4-dimensional intersections of level C-surfaces; and (iii) jump between 6-dimensional obstacles. Implementing the point navigation operators requires solving fundamental representational and algorithmic questions: we will derive new structural properties of the C-Space constraints and show how to construct and represent C-surfaces and their intersection manifolds.
A definition and new theoretical results are presented for a six- dimensional C-Space extension of the generalized Voronoi diagram, called the C- Voronoi diagram, whose structure we relate to the C-surface intersection manifolds. The representations and algorithms we develop impact many geometric planning problems, and extend to Cartesian manipulators with six degrees of freedom.
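The reduction of motion planning to point navigation in configuration space can be illustrated with a toy discretized example. This 2-D grid search is only a stand-in for the thesis's six-dimensional C-Space and its C-surface machinery, but it shows the core reformulation: obstacles become forbidden configurations, and planning becomes pathfinding for a point.

```python
from collections import deque

def find_path(cspace, start, goal):
    """Breadth-first search for a collision-free path of a point through
    a discretized configuration space: cells marked 1 are C-Space
    obstacles (physically unachievable configurations), cells marked 0
    are free.  Returns the shortest path as a list of cells, or None."""
    rows, cols = len(cspace), len(cspace[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                       # reconstruct path by backtracking
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and cspace[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                                # start and goal are disconnected

cspace = [[0, 0, 0, 0],
          [1, 1, 1, 0],    # an obstacle wall with a gap on the right
          [0, 0, 0, 0]]
path = find_path(cspace, (0, 0), (2, 0))
print(len(path))  # 9: the shortest route detours through the gap
```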
Author[s]: J.L. Marroquin
Surface Reconstruction Preserving Discontinuities
Well-known methods for solving the shape-from-shading problem require knowledge of the reflectance map. Here we show how the shape-from-shading problem can be solved when the reflectance map is not available, but is known to have a given form with some unknown parameters. This happens, for example, when the surface is known to be Lambertian, but the direction to the light source is not known. We give an iterative algorithm that alternately estimates the surface shape and the light source direction. Use of the unit normal in parameterizing the reflectance map, rather than the gradient or stereographic coordinates, simplifies the analysis. Our approach also leads to an iterative scheme for computing shape from shading that adjusts the current estimates of the local normals toward or away from the direction of the light source. The amount of adjustment is proportional to the current difference between the predicted and the observed brightness. We also develop generalizations to less constrained forms of reflectance maps.
Author[s]: Daniel Sabey Weld
Switching Between Discrete and Continuous Process Models to Predict Molecular Genetic Activity
Two kinds of process models have been used in programs that reason about change: discrete and continuous models. We describe the design and implementation of a qualitative simulator, PEPTIDE, which uses both kinds of process models to predict the behavior of molecular genetic systems. The program uses a discrete process model to simulate both situations involving abrupt changes in quantities and the actions of small numbers of molecules. It uses a continuous process model to predict gradual changes in quantities. A novel technique, called aggregation, allows the simulator to switch between these models through the recognition and summary of cycles. The flexibility of PEPTIDE’s aggregator allows the program to detect cycles within cycles and predict the behavior of complex situations.
Author[s]: Eugene C. Ciccarelli, IV
Presentation Based User Interface
A prototype presentation system base is described. It offers mechanisms, tools, and ready-made parts for building user interfaces. A general user interface model underlies the base, organized around the concept of a presentation: a visible text or graphic for conveying information. The base and model emphasize domain independence and style independence, to apply to the widest possible range of interfaces. The primitive presentation system model treats the interface as a system of processes maintaining a semantic relation between an application data base and a presentation data base, the symbolic screen description containing presentations. A presenter continually updates the presentation data base from the application data base. The user manipulates presentations with a presentation editor. A recognizer translates the user’s presentation manipulations into application data base commands. The primitive presentation system can be extended to model more complex systems by attaching additional presentation systems. In order to illustrate the model’s generality and descriptive capabilities, extended model structures for several existing user interfaces are discussed. The base provides support for building the application and presentation data bases, linked together into a single, uniform network, including descriptions of classes of objects as well as the objects themselves. The base provides an initial presentation data base network, graphics to continually display it, and editing functions. A variety of tools and mechanisms help create and control presenters and recognizers. To demonstrate the base’s utility, three interfaces to an operating system were constructed, embodying different styles: icons, menus, and graphical annotation.
Author[s]: Christof Koch and Tomaso Poggio
Biophysics of Computation: Neurons, Synapses and Membranes
Synapses, membranes and neurotransmitters play an important role in processing information in the nervous system. We do not know, however, what biophysical mechanisms are critical for neuronal computations, what elementary information processing operations they implement, and which sensory or motor computations they underlie. In this paper, we outline an approach to these problems. We will review a number of different biophysical mechanisms such as synaptic interactions between excitation and inhibition, dendritic spines, non-impulse generating membrane nonlinearities and transmitter-regulated voltage channels. For each one, we discuss the information processing operations that may be implemented. All of these mechanisms act either within a few milliseconds, such as the action potential or synaptic transmission, or over several hundred milliseconds or even seconds, modulating some property of the circuit. In some cases we will suggest specific examples where a biophysical mechanism underlies a given computation. In particular, we will discuss the neuronal operations, and their implementation, underlying direction selectivity in the vertebrate retina.
Author[s]: Alan Bawden and Philip E. Agre
What a Parallel Programming Language Has to Let You Say
We have implemented in simulation a prototype language for the Connection Machine called CL1. CL1 is an extrapolation of serial machine programming language technology: in CL1 one programs the individual processors to perform local computations and talk to the communications network. We present details of the largest of our experiments with CL1, an interpreter for Scheme (a dialect of Lisp) that allows a large number of different Scheme programs to be run in parallel on the otherwise SIMD Connection Machine. Our aim was not to propose Scheme as a language for Connection Machine programming, but to gain experience using CL1 to implement an interesting and familiar algorithm. Consideration of the difficulties we encountered led us to the conclusion that CL1 programs do not capture enough of the causal structure of the processes they describe. Starting from this observation, we have designed a successor language called CGL (for Connection Graph Language).
Author[s]: Hormoz Mansour
The Use of Censors for Nonmonotonic Reasoning and Analogy in Medical Decision-Making
A patient rarely has a single, isolated disease. The situation is usually much more complex, since the different parts of the human organism and metabolism interact with each other and follow several feedback patterns. These interactions and feedback patterns become more important with the addition of the external environment. When a disease is present, the first steps of the medical diagnosis should be to research and to determine whether another disease interacts with (“censors”) or changes the significant symptoms, syndromes, or results of the laboratory tests of the first disease. Understanding of this interaction and the appropriate reasoning is based on a type of non-monotonic logic. We will try, within this paper, to see the effect of two diseases on each other. One important part of the effect of two diseases on each other is the censoring effect of what we call “Censors.” In addition, causal reasoning, reasoning by analogy, and learning from precedents are important and necessary for a human-like expert in medicine. Some aspects of their application to thyroid diseases, with an implemented system, are considered in this paper.