Abstracts - 2006
Gestalt Features For Biological And Computational Vision
Stanley M. Bileschi
The goal of the visual system, computer or biological, is to transform a visual input into meaningful semantic information. The three major factors controlling this computation are the architecture, the representation, and the learning algorithm. The architecture defines the structure of the process: whether it is completely feed-forward, whether objects are completely independent or rely upon shared components, etc. The representation defines how the information is represented mathematically. This can be as simple as a list of gray values, or much more complicated. Finally, the learning algorithm defines both how the algorithm parameters are set and how meanings are gleaned from test data. This project involves the second factor, the representation. Specifically, we will explore how influences from the Gestalt psychologists can be used to develop representations that are conducive to more accurate image understanding and better models of human biology. We will develop computational methods for discovering Gestalt principles such as good continuity, closed forms, parallelism, and symmetry.
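To make the idea of an explicit Gestalt feature concrete, the following is a minimal sketch of a left-right mirror-symmetry score computed directly on a gray-value representation. It is an illustration only, not the method developed in this project: the function name `mirror_symmetry_score`, the nested-list image format, and the 0-255 gray range are all assumptions introduced here.

```python
def mirror_symmetry_score(image, max_gray=255):
    """Score left-right mirror symmetry of a gray-value image.

    `image` is a list of rows, each row a list of gray values in
    [0, max_gray] (a hypothetical format chosen for this sketch).
    Returns a value in [0, 1]; 1.0 means perfectly symmetric about
    the vertical midline.
    """
    total_difference = 0
    pair_count = 0
    for row in image:
        width = len(row)
        # Compare each pixel in the left half with its mirror partner.
        for x in range(width // 2):
            total_difference += abs(row[x] - row[width - 1 - x])
            pair_count += 1
    if pair_count == 0:
        # No mirrored pairs (e.g. a one-pixel-wide image): trivially symmetric.
        return 1.0
    return 1.0 - total_difference / (pair_count * max_gray)


# Toy inputs, made up for illustration.
symmetric = [[0, 255, 255, 0], [10, 200, 200, 10]]
asymmetric = [[0, 0, 255, 255], [0, 0, 255, 255]]

print(mirror_symmetry_score(symmetric))
print(mirror_symmetry_score(asymmetric))
```

A single scalar like this could be appended to an existing feature vector; a biologically motivated version would presumably be computed locally and at several scales rather than once over the whole image.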
It has been shown that image representation can be critical to the success of an image understanding system, be it for object detection, face recognition, or otherwise. Furthermore, our lab has had success in modeling mammalian visual systems and then using the model in a computer vision task. In fact, the standard model of Poggio and Riesenhuber performs as well as or better than other state-of-the-art image representations on commonly used benchmark vision tasks, even though the model was designed as a faithful reproduction of the biological computation rather than as a competitor in the field of object detection. To build on this success, we look to other properties of human vision that are not captured by the standard model, and strive to model and include them, so as to possibly improve detection performance even further. Specifically, it is our aim to test and mimic those properties of vision outlined by the Gestalt psychologists.
There is a great corpus of work on which types of features (image representations) are best for specific objects or specific object data sets [1, 2, 3]. There is also a great body of work on the human visual system's ability to detect the types of stimuli described by the Gestalt psychologists. Some effort has been made to model these behaviors in computers [6, 7]; computer vision scientists have focused on continuity and symmetry detection in particular. There is, however, little research into how well explicit representations of these properties might aid in the detection of objects.
This project will involve at least two phases of study. In the first phase, we will investigate the types of Gestalt stimuli that are easily detected by humans but difficult to represent with the current implementation of the Standard Model. This will be done by designing stimuli and performing a 2-way forced-choice task, both un-timed and with rapid presentation and masking, so as to isolate the feed-forward circuits of the visual pathway. Meanwhile, algorithms will be designed to explicitly model the Gestalt principles. These algorithms will use the same types of information manipulation as the Standard Model, so as to maximize biological plausibility. The performance of these algorithms on the same test data will be compared to human performance, so as to distinguish which features are more or less likely to exist in the biology. Finally, the new features will be incorporated into our model of discriminative object detection, to see which, if any, can aid in the detection of object classes.
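The comparison between human and algorithm performance on the same forced-choice trials can be sketched as follows. This is only an illustration under stated assumptions: the function names, the "A"/"B" response coding, and the toy responses are hypothetical and made up for the example; they are not data or analysis code from this project.

```python
def proportion_correct(responses, answers):
    """Fraction of forced-choice trials answered correctly."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    if not answers:
        raise ValueError("at least one trial is required")
    correct = sum(r == a for r, a in zip(responses, answers))
    return correct / len(answers)


def performance_gap(human, model, answers):
    """Human accuracy minus model accuracy on the same trials.

    A large positive gap marks a stimulus type that humans detect
    easily but the model represents poorly.
    """
    return proportion_correct(human, answers) - proportion_correct(model, answers)


# Toy trials, invented for illustration: each trial has two alternatives.
answers = ["A", "B", "A", "B"]
human_responses = ["A", "B", "A", "A"]
model_responses = ["A", "A", "B", "A"]

print(proportion_correct(human_responses, answers))
print(proportion_correct(model_responses, answers))
print(performance_gap(human_responses, model_responses, answers))
```

In practice the gap would be computed separately per stimulus type and per presentation condition (un-timed versus rapid with masking), with chance performance at 0.5 for a two-alternative task.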
The design of plausible Gestalt algorithms is not in and of itself difficult. What is difficult, however, is to show whether the algorithm is similar to the computation taking place in any other system. Further tests will need to be designed to explore these types of issues.
This project has potential impact both in biological and computational vision, should the algorithms prove to either mimic human performance or improve detection scores.
Should the features model human detection of Gestalt stimuli well, it would be wise to design tests to check whether these types of circuits physically exist within the organism, or whether the processing simply leads to similar outcomes. Should the features lead to better object detectors, parameter and design sweeps would be necessary to optimize performance.
Serre, T., M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman and T. Poggio. A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex, CBCL Paper #259/AI Memo #2005-036.