CSAIL Research Abstract

Normalization of Response to Multiple Stimuli in Monkey Inferotemporal Cortex

Davide Zoccolan, David Cox & James DiCarlo

A major challenge for current theories of vision is to understand how the visual system performs object recognition in cluttered conditions, typical of natural visual scenes, where objects of interest do not appear in isolation but together with background objects. Object recognition in primates is thought to depend on neuronal activity in the inferotemporal cortex (IT)[1], which is the last stage of the ventral visual stream. In fact, neurons found in monkey IT fulfill two essential requirements for visual recognition: invariance and selectivity. Previous studies have shown that they are selectively tuned to views of complex objects such as faces and that their responses are tolerant to stimulus transformations such as scale and position changes[2, 3]. A few studies also reported some degree of tolerance of IT neurons to clutter[4, 5]. However, a systematic understanding of the relationship between clutter tolerance and shape selectivity in IT is still lacking, and no models have been proposed to explain the response of IT neurons to multiple objects based on the responses to those same objects presented in isolation in their receptive field (RF).

In the present investigation, we explicitly addressed these issues by recording IT neuronal responses in three monkey subjects under two complementary experimental paradigms. One monkey was tested using sets of parametrized stimuli (cars, faces, and abstract silhouettes) with defined shape similarity, which spanned a wide range of stimulus effectiveness and were presented alone or in pairs in two different locations (1.25° above fixation and 1.25° below fixation). Two other monkeys were tested using a fixed set of simple geometrical stimuli (a star, a cross, and a triangle) presented in all possible single, pairwise, and triplet-wise combinations in three different locations (2° above fixation, at fixation, and 2° below fixation). The shapes were each scaled to fit within a 2° bounding circle.
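The second paradigm can be made concrete with a short enumeration of the displays. This is only an illustrative sketch: the names `SHAPES`, `LOCATIONS_DEG`, and `enumerate_conditions` are hypothetical, and the code assumes that a shape is never repeated within a single display, which the text above does not state explicitly.

```python
from itertools import combinations, permutations

# Shapes and vertical offsets from fixation (degrees), as described in the text.
SHAPES = ("star", "cross", "triangle")
LOCATIONS_DEG = (2.0, 0.0, -2.0)


def enumerate_conditions(shapes=SHAPES, locations=LOCATIONS_DEG):
    """List single, pair, and triplet displays as tuples of (location, shape).

    ASSUMPTION (not stated in the abstract): each shape appears at most once
    per display, so shapes are assigned to locations without repetition.
    """
    conditions = []
    for n_objects in (1, 2, 3):
        # Choose which locations are occupied ...
        for occupied in combinations(locations, n_objects):
            # ... then assign distinct shapes to those locations, in order.
            for assigned in permutations(shapes, n_objects):
                conditions.append(tuple(zip(occupied, assigned)))
    return conditions


conditions = enumerate_conditions()
for n_objects in (1, 2, 3):
    count = sum(1 for c in conditions if len(c) == n_objects)
    print(n_objects, "object(s):", count, "displays")
```

If shapes were allowed to repeat within a display, `permutations` would be replaced by `itertools.product(shapes, repeat=n_objects)` and the counts would be larger.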

Our recordings showed that a large fraction of IT neurons are not clutter tolerant, i.e., their responses to pairs of simultaneously presented objects are weaker than their responses to the most effective stimulus of each pair presented alone. More specifically, we found that IT responses to pairs and triplets of stimuli can be reliably modeled as the average of the responses to the individual stimuli composing the pairs or triplets. The agreement of the neuronal data with this descriptive model does not depend on the similarity between the objects in the pairs and becomes virtually perfect when the responses of many neurons are pooled together. These findings are consistent with a mechanistic model in which the output of each IT neuron is normalized by the total amount of activation in the population of IT cells co-activated by the stimulus pair or triplet. This response rescaling, or divisive normalization, could be computationally useful to prevent IT neuronal responses from saturating and could be part of an efficient population coding strategy to represent multiple shapes in IT. Other potential explanations for the observed responses in clutter could simply rely on the feed-forward connections from V4 (or posterior IT) neurons to postsynaptic targets in anterior IT. For instance, the normalization mechanism responsible for rescaling IT responses could take place at the level of V4 afferents rather than requiring a population of co-activated IT neurons. Alternatively, the interference between the patterns of activation produced by different shapes in V4 afferents could be responsible for reducing the response of a postsynaptic IT neuron to multiple objects to an approximate average of its responses to the individual shapes. Minjoon Kouh recently started to test these possible mechanistic explanations of the observed clutter responses by using the standard model of object recognition developed at CBCL[3].
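To illustrate how divisive normalization can produce the averaging behavior described above, here is a minimal sketch. It is not the authors' model: the function names, the toy drive values, and the parameter `sigma` are hypothetical, and it assumes that (a) the drives from simultaneously presented objects sum linearly, (b) each object alone evokes the same total drive across the population, and (c) the normalization constant `sigma` is zero. Under those assumptions the normalized response to a pair equals the mean of the normalized responses to each object alone.

```python
def average_model(single_responses):
    """Descriptive model: response to a multi-object display is the mean
    of the responses to the constituent objects presented alone."""
    return sum(single_responses) / len(single_responses)


def normalized_response(drives_per_object, unit, sigma=0.0):
    """Divisive normalization sketch (illustrative, not the authors' code).

    drives_per_object: one list of per-unit drives for each object shown.
    unit: index of the neuron whose response is returned.
    sigma: hypothetical semi-saturation constant (assumed 0 here).
    """
    # Assumption: drives from simultaneously shown objects add linearly.
    summed = [sum(per_unit) for per_unit in zip(*drives_per_object)]
    # Normalization pool: total drive over all co-activated units.
    pool = sum(summed)
    return summed[unit] / (sigma + pool)


# Toy drives for a three-unit population; both objects evoke the same
# total drive (10.0), which is an assumption of this sketch.
drive_a = [8.0, 1.0, 1.0]
drive_b = [2.0, 5.0, 3.0]

alone_a = normalized_response([drive_a], unit=0)
alone_b = normalized_response([drive_b], unit=0)
pair = normalized_response([drive_a, drive_b], unit=0)
predicted = average_model([alone_a, alone_b])

print("A alone:", alone_a, "B alone:", alone_b)
print("pair:", pair, "average-model prediction:", predicted)
```

In this toy case the pair response falls below the response to the more effective object alone, which is the lack of clutter tolerance described in the text. With a nonzero `sigma`, or with objects that evoke unequal total drive, the match to the average becomes only approximate.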

Acknowledgments:

This report describes research done at the Center for Biological & Computational Learning, which is in the McGovern Institute for Brain Research at MIT, as well as in the Dept. of Brain & Cognitive Sciences, and which is affiliated with the Computer Science & Artificial Intelligence Laboratory (CSAIL).

This research was sponsored by grants from: Office of Naval Research (DARPA) Contract No. MDA972-04-1-0037, Office of Naval Research (DARPA) Contract No. N00014-02-1-0915, National Science Foundation (ITR/IM) Contract No. IIS-0085836, National Science Foundation (ITR/SYS) Contract No. IIS-0112991, National Science Foundation (ITR) Contract No. IIS-0209289, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218693, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218506, and National Institutes of Health (Conte) Contract No. 1 P20 MH66239-01A1.

Additional support was provided by: Central Research Institute of Electric Power Industry, Center for e-Business (MIT), Daimler-Chrysler AG, Compaq/Digital Equipment Corporation, Eastman Kodak Company, Honda R&D Co., Ltd., ITRI, Komatsu Ltd., Eugene McDermott Foundation, Merrill-Lynch, Mitsubishi Corporation, NEC Fund, Nippon Telegraph & Telephone, Oxygen, Siemens Corporate Research, Inc., Sony MOU, Sumitomo Metal Industries, Toyota Motor Corporation, and WatchVision Co., Ltd.

Davide Zoccolan is supported by a Long Term Postdoctoral Fellowship of The International Human Frontier Science Program Organization.

References:

[1] K. Tanaka. “Inferotemporal Cortex and Object Vision,” Annu.Rev.Neurosci. 19, 109-139 (1996).

[2] N. K. Logothetis, J. Pauls, T. Poggio. “Shape Representation in the Inferior Temporal Cortex of Monkeys,” Curr.Biol. 5, 552-563 (1995).

[3] M. Riesenhuber and T. Poggio. “Hierarchical Models of Object Recognition in Cortex,” Nat.Neurosci. 2, 1019-1025 (1999).

[4] T. Sato. “Interactions of Visual Stimuli in the Receptive Fields of Inferior Temporal Neurons in Awake Macaques,” Exp.Brain Res. 77, 23-30 (1989).

[5] E. Rolls and M. Tovee. “The Responses of Single Neurons in the Temporal Visual Cortical Areas of the Macaque When More Than One Stimulus is Present in the Receptive Field,” Exp.Brain Res. 103, 409-420 (1995).

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)